Machine Learning For Beginners

Spine Abnormality Prediction

What is Machine Learning?

Machine learning is a branch of artificial intelligence (AI) that lets programs learn and improve from experience automatically, without being explicitly programmed. Its focus is on building programs that can access data and use it to learn for themselves.


What You Will Learn In This Tutorial

In this tutorial you'll learn to create a spine abnormality prediction program using machine learning in Python. We need a dataset to train our program. You can download it from the Kaggle repository here:

Dataset: https://www.kaggle.com/sammy123/lower-back-pain-symptoms-dataset

Note: Please change the 'Abnormal' and 'Normal' labels in the last column of the dataset to '0' and '1' respectively.
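If you would rather not edit the file by hand, the label change in the note above can also be done in pandas. This is only a minimal sketch: the three-row table and the symptom values are made up for illustration, and it assumes the label column is named 'Class_att' as in the rest of this tutorial.

```python
import pandas as pd

# Hypothetical three-row stand-in for the real file; in practice you would
# start from pd.read_csv('dataset_spine.csv') instead.
data = pd.DataFrame({
    'Col1': [63.0, 39.1, 68.8],  # made-up symptom values
    'Class_att': ['Abnormal', 'Normal', 'Abnormal'],
})

# Replace the text labels with the numbers used in this tutorial
data['Class_att'] = data['Class_att'].map({'Abnormal': 0, 'Normal': 1})

print(data['Class_att'].tolist())  # [0, 1, 0]
```

Any label that is not exactly 'Abnormal' or 'Normal' would become NaN with this approach, so it is worth checking the column after the conversion.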

About This Dataset

This dataset is a '.csv' file containing records of more than a hundred people with normal and abnormal spines, as shown below:




There are 13 columns, of which the first 12 are the symptoms. The last column, 'Class_att', is the outcome of those symptoms (whether the spine is normal or abnormal). Here '0' stands for 'abnormal' and '1' stands for 'normal'.
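A quick way to confirm this layout is to look at the shape of the table and count the rows in each class. The sketch below uses a tiny made-up table with the same layout; the column names 'Col1' to 'Col12' and all values are assumptions for illustration, so with the real file you would load it with pd.read_csv first.

```python
import pandas as pd

# Hypothetical stand-in with the same layout as the real file:
# 12 symptom columns followed by the 'Class_att' column
symptom_columns = [f'Col{i}' for i in range(1, 13)]
rows = [
    [0.5 * i for i in range(12)] + [0],  # one made-up 'abnormal' record
    [1.5 * i for i in range(12)] + [1],  # one made-up 'normal' record
]
data = pd.DataFrame(rows, columns=symptom_columns + ['Class_att'])

print(data.shape)                        # (number of rows, number of columns)
print(data['Class_att'].value_counts())  # how many rows belong to each class
```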

To train the program, we simply pass this data to it. We select an algorithm, and the program uses that algorithm to train itself. When the program finishes learning, we test it. That's it!


Wait... How Would You Test The Program?

The problem is that we are not doctors, so how would we know whether our program works? We cannot test it on the same data we trained it on, because we want predictions about new data. Do we need another dataset?

No, we shall split this data into two parts: 80% will be used for training and the remaining 20% for testing. Hurray! Problem solved.
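To see what an 80/20 split looks like, here is a small sketch on ten made-up records. The sample values and the random_state argument are assumptions added for illustration; random_state only makes the shuffle repeatable.

```python
from sklearn.model_selection import train_test_split

samples = list(range(10))                 # ten made-up records
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # made-up results for those records

# test_size=0.2 keeps 20% of the records aside for testing
x_train, x_test, y_train, y_test = train_test_split(
    samples, labels, test_size=0.2, random_state=0)

print(len(x_train), len(x_test))  # 8 2
```

The records are shuffled before splitting, so which two end up in the test part depends on the random seed.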


Let's Code It.

Open your Python IDE and start coding.


# Pandas for reading the csv file
import pandas as pd

# For splitting the dataset
from sklearn.model_selection import train_test_split

# For testing and viewing the scores of our model
from sklearn.metrics import classification_report, confusion_matrix

# Algorithm for our model
from sklearn.svm import SVC

Now we read our dataset and split it.

# Reading dataset using Pandas Library.
data=pd.read_csv('dataset_spine.csv')

# The Class_att column holds the final result: whether the spine is normal or abnormal
# We save the whole column in 'y'
y = data['Class_att']

# All other columns are saved in 'x'
# drop() returns a copy of the data without the Class_att column
x = data.drop('Class_att', axis=1)

# train_test_split splits 'x' and 'y' into two parts each,
# so there are four parts in total:
# two parts of 'x' (symptoms) and two parts of 'y' (results).
# These are called the train and test parts.
# Here test_size is 0.2 (20%),
# so 20% of the dataset is not used for training the model.
# Once the model is trained, we use that 20% to test it,
# because if we trained on all the data, we would have nothing left to test with.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

Selecting and training our model.
# Selecting the SVC algorithm with a linear kernel
model = SVC(kernel='linear')
# Training the model on x_train, y_train
# x_train contains 80% of the symptoms
# y_train contains the corresponding results for that 80%
model.fit(x_train, y_train)

Testing
# Now we predict results for x_test
# x_test is the 20% that our model has not seen
y_pred = model.predict(x_test)

# Printing an empty line
print()
# Printing scores (confusion matrix and classification report)
print("Scores...")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

That's it. Run the code and see what happens. Below is a screenshot of the scores our model produced.

Here the 2x2 matrix is the confusion matrix. 'NO' is class '0' and 'YES' is class '1'. The first row is 'Actual NO' and the first column is 'Predicted NO', which means:
Actual NO = 40 + 6 = 46
Predicted NO = 40 + 4 = 44
Of the 46 actual NO cases, 40 were predicted correctly and 6 were wrongly predicted as YES.

The second row is 'Actual YES' and the second column is 'Predicted YES':
Actual YES = 4 + 12 = 16
Predicted YES = 6 + 12 = 18
Of the 16 actual YES cases, 12 were predicted correctly and 4 were wrongly predicted as NO.

Next is the classification report. You can see that the accuracy is 84%. Just ignore everything else for now, because this is enough for today.
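The 84% figure can be checked by hand from the four numbers in the confusion matrix quoted above. The short sketch below simply repeats that arithmetic: correct predictions sit on the diagonal, and accuracy is the share of correct predictions among all test records.

```python
# Confusion matrix values quoted in the text above:
# rows are actual classes, columns are predicted classes
matrix = [[40, 6],
          [4, 12]]

correct = matrix[0][0] + matrix[1][1]  # predictions on the diagonal are right
wrong = matrix[0][1] + matrix[1][0]    # everything off the diagonal is wrong
total = correct + wrong
accuracy = correct / total

print(correct, wrong, total)  # 52 10 62
print(f"{accuracy:.0%}")      # 84%
```

Your own numbers will differ from run to run, because train_test_split shuffles the data differently each time unless you fix the random seed.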

Enjoy!