I am new to machine learning and am getting started with the Titanic problem on Kaggle. I have written a simple algorithm to predict the results on the test data.
My question: every time I run the algorithm with the same dataset and the same steps, the score (printed by the last statement in the code) changes. Why does this happen?
Code:
# imports
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
# load data
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
results = pd.read_csv('gender_submission-orig.csv')
# prepare training and test dataset
y = train['Survived']
X = train.drop(['Survived', 'SibSp', 'Ticket', 'Cabin', 'Embarked', 'Name'], axis=1)
test = test.drop(['SibSp', 'Ticket', 'Cabin', 'Embarked', 'Name'], axis=1)
y_test = results['Survived']
X = pd.get_dummies(X)
test = pd.get_dummies(test)
# fill the missing values
age_median = X['Age'].median()
fare_median = X['Fare'].median()
X['Age'] = X['Age'].fillna(age_median)
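# assign the result back; calling fillna(inplace=True) on a selected column
# is chained assignment and may not modify `test` in recent pandas versions
test['Age'] = test['Age'].fillna(age_median)
test['Fare'] = test['Fare'].fillna(fare_median)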
# train the classifier and predict
clf = DecisionTreeClassifier()
clf.fit(X, y)
predict = clf.predict(test)
# This is the score that changes from one run to the next.
print(round(clf.score(test, y_test) * 100, 2))
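
A likely source of the run-to-run variation is that `DecisionTreeClassifier()` is created without a `random_state`. scikit-learn randomly permutes the features at each split, so when several splits are equally good, the one that is chosen can differ between fits. The sketch below is a minimal, hedged illustration of that idea and not the original script: it uses a synthetic dataset from `make_classification` as a stand-in for the Titanic CSV files, and the names `X_demo`, `y_demo`, `X_hold`, `y_hold`, and `fit_and_score` are hypothetical helpers introduced only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic data (an assumption, so the sketch
# runs without train.csv / test.csv).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, y_train = X_demo[:400], y_demo[:400]
X_hold, y_hold = X_demo[400:], y_demo[400:]


def fit_and_score(seed):
    """Fit a fresh tree with the given random_state and return hold-out accuracy."""
    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_hold, y_hold)


# Same integer seed on every fit: the tree is built deterministically.
fixed_scores = [fit_and_score(42) for _ in range(5)]

# random_state=None (the default): the score may or may not vary between fits.
unseeded_scores = [fit_and_score(None) for _ in range(5)]

print("fixed seed:", fixed_scores)
print("no seed:   ", unseeded_scores)
```

With a fixed integer seed, the five scores are identical. Without one, they can differ, although on some datasets they happen to agree. Applied to the code above, this would mean writing `clf = DecisionTreeClassifier(random_state=42)`, where 42 is an arbitrary choice.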