I'm currently working on a model to predict a probability of fatality once a person is infected with the Corona virus. I'm using a Dutch dataset with categorical variables: date of infection, fatality or cured, gender, age-group etc. It was suggested to use a decision tree, which I've already built. Since I'm new to decision trees I would like some assistance. I would like to have the prediction (target variable) expressed in a probability (%), not in a binary output. How can I achieve this? Also I want to play around with samples by inputting the data myself and see what the outcome is. For instance: let's take someone who is 40, male etc. and calculate what its survival chance is. How can I achieve this? I've attached the code below:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import random as rnd
filename = '/Users/sef/Downloads/pima-indians-diabetes.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)
model = DecisionTreeClassifier()
model.fit(X_train, Y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
rnd.seed(123458)
X_new = X[rnd.randrange(X.shape[0])]
X_new = X_new.reshape(1,8)
YHat = model.predict_proba(X_new)
df = pd.DataFrame(X_new, columns = names[:-1])
df["predicted"] = YHat
print(df)