I am trying to use sklearn to train a decision tree on my dataset.
When I tried to slice the data into the outcome (Y) and the predictor variables (X), it turned out that the outcome (my label) is in True/False form:
# data slicing
X = df.values[:, 3:27]  # X holds the predictor variables; unique_id and student name are dropped here
Y = df['OffTask'].values  # Y is the predicted value (outcome); it is in the 3rd column. Select it by name on df, since df.values is a NumPy array and cannot be indexed with a column label
This is what I did, but I do not know whether it is the right approach:
#convert the label "OffTask" to dummy
df1 = pd.get_dummies(df,columns=["OffTask"])
df1
My trouble is that the resulting dataset df1 splits my label OffTask into two columns, OffTask_N and OffTask_Y.
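To make the problem concrete, here is a minimal sketch on a made-up frame (the column feature_a and all values are hypothetical, not my real data). It reproduces the two indicator columns and then shows the single 0/1 label column I am after. The map call is only my guess at one possible way to get there, and it assumes the label values are the strings "Y" and "N".

```python
import pandas as pd

# Hypothetical toy data; the real df has many more rows and columns.
df = pd.DataFrame({
    "unique_id": [1, 2, 3, 4],
    "OffTask": ["N", "Y", "N", "Y"],
    "feature_a": [0.2, 0.9, 0.4, 0.7],
})

# get_dummies replaces "OffTask" with one indicator column per category,
# which is where OffTask_N and OffTask_Y come from.
df1 = pd.get_dummies(df, columns=["OffTask"])
print(list(df1.columns))

# What I want instead: a single numeric label column.
# Assumption: the labels are exactly "Y" and "N"; any other value would become NaN.
y = df["OffTask"].map({"N": 0, "Y": 1})
print(y.tolist())
```

With this approach df itself is left unchanged, so the predictor columns can still be sliced from it as before.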
Does anyone know how to fix it?