I am using the iris data set from sklearn. I need to split the data, sample the training set without replacement at several proportions, fit a Naive Bayes classifier on each sample, record the scores, and return a dictionary that maps the sample size used to fit the model (the key) to the corresponding training and test scores (the value, as a tuple).
I need some help with the returning dictionary part. This is what I have done to get the required dictionary. I am unsure if what I have done is correct or if there is a better way to do this.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
# Two separate lists; `a = b = []` would bind both names to the same list object.
score_list, shape_list = [], []
iris = load_iris()
props=[0.2,0.5,0.7,0.9]
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
# Target as a 1-D Series, features as everything except the target column.
y = df['target']
X = df.drop(columns='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, train_size=0.7)
for i in props:
    ix = np.random.choice(X_train.index, size=int(i * len(X_train)), replace=False)
    sampleX = X_train.loc[ix]
    sampleY = y_train.loc[ix]
    modelNB = MultinomialNB()
    modelNB.fit(sampleX, sampleY)
    train_score = modelNB.score(sampleX, sampleY)
    test_score = modelNB.score(X_test, y_test)
    score_list.append((train_score, test_score))
    shape_list.append(sampleX.shape[0])
print(dict(zip(shape_list,score_list)))
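For comparison, here is a minimal sketch of one alternative that fills the dictionary directly inside the loop instead of zipping two lists afterwards. The function name `scores_by_sample_size` and the `seed` parameter are my own placeholders, not part of the assignment, and I am assuming plain NumPy arrays from `load_iris(return_X_y=True)` are acceptable in place of the DataFrame.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def scores_by_sample_size(props, seed=0):
    """Map each training-sample size to (train_score, test_score).

    Hypothetical helper for illustration; `seed` is an assumed parameter
    that only makes the split and the sampling reproducible.
    """
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    rng = np.random.default_rng(seed)
    scores = {}
    for p in props:
        n = int(p * len(X_train))
        # Draw n distinct row positions from the training set (no repetition).
        ix = rng.choice(len(X_train), size=n, replace=False)
        model = MultinomialNB().fit(X_train[ix], y_train[ix])
        # Key: sample size; value: (training score, test score).
        scores[n] = (
            model.score(X_train[ix], y_train[ix]),
            model.score(X_test, y_test),
        )
    return scores


scores = scores_by_sample_size([0.2, 0.5, 0.7, 0.9])
print(scores)
```

One caveat with keying on sample size, in either version: if two proportions round to the same number of rows, the later entry silently overwrites the earlier one.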