I have a set of documents and a set of labels. Right now, I am using train_test_split to split my dataset in a 90:10 ratio. However, I wish to use Kfold cross-validation.
train=[]
with open("/Users/rte/Documents/Documents.txt") as f:
for line in f:
train.append(line.strip().split())
labels=[]
with open("/Users/rte/Documents/Labels.txt") as t:
for line in t:
labels.append(line.strip().split())
X_train, X_test, Y_train, Y_test= train_test_split(train, labels, test_size=0.1, random_state=42)
When I try the method provided in the documentation of scikit learn: I receive an error that says:
kf=KFold(len(train), n_folds=3)
for train_index, test_index in kf:
X_train, X_test = train[train_index],train[test_index]
y_train, y_test = labels[train_index],labels[test_index]
error
X_train, X_test = train[train_index],train[test_index]
TypeError: only integer arrays with one element can be converted to an index
How can I perform a 10 fold cross-validation on my documents and labels?