I need to split my data into a training set (80%) and test set (20%). I currently do that with the code below:

StratifiedShuffleSplit(n_splits=10, test_size=.2, train_size=.8, random_state=0)

However, I need to specify a particular attribute to split on, and I am not able to do that.

Mattravel
  • I'm not sure my answer is exactly what you are looking for. What do you mean by "using an attribute for splitting"? Moreover, are you attempting a simple 80/20 split or a K-fold split? – Mattravel Feb 21 '23 at 06:26
  • Please add more details about your data and how you are trying to split it. Right now, it's not totally clear what you are asking. – AlexK Feb 21 '23 at 07:42

1 Answer

If you want to split your data in an 80/20 stratified manner, I recommend using train_test_split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)
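If "a particular attribute" means a column of your data, one possible reading is that you want the split stratified on that column rather than on the target. A minimal sketch, assuming a pandas DataFrame; the column names "feature", "group" and "target" are made up for illustration, with "group" standing in for your attribute:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: "group" is the attribute we want to stratify on.
df = pd.DataFrame({
    "feature": range(20),
    "group": ["a"] * 10 + ["b"] * 10,
    "target": [0, 1] * 10,
})

# stratify accepts any array-like of labels, so a single column works.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["group"], random_state=0
)

# Both parts keep the same proportion of each "group" value.
print(train_df["group"].value_counts())
print(test_df["group"].value_counts())
```

Each value of the stratification column needs at least two rows, otherwise scikit-learn raises an error.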

If you need to use StratifiedShuffleSplit, you can do the following:

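from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=10, test_size=.2, random_state=0)

# split() yields positional indices, so use .iloc for both X and y
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]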
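The same idea should carry over to StratifiedShuffleSplit: whatever you pass as the second argument of split() is what the folds are stratified on, so it can be an attribute column instead of the target. A rough sketch under that assumption, with a made-up DataFrame and a hypothetical "group" column:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical data: "group" is the attribute to stratify on.
df = pd.DataFrame({
    "feature": np.arange(20),
    "group": ["a"] * 10 + ["b"] * 10,
})

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

fold_sizes = []
# The second argument of split() supplies the labels used for stratification.
for train_index, test_index in sss.split(df, df["group"]):
    train_df, test_df = df.iloc[train_index], df.iloc[test_index]
    fold_sizes.append((len(train_df), len(test_df)))

print(fold_sizes)
```

Note that this produces 10 independent random 80/20 splits, not a single one, so use it only if you actually want repeated splits.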

More info here.

Mattravel