
Does anyone know what the problem is?

import numpy as np
from sklearn.model_selection import train_test_split

x = np.linspace(-3, 3, 100)
rng = np.random.RandomState(42)
y = np.sin(4 * x) + x + rng.uniform(size=len(x))
X = x[:, np.newaxis]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

I have this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
desertnaut
Reza
3 Answers


Try removing stratify=y; you can do without it here. Stratification only makes sense for discrete class labels, and your y is continuous. Also, have a peek here.
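A minimal sketch of the suggested fix, reproducing the question's setup but dropping the stratify argument so the split works for a continuous (regression) target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same data as in the question: y is continuous, not class labels.
x = np.linspace(-3, 3, 100)
rng = np.random.RandomState(42)
y = np.sin(4 * x) + x + rng.uniform(size=len(x))
X = x[:, np.newaxis]

# Without stratify=y the split succeeds.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (75, 1) (25, 1)
```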

Yaxit
    Please notice that, in such cases, we flag the question as a duplicate instead of answering it. – desertnaut Mar 23 '21 at 17:36
  • I could not find the option to do so, I don't know if it's still locked for my low reputation or I just could not find it... – Yaxit Mar 23 '21 at 17:52

From the documentation:

3.1.2.2. Cross-validation iterators with stratification based on class labels.

Some classification problems can exhibit a large imbalance in the distribution of the target classes: for instance there could be several times more negative samples than positive samples. In such cases it is recommended to use stratified sampling as implemented in StratifiedKFold and StratifiedShuffleSplit to ensure that relative class frequencies are approximately preserved in each train and validation fold.
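Stratification as the documentation describes it applies to discrete class labels, which is not what the question's continuous y provides. A small sketch with a hypothetical imbalanced binary target shows what stratified splitting is actually for:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced binary target: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# Each stratified test fold preserves the original 90/10 class ratio.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_counts = [np.bincount(y[test_idx]) for _, test_idx in skf.split(X, y)]
print(fold_counts)  # every fold holds 18 negatives and 2 positives
```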

desertnaut
Carlos Melus

The stratify=y parameter in train_test_split is what gives you the error. Stratification is meant for discrete labels with repeated values. For example, if your label column contains 0s and 1s, passing stratify=y preserves the original label proportions in your training samples: with 60% 1s and 40% 0s, the training sample keeps that same 60/40 ratio. Your y is continuous, so nearly every value is unique and most "classes" have only one member, which is exactly what the error message reports.
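The proportion-preserving behavior described above can be sketched with a hypothetical discrete label of 60% 1s and 40% 0s:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical discrete label: 60 ones and 40 zeros.
y = np.array([1] * 60 + [0] * 40)
X = np.arange(100).reshape(-1, 1)

# With a discrete y, stratify=y works and keeps the 60/40 ratio
# in both the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(y_train.mean(), y_test.mean())  # 0.6 0.6
```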