I'm totally new to Data Science in general and was hoping someone could explain why this does not work:

I'm using the Advertising dataset from the following url: "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv" which has 3 feature columns ("TV", "Radio", "Newspaper") and 1 label column ("sales"). My complete dataset is named data.

Next, I try to use sklearn's StratifiedShuffleSplit class to divide the data into training and testing sets.

from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, random_state=0) # can use test_size=0.8
for train_index, test_index in split.split(data.drop("sales", axis=1), data["sales"]): # Generate indices to split data into training and test set.
    strat_train_set = data.loc[train_index]
    strat_test_set = data.loc[test_index]

I get this ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Using the same code on another dataset which has 14 feature columns and 1 label column separates the data appropriately. Why doesn't it work here? Thanks.

Evan C
  • http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html – Aditya Mar 26 '18 at 03:59
  • @Aditya. Aah yes, wrong question link. More proper case and explanation [is here](https://stackoverflow.com/a/48314533/3374996). – Vivek Kumar Mar 26 '18 at 05:52
  • @Aditya. But as I see it, the target in this question is "Sales", so it's a regression problem, and hence my [original link](https://stackoverflow.com/a/47548572/3374996) is correct for this case. – Vivek Kumar Mar 26 '18 at 06:20

1 Answer

I think the problem is that your data_y is a 2D matrix.

As far as I can see in the sklearn.model_selection.StratifiedShuffleSplit docs, it should be a 1D vector. Try encoding each row of data_y as an integer (it will be interpreted as a class), and then call split.

Or possibly your y is a regression variable (continuous numerical data), in which case see Vivek's link.
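For the regression case, the error arises because StratifiedShuffleSplit treats every distinct value of y as its own class, and a continuous target such as sales has many values that occur only once. One common workaround is to stratify on a binned copy of the target instead of the raw values. The sketch below is only illustrative: it uses synthetic data in place of Advertising.csv, and the use of pd.qcut, the 5 bins, and test_size=0.2 are all assumptions rather than anything taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the Advertising data (made-up values, not the real file).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
data["sales"] = 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1, 200)

# Bin the continuous target into quantile-based groups so that each "class"
# has many members. The number of bins (5) is an arbitrary choice.
sales_bins = pd.qcut(data["sales"], q=5, labels=False)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_index, test_index in split.split(data.drop("sales", axis=1), sales_bins):
    # split() yields positional indices, so iloc is the safe accessor here.
    strat_train_set = data.iloc[train_index]
    strat_test_set = data.iloc[test_index]
```

If stratification is not actually needed for a regression target, a plain ShuffleSplit or train_test_split avoids the issue altogether.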

Aditya