As @Vivek Kumar commented, the split.split() call in line two of your code returns an iterable (specifically a generator, not a list or tuple). Your non-working example tries to use that return value as if it were the (train_index, test_index) pair itself.
Let's look at what kind of data your loop consumes:
for train_index, test_index in ...:
    ...
The for loop requires an iterable. In addition, the train_index, test_index target "destructures" (unpacks) each item in the iterable into two values, so each item has to be an iterable with exactly two elements. Usually, a tuple is used for such cases.
So, the result of split.split() could look something like this:
[
(a1, b1),
(a2, b2),
...
]
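A sequence of pairs like the one above is what the loop consumes. Here is a minimal sketch, assuming a hypothetical fake_split() generator that stands in for split.split() and yields made-up index lists rather than real stratified splits:

```python
def fake_split(n_samples, n_splits):
    """Hypothetical stand-in for split.split(): yields one
    (train_indices, test_indices) tuple per split."""
    for i in range(n_splits):
        # Hold out a different single sample each time (illustration only).
        test = [i % n_samples]
        train = [j for j in range(n_samples) if j not in test]
        yield train, test

# Each yielded item is a 2-tuple, so the loop target can destructure it.
for train_index, test_index in fake_split(5, n_splits=3):
    print(train_index, test_index)
# [1, 2, 3, 4] [0]
# [0, 2, 3, 4] [1]
# [0, 1, 3, 4] [2]
```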
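Presumably, n_splits=1 means that there will be only one train_index, test_index pair; at least, that's what you seem to claim, and it matches the scikit-learn documentation, where n_splits is the number of re-shuffling and splitting iterations. In that case, the result will look like this: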
[
(a1, b1),
]
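So there is only one item, and that item is itself a tuple with two elements. You now try to destructure the whole result using train_index, test_index = ..., and this fails because the number of items does not match: the result yields one item, but the target expects two (Python raises ValueError: not enough values to unpack (expected 2, got 1)). You first need to extract the tuple.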
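The failure can be reproduced without scikit-learn. This sketch assumes a hypothetical one_split() generator standing in for split.split(...) with n_splits=1; the indices are made up:

```python
def one_split():
    """Hypothetical stand-in for split.split(...) with n_splits=1:
    yields exactly one (train, test) pair of made-up indices."""
    yield [0, 1, 2, 3], [4]

try:
    # The generator produces ONE item (a tuple), but the target expects TWO.
    train_index, test_index = one_split()
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 2, got 1)
```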
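There are two basic ways to get the tuple (split.split() returns a generator, which cannot be indexed with [0], so the first variant uses next() to take its first item):
pair = next(split.split(...))
pair, = split.split(...)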
I would strongly suggest the second variant, because it raises a ValueError when there is unexpectedly more than one item; the first variant silently discards any extra items.
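The difference between the two variants can be checked with a small sketch. Here splits() is a hypothetical generator standing in for split.split(...), and next() is used for the first variant because a generator cannot be indexed:

```python
def splits(n):
    """Hypothetical stand-in for split.split(...): yields n (train, test) pairs."""
    for i in range(n):
        yield [i + 1, i + 2], [i]

# Variant 1: next() returns the first pair and silently ignores the rest.
first = next(splits(3))
print(first)  # ([1, 2], [0])

# Variant 2: single-element unpacking accepts exactly one pair...
pair, = splits(1)
print(pair)  # ([1, 2], [0])

# ...and raises ValueError when there are unexpectedly more.
try:
    pair, = splits(3)
except ValueError as exc:
    print("ValueError:", exc)
```

Because variant 2 turns an unexpected number of splits into an immediate error, a misconfigured n_splits is caught at the unpacking line instead of surfacing later as a subtle data bug.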
Then, you can destructure the tuple:
train_index, test_index = pair
Or, both in one step:
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
(train_index, test_index), = split.split(housing, housing["income_cat"])  # housing: your DataFrame