I'm trying to split my data by different labels, like this:

dfa = df_a[((df_a['label'] == 0) | (df_a['label'] == 15) | (df_a['label'] == 16))]

This works fine for a handful of values. However, I want to do it for many values, for example:

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20) # this can change
dfb = [i for i in to_train if df_b['label']==i] # ValueError

This raises an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I've read the other questions about this error, but I am already using bitwise operators, and as far as I can tell those answers don't cover matching against many values.

How do I split the dataframe based on what's in the tuple/list/etc?

imdevskp
I M
1 Answer

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20)
dfb = df_a[df_a['label'].isin(to_train)]
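To spell out why this works, here is a minimal sketch on made-up data. `Series.isin` returns a boolean Series with one entry per row, and that Series can index the DataFrame directly. By contrast, `df_b['label'] == i` inside an `if` asks Python for a single truth value for a whole Series, which is what raises the ValueError. The sample frame and the names `mask`, `train_df`, and `rest_df` are assumptions for illustration; only the `label` column and `to_train` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'label' column name comes from the question.
df_b = pd.DataFrame({
    "label": [0, 1, 2, 15, 16, 17, 20],
    "value": list("abcdefg"),
})

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20)

# One boolean per row: True where the row's label appears in to_train.
mask = df_b["label"].isin(to_train)

train_df = df_b[mask]    # rows whose label is in to_train
rest_df = df_b[~mask]    # all remaining rows (the complement)
```

Negating the mask with `~` gives the other half of the split, so the remaining labels do not have to be listed by hand.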
imdevskp
  • This works; I just needed to change it to `dfb = df_b[df_b['label'].isin(to_train)]` for my code specifically. – I M Apr 23 '21 at 17:37