I have a list of IDs and a dataframe in which one of the columns is ID. I want to drop all rows in the dataframe where the ID is not one of the IDs in the list. This is the code I use:

df = df.drop(df[df.ID not in list_IDs].index)

but I get this error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong?

pepe
robinpiksi

3 Answers

0

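Try this (plain boolean indexing, since .ix was removed in pandas 1.0; the mask is not negated because the goal is to keep the rows whose ID is in the list):

df = df[df.ID.isin(list_IDs)]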

Explanation

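Constructions like df.ID not in list_IDs won't work even in vanilla Python, because in tests whether the whole left-hand object is an element of the list, not whether each of its items is: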

In [12]: [1,2,3] in [1,2,3]
Out[12]: False

In [13]: [1,2] in [1,2,3]
Out[13]: False
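For comparison, here is a minimal plain-Python sketch of the elementwise test the question was aiming for. The names ids and allowed and their values are made up for illustration; they are not from the question.

```python
# Hypothetical example data, not taken from the question.
ids = [58, 24, 12, 92]
allowed = [24, 12, 42, 44]

# `in` asks whether the left operand is itself an element of the list,
# so a list of integers is never "in" another list of integers.
whole_list_test = ids in allowed

# The elementwise membership test has to be written out explicitly.
elementwise = [i in allowed for i in ids]

print(whole_list_test)  # False
print(elementwise)      # [False, True, True, False]
```

The comprehension is roughly what .isin() does for a Series, one element at a time.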

In pandas, use the .isin() method instead.

Data:

In [14]: list_IDs
Out[14]: [24, 12, 42, 44]

In [15]: df
Out[15]:
   ID   A
0  58  69
1  36  63
2  92  43
3  24  37
4  12  54
5  42   0
6  44  57
7  78  59
8  59  85
9  56  84

Demo

In [16]: df.ID.isin(list_IDs)
Out[16]:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7    False
8    False
9    False
Name: ID, dtype: bool

In [17]: df[df.ID.isin(list_IDs)]
Out[17]:
   ID   A
3  24  37
4  12  54
5  42   0
6  44  57

Negative isin()

In [18]: df[~df.ID.isin(list_IDs)]
Out[18]:
   ID   A
0  58  69
1  36  63
2  92  43
7  78  59
8  59  85
9  56  84

In [19]: ~df.ID.isin(list_IDs)
Out[19]:
0     True
1     True
2     True
3    False
4    False
5    False
6    False
7     True
8     True
9     True
Name: ID, dtype: bool
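
Putting the pieces together, a self-contained sketch that rebuilds the demo data and keeps only the rows whose ID is in the list. The variable names mask, kept and kept_via_drop are chosen here for illustration; the drop-based variant is included only to mirror the approach in the question.

```python
import pandas as pd

# Data copied from the demo above.
list_IDs = [24, 12, 42, 44]
df = pd.DataFrame({
    "ID": [58, 36, 92, 24, 12, 42, 44, 78, 59, 56],
    "A": [69, 63, 43, 37, 54, 0, 57, 59, 85, 84],
})

# Boolean Series: True where the row's ID appears in list_IDs.
mask = df["ID"].isin(list_IDs)

# Keep only the rows whose ID is in the list.
kept = df[mask]

# Same result via drop(): remove the rows whose ID is NOT in the list.
kept_via_drop = df.drop(df[~mask].index)

print(kept)
```

Both variants return rows 3 to 6 of the demo frame; the direct boolean index is the shorter of the two.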
MaxU - stand with Ukraine
  • Could you explain your answer? You're not really answering "*What am I doing wrong?*". [From review](http://stackoverflow.com/review/low-quality-posts/12728267) – Wai Ha Lee Jun 18 '16 at 00:13
0

Check out the answer from unutbu at Evaluating pandas series values with logical expressions and if-statements. Basically, pandas always raises an error when a Series has to be reduced to a single True/False value, which is what not in forces here, because it is not clear whether the user expects True only if all values in the Series match or True if at least one value matches. Hence, explicit reductions such as .any() and .all() must be used instead.

Addition: Why does array < 5 work then? Because there is no ambiguity: every value in the array is compared elementwise to 5, and the result is a boolean array rather than a single True/False. The ambiguity only appears when that boolean array must collapse to one value. Suppose a comparison matches the first element but not the second: in some circumstances you would want True, and in others False. To get around the ambiguity, users are expected to say which they mean with a function such as .any() or .all().
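
A small sketch of the distinction, using a made-up Series s: the scalar comparison stays elementwise, while a single truth value has to be requested through an explicit reduction.

```python
import pandas as pd

# Hypothetical example Series.
s = pd.Series([5, 6, 7])

# Comparison with a scalar is elementwise and returns a boolean Series.
below_six = s < 6

# Reducing to one truth value has to be requested explicitly.
any_below = bool(below_six.any())  # True if at least one element matches
all_below = bool(below_six.all())  # True only if every element matches

# Asking for a single truth value without a reduction raises ValueError.
try:
    bool(below_six)
    raised = False
except ValueError:
    raised = True

print(below_six.tolist(), any_below, all_below, raised)
```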

Community
Yarnspinner
  • Thanks for the answer. Now I understand why I got the error, but I am still a bit confused about why this works: df[df.ID != 5]. This is also a True/False evaluation on a Series. What is the difference between this case and df[df.ID not in list_IDs]? Aren't they both operations on a Series? @MaxU – robinpiksi Jun 19 '16 at 10:52
  • In pandas, if you use a comparison operator on a DataFrame/Series and a scalar, the comparison is carried out on every element and returns a DataFrame/Series of equal size with [True, False, True, False, ...] and so on. There's no ambiguity because the right-hand side is a single scalar value. With list_IDs, the ambiguity mentioned above remains. – Yarnspinner Jun 19 '16 at 11:11
0
import pandas as pd
x = pd.Series([1,2,3])

Now, think about how you expect Python to evaluate this:

(x in [1,2])

or more directly

pd.Series([1,2,3]) in [1,2]

Both expressions raise:

"ValueError: The truth value of a Series is ambiguous"

What you are looking to do is this:

x.isin([1,2])
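
The failure and the fix can be reproduced in one runnable sketch. The try/except wrapper and the names error_message and membership are additions for illustration only.

```python
import pandas as pd

x = pd.Series([1, 2, 3])

# `x in [1, 2]` compares x with a list element using ==, which yields a
# boolean Series, and then needs a single truth value for that Series.
try:
    x in [1, 2]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# isin() performs the membership test per element instead.
membership = x.isin([1, 2])

print(error_message)
print(membership.tolist())  # [True, True, False]
```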
michael_j_ward