
I know this question has been asked before; however, I get an error when I try to write an `if` statement. I looked at this link, but it did not help much in my case. My `dfs` is a list of DataFrames.

I am trying the following:

for i in dfs:
    if (i['var1'] < 3.000):
        print(i)

This gives the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried the following and got the same error:

for i, j in enumerate(dfs):
    if (j['var1'] < 3.000):
        print(i)

The dtype of my `var1` column is float32. I am not using any logical operators such as `and`/`&` or `or`/`|`, which seemed to be the cause of the error in the linked question. So why do I get a `ValueError`?

AbyxDev
i.n.n.m
  • do all DFs in the list have only one row? – MaxU - stand with Ukraine Aug 03 '17 at 20:28
  • When should `if` be true? From the moment there is at least one such row? Or from the moment all values are less than 3? – Willem Van Onsem Aug 03 '17 at 20:29
  • in this case it's not clear - what are you comparing in the `if ...`? – MaxU - stand with Ukraine Aug 03 '17 at 20:30
  • @WillemVanOnsem `if` should be `true` when `var1` is less than 3. – i.n.n.m Aug 03 '17 at 20:30
  • @i.n.n.m, do you realize that your comparison is very similar to `[1,2,3,4,5] > 2`? What result do you expect? – MaxU - stand with Ukraine Aug 03 '17 at 20:32
  • @i.n.n.m: but `var1` is a column, so it can contain 1000s of elements. – Willem Van Onsem Aug 03 '17 at 20:32
  • @MaxU I am comparing `var1`. If the `var1` column has a value less than 3, then I would like to get the index of that `df`. – i.n.n.m Aug 03 '17 at 20:32
  • @i.n.n.m, the value of `var1` is a __Series__ (list alike object or vector) – MaxU - stand with Ukraine Aug 03 '17 at 20:33
  • @MaxU `[1,2,3,4,5] > 2`: in this case, I cannot compare a list to `2`. Is that what I am doing in my code? When I do it for a single `df`, it returns `True` or `False`, though. – i.n.n.m Aug 03 '17 at 20:35
  • @i.n.n.m `i` is a dataframe, so `i['var1']` is a Series. As @MaxU said, this is equivalent to comparing each element in the series to your value, e.g. `[1 < 3, 2 < 3, 3 < 3, 4 < 3, 5 < 3]`. The result is an identically shaped series with the result of each comparison, `[True, True, False, False, False]` – Alexander Aug 03 '17 at 20:36
  • @i.n.n.m there is a disconnect between the terms you are using and the results we see. We cannot reconcile these differences unless you help us by **showing** us the data, your code, and what you are trying to get. This is the spirit of providing a minimal, complete, and verifiable example or [**MCVE**](http://stackoverflow.com/help/mcve) – piRSquared Aug 03 '17 at 20:37
  • @i.n.n.m, I'm pretty sure you would find it out yourself when trying to show a desired data set ;-) Please listen to piRSquared; this will save your and our time. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Aug 03 '17 at 20:40
  • @MaxU yes, I figured it out already, thank you for asking the right questions to make me understand. – i.n.n.m Aug 03 '17 at 20:44
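The point made in the comments (a column comparison is elementwise) can be reproduced in a few lines. This is a minimal sketch assuming two small, hypothetical DataFrames with a float32 `var1` column; it shows that `df['var1'] < 3.0` is a boolean Series and that `if` on it fails because `if` calls `bool()` on the Series.

```python
import pandas as pd

# Hypothetical data standing in for the question's list of DataFrames
dfs = [
    pd.DataFrame({"var1": [1.5, 4.0]}, dtype="float32"),
    pd.DataFrame({"var1": [3.5, 7.25]}, dtype="float32"),
]

# The comparison is elementwise: one boolean per row, not a single bool
mask = dfs[0]["var1"] < 3.0
print(type(mask).__name__)  # Series
print(mask.tolist())        # [True, False]

try:
    # `if mask:` calls bool(mask), which pandas refuses to guess
    bool(mask)
except ValueError as exc:
    print("ValueError:", exc)
```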

3 Answers


Here is a small demo that shows why this is happening:

In [131]: df = pd.DataFrame(np.random.randint(0,20,(5,2)), columns=list('AB'))

In [132]: df
Out[132]:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15

In [133]: res = df['A'] > 10

In [134]: res
Out[134]:
0    False
1    False
2     True
3    False
4     True
Name: A, dtype: bool

When we try to check whether such a Series is True, pandas doesn't know what to do, because the Series holds one boolean per row rather than a single truth value:

In [135]: if res:
     ...:     print(df)
     ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Workarounds:

We can decide how to reduce the Series of booleans to a single value. For example, the condition could be True only if all values are True:

In [136]: res.all()
Out[136]: False

or when at least one value is True:

In [137]: res.any()
Out[137]: True

In [138]: if res.any():
     ...:     print(df)
     ...:
    A   B
0   3  11
1   0  16
2  16   1
3   2  11
4  18  15
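Applied to the question's setting (a list of DataFrames, where the goal is the position of each DataFrame whose `var1` is below 3), the same `any()`/`all()` reduction might look like this. The DataFrames are hypothetical; which reduction is right depends on whether "has a value less than 3" means at least one row or every row.

```python
import pandas as pd

# Hypothetical stand-in for the question's list of DataFrames
dfs = [
    pd.DataFrame({"var1": [1.5, 4.0]}),
    pd.DataFrame({"var1": [3.5, 7.25]}),
    pd.DataFrame({"var1": [0.5, 2.0]}),
]

# Positions of DataFrames where at least one var1 value is below 3
any_below = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).any()]

# Positions of DataFrames where every var1 value is below 3
all_below = [i for i, df in enumerate(dfs) if (df["var1"] < 3.0).all()]

print(any_below)  # [0, 2]
print(all_below)  # [2]
```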
MaxU - stand with Ukraine

Currently, you're selecting the entire series for comparison. To get an individual value from the series, you'll want to use something along the lines of:

for i in dfs:
    if i['var1'].iloc[0] < 3.000:
        print(i)
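Note that `.iloc[0]` silently ignores every row after the first. If each DataFrame is expected to hold exactly one row, `Series.item()` is a stricter alternative: it returns the single value and raises `ValueError` when the Series has more than one element. A sketch with hypothetical single-row DataFrames:

```python
import pandas as pd

# Hypothetical single-row DataFrames, as assumed by the iloc[0] approach
dfs = [pd.DataFrame({"var1": [2.5]}), pd.DataFrame({"var1": [3.5]})]

matches = []
for n, df in enumerate(dfs):
    # .item() returns the one scalar; it raises if var1 has more than one element
    if df["var1"].item() < 3.0:
        matches.append(n)
print(matches)  # [0]

multi = pd.DataFrame({"var1": [1.0, 2.0]})
try:
    multi["var1"].item()
except ValueError as exc:
    print("ValueError:", exc)
```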

To compare each individual element, you can iterate over the Series with `Series.items()` (older code uses `Series.iteritems()`, which was deprecated in pandas 1.5 and removed in 2.0), like so:

for i in dfs:
    for _, v in i['var1'].items():
        if v < 3.000:
            print(v)

For most cases, the better solution is to use a boolean mask to select the subset of the DataFrame you need, like so:

for i in dfs:
    subset = i[i['var1'] < 3.000]
    # do something with the subset
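One way to fill in "do something with the subset" is sketched below, with hypothetical DataFrames and a hypothetical `id` column. The boolean mask selects the matching rows, and `DataFrame.empty` is a plain `bool`, so it can be used in an `if` statement without triggering the ambiguity error.

```python
import pandas as pd

# Hypothetical list of DataFrames standing in for the question's dfs
dfs = [
    pd.DataFrame({"id": [10, 11, 12], "var1": [1.5, 4.0, 2.9]}),
    pd.DataFrame({"id": [20, 21], "var1": [3.5, 7.25]}),
]

matching_rows = {}
for n, df in enumerate(dfs):
    # Boolean mask keeps only the rows where var1 < 3
    subset = df[df["var1"] < 3.0]
    # .empty is a single bool, so it is safe inside an if statement
    if not subset.empty:
        matching_rows[n] = subset["id"].tolist()

print(matching_rows)  # {0: [10, 12]}
```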

On large DataFrames, vectorized Series operations are much faster than iterating over individual values in Python. For more detail, see the pandas documentation on indexing and selecting data.
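To check the performance claim on your own machine, you could time both approaches. This sketch uses the standard `timeit` module on a synthetic Series; the helper names `count_loop` and `count_vectorized` are hypothetical, and the absolute timings will depend on your hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 200,000 random floats in [0, 10)
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 10, size=200_000))

def count_loop(series):
    # Pure-Python iteration over (index, value) pairs
    total = 0
    for _, v in series.items():
        if v < 3.0:
            total += 1
    return total

def count_vectorized(series):
    # One vectorized comparison, then sum the booleans (True counts as 1)
    return int((series < 3.0).sum())

print("loop:      ", timeit.timeit(lambda: count_loop(s), number=1))
print("vectorized:", timeit.timeit(lambda: count_vectorized(s), number=1))
```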

Gasvom

The comparison returns a boolean Series (one value per row), so you need to reduce it to a single value with either `any()` or `all()`, for example:

# temp is the value you are looking for in column col
if (df[col] == temp).any():
    return df.loc[df[col] == temp].index.values.astype(int)[0]
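The snippet above is the body of a function. A self-contained version might look like the following; `first_match_index` and its parameters are hypothetical names, and returning `None` when nothing matches is an assumption about the desired behavior.

```python
import pandas as pd

def first_match_index(df, col, value):
    """Hypothetical helper: return the index label of the first row where
    df[col] == value, or None when no row matches."""
    mask = df[col] == value
    # .any() reduces the boolean Series to one bool, avoiding the ValueError
    if mask.any():
        return int(df.loc[mask].index[0])
    return None

df = pd.DataFrame({"name": ["a", "b", "c", "b"]}, index=[5, 6, 7, 8])
print(first_match_index(df, "name", "b"))  # 6
print(first_match_index(df, "name", "z"))  # None
```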
Shaina Raza