I am getting an error when I test a single element of a dataframe in an if statement, but I don't understand why.

I have a dataframe df with timeseries data for a number of customers, with some null values within it:

df.head()
                    8143511  8145987  8145997  8146001  8146235  8147611  \
2012-07-01 00:00:00      NaN      NaN      NaN      NaN      NaN      NaN   
2012-07-01 00:30:00    0.089      NaN    0.281    0.126    0.190    0.500   
2012-07-01 01:00:00    0.090      NaN    0.323    0.141    0.135    0.453   
2012-07-01 01:30:00    0.061      NaN    0.278    0.097    0.093    0.424   
2012-07-01 02:00:00    0.052      NaN    0.278    0.158    0.170    0.462  

In my script, the line if pd.isnull(df[[customer_ID]].loc[ts]): generates an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, if I put a breakpoint on that line and, when the script stops, type this into the console:

pd.isnull(df[[customer_ID]].loc[ts])

the output is:

8143511    True
Name: 2012-07-01 00:00:00, dtype: bool

If I allow the script to continue from that point, the error is generated immediately.

If the boolean expression can be evaluated and has the value True, why does it generate an error in the if expression? This makes no sense to me.

doctorer
  • check answer of this: http://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – Rohanil Apr 05 '17 at 05:00
  • OK, thank you. So if I use pd.isnull(df_gen[[customer_ID]].loc[ts].item()) then the boolean is evaluated OK, but I don't understand why the original didn't work. – doctorer Apr 05 '17 at 05:11
  • Because the original returns a `pd.Series` object, not a boolean. – Rohanil Apr 05 '17 at 05:13

3 Answers

The problem lies in the if statement.

When you code

if this:
    print(that)

this will be evaluated as bool(this). And that better come back as True or False.

However, you did:

if  pd.isnull(df[[customer_ID]].loc[ts]):
    pass  # idk what you did here because you didn't say... but doesn't matter

Also, you stated that pd.isnull(df[[customer_ID]].loc[ts]) evaluated to:

8143511    True
Name: 2012-07-01 00:00:00, dtype: bool

Does that look like a True or False?
What about bool(pd.isnull(df[[customer_ID]].loc[ts]))?

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So the lesson is: a pd.Series cannot be evaluated as True or False, even when it holds only one element.

It is, however, a pd.Series of Trues and Falses.

And that is why it doesn't work.
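To see this outside the original script, here is a minimal sketch. The Series `s` is a hypothetical stand-in for the output shown in the question, not the asker's real data. It shows that bool() on a one-element Series raises, while an explicit reduction such as .item(), .any() or .all() gives a value that if can use.

```python
import pandas as pd

# Hypothetical one-element boolean Series, shaped like the output in the question.
s = pd.Series([True], index=["8143511"], name="2012-07-01 00:00:00")

# `if s:` implicitly calls bool(s), which pandas refuses to do for a Series.
try:
    bool(s)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Explicit reductions return a single boolean instead of a Series.
as_item = s.item()  # requires exactly one element
as_any = s.any()    # True if at least one element is True
as_all = s.all()    # True only if every element is True

if as_item:
    print("value is null")
```

Which reduction to use depends on intent: .item() is the strict choice when exactly one value is expected, because it fails loudly if the Series ever has more.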

piRSquared
  • Actually, the lesson I learnt is `df[[customer_ID]].loc[ts]` returns a `pd.Series` not a single value – doctorer Apr 06 '17 at 09:18

The problem is that if needs a scalar (True or False), but here pd.isnull is given a one-item Series, so it returns a one-item boolean Series.

One solution is to convert to a scalar with Series.item, or with values and selecting the first value by [0]:

customer_ID = '8143511'
ts = '2012-07-01 00:00:00'

print (df[[customer_ID]].loc[ts].item())
nan

if pd.isnull(df[[customer_ID]].loc[ts]).item():
    print ('super')

print (df[[customer_ID]].loc[ts].values[0])
nan

if pd.isnull(df[[customer_ID]].loc[ts]).values[0]:
    print ('super')

But if you use DataFrame.loc with both labels, you get a scalar (provided the index and column names are not duplicated):

print (df.loc[ts, customer_ID])
nan

customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
if pd.isnull(df.loc[ts, customer_ID]):
    print ('super')
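The caveat about duplicated names can be checked with a small sketch. Both frames below are made-up illustrations (the names `unique` and `dup` and all values are assumptions, not the asker's data). With unique column names, df.loc[ts, customer_ID] gives a scalar; with a repeated column name it gives a Series again, and the if would fail as before.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2012-07-01 00:00:00")
idx = pd.date_range(ts, periods=2, freq="30min")

# Hypothetical frame with unique column names.
unique = pd.DataFrame({"8143511": [np.nan, 0.089]}, index=idx)

# Hypothetical frame where the same column name appears twice.
dup = pd.DataFrame(
    [[np.nan, 1.0], [0.089, 2.0]],
    index=idx,
    columns=["8143511", "8143511"],
)

scalar = unique.loc[ts, "8143511"]  # a single float (NaN here)
series = dup.loc[ts, "8143511"]     # one value per duplicated column

print(type(scalar).__name__, type(series).__name__)

if pd.isnull(scalar):
    print("super")
```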
jezrael

The second set of [] made df[[customer_ID]] a one-column DataFrame, so .loc[ts] returned a Series, which I mistook for a single value. The simplest solution is to remove the extra []:

if pd.isnull(df[customer_ID].loc[ts]):
    pass
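A quick way to confirm the difference is to compare the types of the two expressions. This is only a sketch on a made-up two-row frame; the values are illustrative, and ts is a pd.Timestamp here (an assumption) so that the lookup is an exact match.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question.
idx = pd.date_range("2012-07-01", periods=2, freq="30min")
df = pd.DataFrame(
    {"8143511": [np.nan, 0.089], "8145997": [np.nan, 0.281]},
    index=idx,
)
customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")

double = df[[customer_ID]].loc[ts]  # DataFrame, then one row: a Series
single = df[customer_ID].loc[ts]    # Series, then one element: a scalar

print(type(double).__name__)
print(type(single).__name__)

# Only the scalar version is safe to use directly in an if statement.
if pd.isnull(single):
    print("missing value")
```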
doctorer