Using any() and all() on multiIndex DataFrame while slicing via .xs: strange behavior or just me?

Question

I am fairly new to Python and this community so please forgive my amateurish attempts to explain my profound confusion on something that is most likely very obvious. Anyway..

I have a dataframe called "data". It is multiIndexed with 2 levels consisting of: "date" and "farts".

There is a single column called: "integrated_daily_difference".

You can assume "farts" is of type: 'pandas.core.index.Index' and was created via: farts = data.index.levels[1]

Now let's imagine I would like to take a slice view of my dataframe at an arbitrary index value in farts, e.g. farts[1]

Me:

data.xs(farts[1], level = 1)

Computer:

                  integrated_daily_difference
        date    
2015-05-21 00:00:00+00:00    0.000000
2015-05-22 00:00:00+00:00    0.000000
2015-05-26 00:00:00+00:00   -0.024497
2015-05-27 00:00:00+00:00   -0.051144
2015-05-28 00:00:00+00:00   -0.079841
2015-05-29 00:00:00+00:00   -0.106666
2015-06-01 00:00:00+00:00   -0.131245
2015-06-02 00:00:00+00:00   -0.157428
2015-06-03 00:00:00+00:00   -0.184057
2015-06-04 00:00:00+00:00   -0.209755
2015-06-05 00:00:00+00:00   -0.234588
2015-06-08 00:00:00+00:00   -0.262365
2015-06-09 00:00:00+00:00   -0.291890
2015-06-10 00:00:00+00:00   -0.320943
2015-06-11 00:00:00+00:00   -0.352627
2015-06-12 00:00:00+00:00   -0.381425
2015-06-15 00:00:00+00:00   -0.404055

Me:

data.xs(farts[1], level = 1) < 0

Computer:

                integrated_daily_difference
         date   
2015-05-21 00:00:00+00:00   False
2015-05-22 00:00:00+00:00   False
2015-05-26 00:00:00+00:00   True
2015-05-27 00:00:00+00:00   True
2015-05-28 00:00:00+00:00   True
2015-05-29 00:00:00+00:00   True
2015-06-01 00:00:00+00:00   True
2015-06-02 00:00:00+00:00   True
2015-06-03 00:00:00+00:00   True
2015-06-04 00:00:00+00:00   True
2015-06-05 00:00:00+00:00   True
2015-06-08 00:00:00+00:00   True
2015-06-09 00:00:00+00:00   True
2015-06-10 00:00:00+00:00   True
2015-06-11 00:00:00+00:00   True
2015-06-12 00:00:00+00:00   True
2015-06-15 00:00:00+00:00   True

I assume this returns whether or not a value exists for any location within my sliced dataframe so the result is True?

Me:

data.xs(farts[1], level = 1).any()

Computer:

integrated_daily_difference    True
dtype: bool

OK, this all kind of makes sense. Now for the weird stuff..

Me:

data.xs(farts[1], level = 1).any() < 0

Computer:

integrated_daily_difference    False
dtype: bool

Huh....?

Me:

data.xs(farts[1], level = 1).any(axis = 0) < 0

Computer:

integrated_daily_difference    False
dtype: bool

Me:

data.xs(farts[1], level = 1).any(axis = 1) < 0

Computer:

       date
2015-05-21 00:00:00+00:00    False
2015-05-22 00:00:00+00:00    False
2015-05-26 00:00:00+00:00    False
2015-05-27 00:00:00+00:00    False
2015-05-28 00:00:00+00:00    False
2015-05-29 00:00:00+00:00    False
2015-06-01 00:00:00+00:00    False
2015-06-02 00:00:00+00:00    False
2015-06-03 00:00:00+00:00    False
2015-06-04 00:00:00+00:00    False
2015-06-05 00:00:00+00:00    False
2015-06-08 00:00:00+00:00    False
2015-06-09 00:00:00+00:00    False
2015-06-10 00:00:00+00:00    False
2015-06-11 00:00:00+00:00    False
2015-06-12 00:00:00+00:00    False
2015-06-15 00:00:00+00:00    False

Me:

data.xs(farts[1], level = 1).any(axis = 1) <= 0

Computer:

        date
2015-05-21 00:00:00+00:00     True
2015-05-22 00:00:00+00:00     True
2015-05-26 00:00:00+00:00    False
2015-05-27 00:00:00+00:00    False
2015-05-28 00:00:00+00:00    False
2015-05-29 00:00:00+00:00    False
2015-06-01 00:00:00+00:00    False
2015-06-02 00:00:00+00:00    False
2015-06-03 00:00:00+00:00    False
2015-06-04 00:00:00+00:00    False
2015-06-05 00:00:00+00:00    False
2015-06-08 00:00:00+00:00    False
2015-06-09 00:00:00+00:00    False
2015-06-10 00:00:00+00:00    False
2015-06-11 00:00:00+00:00    False
2015-06-12 00:00:00+00:00    False
2015-06-15 00:00:00+00:00    False

Me:

data.xs(farts[1], level = 1).any(axis = 0) <= 0

Computer:

integrated_daily_difference    False
dtype: bool

Then my computer started laughing maniacally at me and my head exploded...

But more seriously, what is going on here? My goal was just to check whether all or any values in my single-column dataframe meet a condition and get back a boolean True or False. I don't seem to be using any() correctly, so I'm seeking help.

Any input is appreciated. Thank you in advance!

score 2 · Answer 1 · answered Jul 09 '16 at 10:38

Consider this simple series:

import numpy as np
import pandas as pd
np.random.seed(0)
ser = pd.Series(np.random.randint(0, 3, 10))

ser
Out[78]: 
0    0
1    1
2    0
3    1
4    1
5    2
6    0
7    2
8    0
9    0
dtype: int32

Let's say you want to do the comparison ser < 2, it will return a boolean array:

ser < 2
Out[79]: 
0     True
1     True
2     True
3     True
4     True
5    False
6     True
7    False
8     True
9     True
dtype: bool

Now, if you want to check whether any of them is smaller than 2, you need to call any on this array.

(ser < 2).any()
Out[81]: True

This will return True if at least one of the values in ser < 2 array is True. .all() is similar:

(ser < 2).all()
Out[82]: False

Since not all of them are True, it returns False. If you change it to:

(ser < 3).all()
Out[83]: True

This returns True because it checks the (ser < 3) array and all elements in that array are True.

Now let's try ser.any():

ser.any()
Out[84]: True

Here, you are checking if any of the values in the original array is True (If 0 is True, if 1 is True etc). The values in this array are integers, not booleans. They are evaluated as True if they are not equal to 0. So, since you have at least one non-zero in that array, it returns True.

Now, if I check ser.any() < 0 it will return False:

ser.any() < 0
Out[85]: False

It is because this expression evaluates to True < 0:

True < 0
Out[86]: False

It is False because True is not smaller than 0. What you are doing is similar:

data.xs(farts[1], level = 1).any() < 0

It first executes any() on that section, and returns True because that section has non-zero elements. If you actually want to check whether any of them is smaller than 0, you should type:

(data.xs(farts[1], level = 1) < 0).any()

(data.xs(farts[1], level = 1) < 0) will create a boolean array and if any of the elements in that array is True, .any() will return True as well.
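To see the two orderings side by side, here is a minimal sketch on a small made-up MultiIndex frame. The level name "label" and all values are placeholders standing in for the asker's data, which is not available here.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: two index levels, one column.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-05-21", "2015-05-22", "2015-05-26"]), ["a", "b"]],
    names=["date", "label"],
)
data = pd.DataFrame(
    {"integrated_daily_difference": [0.0, 0.0, 0.0, 0.0, -0.02, -0.05]},
    index=index,
)

section = data.xs("b", level=1)

# Reduce first, compare second: any() gives True, and True < 0 is False.
reduce_then_compare = section.any() < 0

# Compare first, reduce second: this answers "is any value below zero?"
compare_then_reduce = (section < 0).any()

# .any() on a DataFrame returns one value per column; selecting the column
# first yields a single boolean instead of a one-element Series.
any_negative = bool((section["integrated_daily_difference"] < 0).any())
all_negative = bool((section["integrated_daily_difference"] < 0).all())
```

Selecting the column before reducing is one way to end up with a plain True/False, which is what the question was ultimately after.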

Thanks for the response. Very helpful :) – Jul 09 '16 at 10:54

hashcode55 · Accepted Answer · 2016-07-09T10:47:07.147

1

First let me define what any means according to the documentation of pandas-

Return whether any element is True over requested axis

Now when you write -

data.xs(farts[1], level = 1).any()

It just checks whether any of the values is truthy. Since no condition is given, it checks the numbers themselves, which means 0 is taken as False and any other number as True. As there are numbers other than 0, it returns True.

Now you check -

data.xs(farts[1], level = 1).any() < 0

But True is 1 and False is 0 when treated as integers. The output of data.xs(farts[1], level = 1).any() is True, i.e. 1, and 1 < 0 is False, so the comparison returns False. So if you check

data.xs(farts[1], level = 1).any() == 1

it'll return True.
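The integer behaviour of booleans can be checked directly in plain Python, independent of pandas. The variable names below are only for illustration.

```python
# bool is a subclass of int in Python, so True and False compare as 1 and 0.
is_int_subclass = issubclass(bool, int)

true_lt_zero = True < 0      # same as 1 < 0
true_eq_one = True == 1      # same as 1 == 1
false_le_zero = False <= 0   # same as 0 <= 0

print(is_int_subclass, true_lt_zero, true_eq_one, false_le_zero)
```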

Now lets see what happens when you do -

data.xs(farts[1], level = 1).any(axis = 1) <= 0

First, you have changed the axis: data.xs(farts[1], level = 1).any(axis = 1) now returns one True or False per row (True/1 where the value is non-zero, False/0 where it is 0). The first two values are 0, so they become False, and False satisfies the condition "<= 0"; that gives the output you see. Try doing -

data.xs(farts[1], level = 1).any(axis = 1) == 1

and you'll get the just opposite output.
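A small sketch of the row-wise case, using a made-up single-column frame with two zeros followed by two negative values (an assumption chosen to mirror the shape of the output in the question):

```python
import pandas as pd

# Hypothetical data: two zeros, then two negative values.
frame = pd.DataFrame({"integrated_daily_difference": [0.0, 0.0, -0.02, -0.05]})

row_truthy = frame.any(axis=1)   # one bool per row: is the value non-zero?
le_zero = row_truthy <= 0        # True only where the row was all zeros
eq_one = row_truthy == 1         # the exact opposite of le_zero

# The row-wise test the question was actually after: compare, then reduce.
row_negative = (frame < 0).any(axis=1)

print(le_zero.tolist(), eq_one.tolist(), row_negative.tolist())
```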

all() works differently from any(): it returns True only if every element is True. If even one element is False (including the case where all of them are False), it returns False.
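A quick check of what all() gives for each kind of input, using made-up lists and a made-up Series:

```python
import pandas as pd

all_true = all([True, True, True])    # every element truthy
mixed = all([True, False, True])      # one falsy element is enough to fail
all_false = all([False, False])       # "all False" does not count as a pass
empty = all([])                       # vacuously true for an empty iterable

# pandas applies the same rule to a boolean Series.
ser_all_false = bool(pd.Series([False, False]).all())

print(all_true, mixed, all_false, empty, ser_all_false)
```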

And just to mention -

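any/all are not the same as the or/and operators, in case you were thinking of them that way. The or/and operators are logical (not bitwise) operators that work on two operands and short-circuit. any and all are functions that take an iterable; they also stop at the first element that decides the result, but if you pass them a list, every element has already been evaluated before the function starts.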
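How much work any() does depends on what it is handed. The sketch below uses a hypothetical helper, check, that records each call so the number of evaluated elements is visible.

```python
calls = []

def check(value):
    """Record each call so we can see how many elements were evaluated."""
    calls.append(value)
    return value > 0

# Generator expression: any() stops consuming at the first truthy result.
calls.clear()
lazy_result = any(check(v) for v in [0, 5, 7, 9])
lazy_calls = list(calls)

# List comprehension: the whole list is built before any() starts.
calls.clear()
eager_result = any([check(v) for v in [0, 5, 7, 9]])
eager_calls = list(calls)

print(lazy_calls, eager_calls)
```

Both calls return True; only the number of elements evaluated differs.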

Hope it helps :)

edited Jul 09 '16 at 10:47

answered Jul 09 '16 at 10:28

hashcode55


This is quite helpful. Thank you! I actually went a little deeper and tried the following: `mylist = [1,2,3,4,0,-1]`, then `print 'test 1:', all(mylist) > -1` and `print 'test 2:', all(i > -1 for i in mylist)`. Test 1 printed True and Test 2 printed False. I assume this is due to True (1) and False (0) both being > -1 in the first case, and when evaluated in a generator it behaves differently. – Jul 09 '16 at 10:48
Let me break down your first try: `all(mylist)` again has no condition, and since the list contains a 0 it returns False, which is numerically 0. That's why testing it with "> -1" returns True. – hashcode55 Jul 09 '16 at 10:53
In the second one you are testing an iterable of booleans: i > -1 is False for -1, which makes the overall result False. Try >= -1 and you'll get True. The base concept to keep in mind is "Return True if any element of the iterable is true" for any and "Return True if all elements of the iterable are true" for all. – hashcode55 Jul 09 '16 at 10:55
Check it out: http://stackoverflow.com/questions/19389490/how-do-pythons-any-and-all-functions-work – hashcode55 Jul 09 '16 at 11:00
Is there a way to force it to evaluate the actual numerical equivalency of the condition instead of interpreting the integer values as boolean True's or False's? Like in the first Test1 case: instead of -1 > -1 evaluating as True due to -1 being > False? 0 is > -1 but -1 isn't so I'm still not sure why it returns True for all(mylist) > -1 – Jul 09 '16 at 11:08
