
I have the following data frame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[True, np.nan, False], [True, np.nan, False], [True, np.nan, False]])
>>> df
      0   1      2
0  True NaN  False
1  True NaN  False
2  True NaN  False

According to the docs, df.all(axis=1, skipna=True) checks whether all values are True column-wise, so I expected it to return True, True, False, but it returns False, False, False. It looks as if the meaning of axis has been flipped, i.e. axis=0 is the column-wise one.
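A minimal sketch to reproduce this on a current install, assuming a recent pandas and NumPy (behaviour on the 2014-era versions discussed here may differ). It rebuilds the question's frame with a list-repeat shorthand and computes the reduction along both axes so the two can be compared:

```python
import numpy as np
import pandas as pd

# Same frame as in the question: every row is [True, NaN, False]
df = pd.DataFrame([[True, np.nan, False]] * 3)

# axis=1: one result per row (each row's values are combined)
by_row = df.all(axis=1, skipna=True)
# axis=0: one result per column (each column's values are combined)
by_col = df.all(axis=0, skipna=True)

print(by_row.tolist())
print(by_col.tolist())
```

On a recent pandas, by_row should match what the question reports (False for every row), while by_col holds one value per column.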

This seems to contradict the meaning of axis in DataFrame.dropna, for example:

>>> df.dropna(axis=1)
      0      2
0  True  False
1  True  False
2  True  False

as well as in np.delete.
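For comparison, a short sketch (again assuming a recent pandas and NumPy, and reusing the question's frame) showing what axis selects in dropna and np.delete: with axis=1 whole columns are removed, while the default axis=0 removes rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, np.nan, False]] * 3)

# dropna(axis=1): drop every column that contains a NaN, so column 1 goes
cols_dropped = df.dropna(axis=1)
# dropna(axis=0), the default: drop every row that contains a NaN;
# here every row has one, so nothing is left
rows_dropped = df.dropna(axis=0)

# np.delete with axis=1 removes the column at position 1 from the array
arr_deleted = np.delete(df.to_numpy(), 1, axis=1)

print(list(cols_dropped.columns))  # [0, 2]
print(rows_dropped.shape)          # (0, 3)
print(arr_deleted.shape)           # (3, 2)
```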

Was this intentional? And if so, why?

Alex Riley
Garrett
    This looks like a mistake in the docs (or possibly a bug). There is a related question, http://stackoverflow.com/questions/25773245/ambiguity-in-pandas-dataframe-axis-definition?rq=1, about the apparent switch in the meaning of axis, and the behaviour comes from numpy: compare what happens when you try `np.all(df, axis=0)` and `np.all(df, axis=1)`; the output is the same as in pandas. – EdChum Oct 10 '14 at 09:19

2 Answers


I think this is a mistake in the docs: this method calls numpy.all, and if you compare the outputs they are the same:

In [211]: np.all(df, axis=0)
Out[211]: array([True, nan, False], dtype=object)

In [212]: np.all(df, axis=1)
Out[212]: array([False, False, False], dtype=object)
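The same axis convention can be checked on a plain NumPy array, with no pandas and no NaN involved. A minimal sketch, using a bool array chosen here only for illustration (each row mirrors the question's row with the NaN replaced by True):

```python
import numpy as np

# Three identical rows, three columns
arr = np.array([[True, True, False]] * 3)

# axis=0 collapses the rows: one answer per column
per_column = arr.all(axis=0)  # array([ True,  True, False])
# axis=1 collapses the columns: one answer per row
per_row = arr.all(axis=1)     # array([False, False, False])

print(per_column, per_row)
```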

Also dropna and np.delete agree on the output:

In [213]: df.dropna(axis=1)
Out[213]:
      0      2
0  True  False
1  True  False
2  True  False

In [222]: np.delete(df.values, 1, axis=1)
Out[222]:
array([[True, False],
       [True, False],
       [True, False]], dtype=object)
EdChum

I agree it's not always very intuitive, but I think the behaviour is consistent.

axis=0 works down the rows; axis=1 works across the columns.
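A quick way to see this rule is to use a frame whose row and column counts differ, so the length of the result shows which axis was collapsed. A sketch with an illustrative 2-by-3 frame (the data and column names are made up for the example):

```python
import pandas as pd

# A 2-row x 3-column frame, so row and column counts differ
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

# axis=0: move down the rows -> one sum per column (3 values)
down_rows = df.sum(axis=0)
# axis=1: move across the columns -> one sum per row (2 values)
across_cols = df.sum(axis=1)

print(down_rows.tolist())    # [5, 7, 9]
print(across_cols.tolist())  # [6, 15]
```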

So df.all(axis=1, skipna=True) returns False, False, False because it works across the columns of each row (here evaluating all(True, False) for every row, since the NaN is skipped). Meanwhile, df.all(axis=0, skipna=True) looks down the rows of each column in turn. Only column 2 contains False values, hence the result.
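To make the per-row computation explicit, the following sketch rebuilds df.all(axis=1, skipna=True) by hand, assuming a recent pandas: for each row it drops the NaN and then applies Python's built-in all. The helper name rowwise_all_skipna is hypothetical, introduced only for this illustration; it is not part of the pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, np.nan, False]] * 3)

def rowwise_all_skipna(frame: pd.DataFrame) -> list:
    """Hypothetical helper: mimic frame.all(axis=1, skipna=True)."""
    results = []
    for _, row in frame.iterrows():
        # skipna=True: ignore the NaN, leaving [True, False] for each row here
        values = row.dropna()
        results.append(all(values))
    return results

manual = rowwise_all_skipna(df)
builtin = df.all(axis=1, skipna=True).tolist()
print(manual, builtin)  # [False, False, False] [False, False, False]
```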

Similarly, dropna(axis=1) and delete(axis=1) act on columns (i.e. they look at each column to see whether it should be dropped or deleted). Columns 0 and 2 do not contain any NaN values and so are retained, while column 1 disappears.
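The description of dropna above can be checked by rebuilding dropna(axis=1) by hand: look at each column and keep it only if it has no NaN. A minimal sketch, assuming a recent pandas; the list comprehension is only an illustration, not how pandas implements dropna internally:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, np.nan, False]] * 3)

# Look at each column in turn and keep it only if it has no NaN
keep = [col for col in df.columns if df[col].notna().all()]
manual = df[keep]

print(keep)  # [0, 2]
# Compare the hand-built result with pandas' own dropna(axis=1)
print(manual.equals(df.dropna(axis=1)))
```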

Alex Riley