
From reading the pandas documentation, and a good question and answer (What does axis in pandas mean?), I had expected axis=0 to always mean with respect to columns. This works for me when I work with sum(), but works the other way around when I use the dropna() call.

When I have a DataFrame like this:

import numpy as np
import pandas as pd

raw_data = {'column1': [42, 13, np.nan, np.nan],
            'column2': [4, 12, np.nan, np.nan],
            'column3': [25, 61, np.nan, np.nan]}

Which looks like this:

   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN

I can print the sum of each column with axis=0. This:

df = pd.DataFrame(raw_data)
print(df.sum(axis=0))

Gives the output:

column1    55.0
column2    16.0
column3    86.0

When I try to drop values from the dataframe with axis=0, this should again be with respect to columns*. But when I do:

dfclear=df.dropna(axis=0,how='all')
print(dfclear)

I get the output:

   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0

Where I had expected the following (which I get with axis=1):

   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN

So it seems to me that axis behaves differently between sum() and dropna().

Is there something I'm missing here?

*https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

Simon
  • I never reached an understanding of this. As I read the answers, they don't seem to address why the axis parameter behaves differently between the two. It's entirely possible that I have just overlooked something. – Simon Apr 02 '18 at 10:53

3 Answers


From the docstring:

In [41]: df.dropna?
Signature: df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Docstring:
Return object with labels on given axis omitted where alternately any
or all of the data are missing

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, or tuple/list thereof
    Pass tuple or list to drop on multiple axes
...

If you are not sure which axis is which, pass the axis by name instead of by number:

In [39]: df.dropna(axis='index', how='all')
Out[39]:
   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0

In [40]: df.dropna(axis='columns', how='all')
Out[40]:
   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN
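A minimal sketch, assuming the DataFrame from the question, that checks the integer and string spellings of axis give identical results for dropna():

```python
import numpy as np
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({'column1': [42, 13, np.nan, np.nan],
                   'column2': [4, 12, np.nan, np.nan],
                   'column3': [25, 61, np.nan, np.nan]})

# 0 and 'index' are interchangeable, as are 1 and 'columns'
assert df.dropna(axis=0, how='all').equals(df.dropna(axis='index', how='all'))
assert df.dropna(axis=1, how='all').equals(df.dropna(axis='columns', how='all'))

# No column is entirely NaN, so axis=1 leaves the frame unchanged
print(df.dropna(axis=1, how='all').shape)  # (4, 3)
```

Spelling the axis out as 'index' or 'columns' makes it explicit which set of labels the call acts on.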
MaxU - stand with Ukraine
  • In the pandas docs, dropna has axis : {0 or ‘index’, 1 or ‘columns’} and sum has axis : {index (0), columns (1)}, so it should mean the same for both. Yet in my example they behave opposite to each other, as far as I can see. – Simon Mar 31 '18 at 11:38
  • @Simon, it looks correct to me: `Return object with labels on given axis omitted` – MaxU - stand with Ukraine Mar 31 '18 at 11:41
  • Okay. But did you see the part of my question about sum()? That returns results for each column, not for each row, and that's with axis=0.
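One way to reconcile sum() and dropna() (an interpretation, not wording taken from the pandas docs): in both calls axis=0 refers to the index. sum() collapses the index, so one value per column is left; dropna() removes labels from the index, so whole rows disappear. A sketch using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [42, 13, np.nan, np.nan],
                   'column2': [4, 12, np.nan, np.nan],
                   'column3': [25, 61, np.nan, np.nan]})

# sum(axis=0): the index is collapsed, so the result is labelled by the columns
summed = df.sum(axis=0)
print(summed.index.tolist())  # ['column1', 'column2', 'column3']
print(summed.tolist())        # [55.0, 16.0, 86.0]

# dropna(axis=0): labels are removed from the index; the columns are untouched
cleaned = df.dropna(axis=0, how='all')
print(cleaned.index.tolist())    # [0, 1]
print(cleaned.columns.tolist())  # ['column1', 'column2', 'column3']
```

Read this way, "one result per column" from sum() and "rows removed" from dropna() are two consequences of acting on the same axis.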

I think the result is correct:

print(df)

produces below output:

   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN

dfclear=df.dropna(axis=0,how='all')
print(dfclear)

Produces below output:

   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0

From the pandas documentation example:

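Drop the rows where all of the elements are NaN (in the documentation's example there is no such row, so df stays the same). In this DataFrame, rows 2 and 3 are entirely NaN, so they are the ones dropped.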
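The how argument (default 'any' in the signature quoted in the first answer) decides whether one NaN is enough to drop a row. A small sketch; the frame below is hypothetical, with row 1 only partly missing, and is not the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is partly missing, row 2 is entirely missing
df = pd.DataFrame({'column1': [42, 13, np.nan],
                   'column2': [4, np.nan, np.nan]})

# how='all': drop a row only if every value in it is NaN
print(df.dropna(axis=0, how='all').index.tolist())  # [0, 1]
# how='any' (the default): drop a row if any value in it is NaN
print(df.dropna(axis=0, how='any').index.tolist())  # [0]
```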

Rehan Azher

Mind you, pandas shift() also has a counterintuitive axis meaning, where 0 means by row and 1 means by column.

I guess they need to address these and similar points somewhere in their documentation.
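A small sketch of what the two axis settings do for shift(), using a hypothetical two-column frame (not from the question):

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0: values move down along the index (each column is shifted vertically)
down = df.shift(1, axis=0)
print(down['a'].tolist())  # [nan, 1.0, 2.0]

# axis=1: values move right along the columns (each row is shifted horizontally),
# so column 'a' is emptied and column 'b' receives the old 'a' values
right = df.shift(1, axis=1)
print(right['a'].isna().all())  # True
print(right['b'].tolist())
```

This fits the same reading as for sum() and dropna(): axis=0 means the operation runs along the index.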

eugen