
I'm trying to understand the `axis` parameter in Python pandas. I understand that it's analogous to the NumPy `axis`, but the following example still confuses me:

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])
print(a)
```

```
   0  1  2
0  0  1  4
1  1  2  3
```

According to this post, `axis=0` runs along the rows (fixed column), while `axis=1` runs along the columns (fixed row). Running `print(a.drop(1, axis=1))` yields

```
   0  2
0  0  4
1  1  3
```

which drops a column, while `print(a.drop(1, axis=0))` drops a row. Why? That seems backwards to me.

Community
David

My main question is... if `axis=0` is thought of as column-wise, why then does `drop(1, axis=0)` drop a row? – David Dec 14 '15 at 03:13

1 Answer


It's slightly confusing, but `axis=0` refers to the rows (the index) and `axis=1` refers to the columns.

So when you use `df.drop(1, axis=1)` you are saying "drop the column whose label is 1".
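A minimal sketch of that reading, reusing the frame from the question and assuming pandas 0.21 or later (where `drop` also accepts the `index=` and `columns=` keywords). The axis number picks which set of labels `drop` searches for the label `1`:

```python
import pandas as pd

# Same frame as in the question: default integer labels on both axes
a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])

# DataFrame.axes lists the row index first (axis 0), then the columns (axis 1)
row_labels, col_labels = a.axes
print(row_labels.tolist())  # [0, 1]
print(col_labels.tolist())  # [0, 1, 2]

# axis=1: find the label 1 among the column labels and drop that column
no_col = a.drop(1, axis=1)
# axis=0: find the label 1 among the row labels and drop that row
no_row = a.drop(1, axis=0)

# The keyword forms express the same thing without axis numbers
assert no_col.equals(a.drop(columns=1))
assert no_row.equals(a.drop(index=1))

print(no_col)
print(no_row)
```

Note that `drop` matches labels, not positions; here they coincide only because the default labels are 0, 1, 2.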

The other post uses `df.mean(axis=1)`, which says "calculate the mean across the columns, giving one value per row".
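A short sketch comparing the two reductions on the question's frame. It shows that the axis you pass is the one that gets collapsed, so the result is labelled by the other axis:

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])

# axis=1: average across the columns, giving one value per row
per_row = a.mean(axis=1)
# axis=0 (the default): average down the rows, giving one value per column
per_col = a.mean(axis=0)

print(per_row)
print(per_col)

# The collapsed axis disappears; the result is indexed by the surviving one
assert per_row.index.equals(a.index)
assert per_col.index.equals(a.columns)
```

Read this way, `drop` and `mean` agree: `axis` names the dimension being acted on. `drop(1, axis=0)` removes a label from the row index, and `mean(axis=0)` collapses the row dimension, leaving one value per column. The "per row" or "per column" phrasing describes what survives, which is why it can seem to flip.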

This is similar to indexing NumPy arrays, where the first index specifies the row number (the 0th dimension), the second index the column number (the 1st dimension), and so on.
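A small NumPy sketch of the same analogy, using the question's data as a plain array. Unlike `DataFrame.drop`, `np.delete` works by position rather than by label:

```python
import numpy as np

arr = np.array([[0, 1, 4], [1, 2, 3]])

# arr[i, j]: the first index (axis 0) picks the row, the second (axis 1) the column
print(arr.shape)  # (2, 3)
print(arr[1, 2])  # 3

# Deleting index 1 along axis 0 removes the second row, like drop(1, axis=0)
print(np.delete(arr, 1, axis=0))
# Deleting index 1 along axis 1 removes the second column, like drop(1, axis=1)
print(np.delete(arr, 1, axis=1))
# Summing along axis 0 collapses the rows, leaving one value per column
print(arr.sum(axis=0))
```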

Geotob

It seems to me that knowing when to add the *per row* or *per column* is completely arbitrary. – David Dec 14 '15 at 03:46