Pandas axes explained

Question

I'm trying to understand the axis parameter in pandas. I understand that it's analogous to the NumPy axis, but the following example still confuses me:

import pandas as pd

a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])
print(a)

   0  1  2
0  0  1  4
1  1  2  3

According to this post, axis=0 runs along the rows (fixed column), while axis=1 runs along the columns (fixed row). Running print(a.drop(1, axis=1)) yields

   0  2
0  0  4
1  1  3

which drops a column, while print(a.drop(1, axis=0)) drops a row. Why? That seems backwards to me.

My main question is... if `axis=0` is thought of as column-wise, why then does `drop(1,axis=0)` drop a row? — David, Dec 14 '15 at 03:13

Geotob · Accepted Answer · 2015-12-14T03:34:14.607


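It's slightly confusing, but axis=0 refers to the row labels (the index) and axis=1 refers to the column labels. What a method does with that axis depends on the method: drop looks up the label to remove on that axis, while a reduction such as mean collapses that axis.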

So when you use df.drop(1, axis=1) you are saying "drop the column labeled 1", and df.drop(1, axis=0) says "drop the row labeled 1".
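As a minimal sketch of the point above, reusing the DataFrame from the question (the names dropped_col and dropped_row are only illustrative): with drop, the axis argument selects which set of labels the value 1 is looked up in.

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])

# axis=1: the label 1 is looked up among the column labels,
# so the column labeled 1 is removed.
dropped_col = a.drop(1, axis=1)
print(dropped_col)

# axis=0: the label 1 is looked up among the row (index) labels,
# so the row labeled 1 is removed.
dropped_row = a.drop(1, axis=0)
print(dropped_row)
```

In this example the labels happen to be the default integers, so label and position coincide. With other labels, drop would still match on the label, not the position.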

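The other post has df.mean(axis=1), which takes the mean across the columns and returns one value per row. For a reduction, the axis you pass is the one that gets collapsed, which is why axis=1 produces a per-row result.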
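A short sketch of the reduction case, again assuming the question's DataFrame (row_means and col_means are illustrative names): the axis passed to mean is the one that disappears from the result.

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 4], [1, 2, 3]])

# axis=1 collapses the columns: one mean per row, indexed by row label.
row_means = a.mean(axis=1)
print(row_means)

# axis=0 collapses the rows: one mean per column, indexed by column label.
col_means = a.mean(axis=0)
print(col_means)
```

Here row_means has two entries (one per row) and col_means has three (one per column).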

This is similar to indexing NumPy arrays, where the first index specifies the row number (axis 0), the second index the column number (axis 1), and so on.
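The NumPy analogy can be sketched with the same values as the question's DataFrame (arr is an illustrative name): axis 0 corresponds to the first index and axis 1 to the second.

```python
import numpy as np

arr = np.array([[0, 1, 4], [1, 2, 3]])

# shape is (length of axis 0, length of axis 1) = (rows, columns)
print(arr.shape)        # (2, 3)

# first index selects along axis 0 (row), second along axis 1 (column)
print(arr[1, 2])        # 3

# summing over an axis collapses it
print(arr.sum(axis=0))  # [1 3 7]  one value per column
print(arr.sum(axis=1))  # [5 6]    one value per row
```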

edited Dec 14 '15 at 03:34

answered Dec 14 '15 at 03:14


It seems to me that knowing when to add the *per row* or *per column* is completely arbitrary. – David Dec 14 '15 at 03:46
