
I have a DataFrame like the one below:

        Seoul     Busan
        Green Red Green
    a 1     0   1     2
      2     3   4     5
    b 1     6   7     8
      2     9  10    11
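
For reproducibility, the frame above can be rebuilt like this. This is a sketch: the values are copied from the table, the index and column levels are left unnamed as in the printout, and depending on platform and NumPy version the dtype may come out as int64 rather than the int32 shown in the outputs below.

```python
import pandas as pd

# Column labels: two levels (city, color), in the order shown in the table.
columns = pd.MultiIndex.from_tuples(
    [("Seoul", "Green"), ("Seoul", "Red"), ("Busan", "Green")]
)
# Row labels: two levels, (a, 1), (a, 2), (b, 1), (b, 2).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]])

df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], index=index, columns=columns
)
print(df)
```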

When I execute df.sum(axis=0, level=0), it seems to operate row by row, so the result is:

      Seoul     Busan
      Green Red Green
    a     3   5     7
    b    15  17    19
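
Side note: the level keyword of DataFrame.sum was deprecated in pandas 1.3 and removed in pandas 2.0, with groupby as the documented replacement. A minimal sketch of the equivalent call on current pandas, rebuilding the frame from the table above (the name summed is illustrative):

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Seoul", "Green"), ("Seoul", "Red"), ("Busan", "Green")]
)
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], index=index, columns=columns
)

# Replacement for df.sum(axis=0, level=0): group the rows by the first
# index level (a, b), then sum each column within each group.
summed = df.groupby(level=0).sum()
print(summed)
```

Note that the sums still run down each column; the grouping only decides which rows are added together.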

But when I execute df.apply(print, axis=0), it prints column by column:

    a  1    0
       2    3
    b  1    6
       2    9
    Name: (Seoul, Green), dtype: int32
    a  1     1
       2     4
    b  1     7
       2    10
    Name: (Seoul, Red), dtype: int32
    a  1     2
       2     5
    b  1     8
       2    11
    Name: (Busan, Green), dtype: int32

Why do they behave differently when both use axis=0? Could you explain?

AMC
H.K
  • Can you provide the data in a more convenient format? I would recommend reading https://stackoverflow.com/q/20109391. – AMC Oct 22 '20 at 02:15
  • It actually is the same thing. `sum(axis=0)` sums along the (different) indexes, `apply(axis=0)` applies along the (different) indexes. That said, I understand why people, me included, get confused sometimes. – Quang Hoang Oct 22 '20 at 02:37
  • Another intuitive/readable approach that works with pandas methods is to pass `axis="columns"` or `axis="rows"` instead of 0 or 1. Remember: when you pass an axis, the function will operate *across* that axis, not within it. – Cameron Riddell Oct 22 '20 at 03:12
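
To illustrate the comments above with the frame from the question: axis=0 is the same as axis="index" (the spelling used in the pandas docs), and the reduction runs down the rows, giving one value per column; axis=1, i.e. axis="columns", gives one value per row. A small sketch; the variable names are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Seoul", "Green"), ("Seoul", "Red"), ("Busan", "Green")]
)
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], index=index, columns=columns
)

# axis=0 / axis="index": collapse the rows, one result per column.
per_column = df.sum(axis="index")
# axis=1 / axis="columns": collapse the columns, one result per row.
per_row = df.sum(axis="columns")

print(per_column)
print(per_row)
```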

1 Answer


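When you sum along axis=0, you compute a sum down each column. With level=0, the rows are first grouped by the first index level, so each column gets one sum per label (a and b), and the result is a single DataFrame with one row for each label of index level 0.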

Likewise, apply with axis=0 works column by column: it passes each column in turn, as a Series, to the given function and collects the results.
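
A sketch that makes this column-by-column behavior visible. The helper record and the list seen are hypothetical, used only for illustration. Each call receives one whole column as a Series named after its column label; grouping that Series by index level 0 and summing reproduces what df.sum(axis=0, level=0) returned.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Seoul", "Green"), ("Seoul", "Red"), ("Busan", "Green")]
)
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], index=index, columns=columns
)

seen = []  # hypothetical log of the column labels apply hands to record


def record(col):
    # With axis=0, col is one column of df, delivered as a Series whose
    # name is the column label, e.g. ("Seoul", "Green").
    seen.append(col.name)
    # One sum per label of index level 0 (a, b) within this column.
    return col.groupby(level=0).sum()


# Each returned Series becomes one column of the resulting DataFrame.
via_apply = df.apply(record, axis=0)
print(seen)
print(via_apply)
```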

Note that print only prints its argument and returns None, so from the point of view of apply:

  • the actual result of apply, for each column, is None,
  • printing the content of each column is just a side effect.

To confirm this, run xx = df.apply(print, axis=0); you will see the same printout.

Then run print(xx) and you will see:

    Seoul  Green    None
           Red      None
    Busan  Green    None
    dtype: object

i.e. just the None returned by the print call for each column.
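
The same check can be run without cluttering the console by capturing the printout with the standard library's contextlib.redirect_stdout. A sketch; the buffer name buf is illustrative.

```python
import contextlib
import io

import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Seoul", "Green"), ("Seoul", "Red"), ("Busan", "Green")]
)
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], index=index, columns=columns
)

buf = io.StringIO()
# Everything print writes while apply runs goes into buf (the side effect).
with contextlib.redirect_stdout(buf):
    xx = df.apply(print, axis=0)

# The actual return value: print gave back None for every column, so xx is
# an object Series of None indexed by the column labels.
print(xx)
```

If the goal is only to look at each column, iterating with `for label, col in df.items():` shows the same Series without building a throwaway result.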

Valdi_Bo