I need one unified rule that tells me whether to use axis=1 or axis=0 in every situation. The functions I have in mind are listed below; please let me know if there are other kinds of functions I missed, so I can test my understanding against them:

df.dropna()
df.drop_duplicates()
df.drop()
df.mean() # and other calculation-based functions
df.apply(foo)
pd.concat()
df.insert()

My current tentative conclusion: if the action named by the verb in the function name is "guided" by the index (that is, the data fed to it is addressed or iterated through by index), then axis=0. Otherwise axis=1. For example:

  • Concatenating two DataFrames top to bottom, rows on rows: the concat action is guided by knowing where the first DataFrame's index ends and attaching the beginning of the second DataFrame's index to it. Thus axis=0.
  • df.mean(): the mean action needs a series of data fed to it to sum and divide. If that series of data is addressed by index, then axis=0.
  • df.dropna(): when checking a column for NaN, we feed the check a series of data that is addressed by index, but since the function's verb is "drop", the drop action is ultimately guided by column label. Thus axis=1.
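
To make the cases above concrete, here is a minimal sketch on a toy DataFrame. The frame, its labels, and the lambda are made up for illustration; the comments only describe what each call does on this data and are not meant as the unified rule itself.

```python
import numpy as np
import pandas as pd

# Toy frame, invented for illustration only.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# Reductions: axis=0 walks down the index, giving one value per column;
# axis=1 walks across the columns, giving one value per row.
col_means = df.mean(axis=0)  # labelled by column: a, b
row_means = df.mean(axis=1)  # labelled by index: x, y, z

# apply: with axis=0 the function receives each column as a Series,
# with axis=1 it receives each row as a Series.
col_ranges = df.apply(lambda s: s.max() - s.min(), axis=0)
row_ranges = df.apply(lambda s: s.max() - s.min(), axis=1)

# Dropping: here axis names the kind of label that gets removed.
rows_kept = df.dropna(axis=0)    # removes row "z", which holds a NaN
cols_kept = df.dropna(axis=1)    # removes column "a", which holds a NaN
no_row_x = df.drop("x", axis=0)  # removes the row labelled "x"
no_col_a = df.drop("a", axis=1)  # removes the column labelled "a"

# concat: axis=0 stacks top to bottom, axis=1 places side by side.
stacked = pd.concat([df, df], axis=0)       # 6 rows, 2 columns
side_by_side = pd.concat([df, df], axis=1)  # 3 rows, 4 columns
```

Note that df.drop_duplicates() and df.insert() take no axis argument at all: the first always compares rows, and the second always inserts a column.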

[Possible duplicate] The existing Stack Overflow question tries to explain each function individually, while this question tries to unify those explanations into one concise, philosophical rule that hopefully covers all functions.

eliu
  • If there were "this method should always be used with this value of `axis`" rules, those methods wouldn't take `axis` in the first place. – user2357112 Jul 30 '18 at 23:00
  • @user2357112 I see. The question is asking when to use 1 and when to use 0; I changed the wording. – eliu Jul 30 '18 at 23:19

0 Answers