I need one unified rule that tells me whether to use axis=0 or axis=1 in all situations (please let me know if there are other kinds of functions that I did not list below, so I can test my understanding against them):
df.dropna()
df.drop_duplicates()
df.drop()
df.mean() # and other calculation based functions
df.apply(foo)
pd.concat()
df.insert()
My current tentative conclusion is: if the verb in the function name is "guided" by the index (that is, the action is fed data that is addressed or iterated through by index), then axis=0. Otherwise axis=1. For example:
- Concatenating two DataFrames top to bottom, rows on rows: the concat action is guided by knowing where the first df's index ends and attaching the beginning of the second df's index to it. Thus axis=0.
- df.mean(): the mean action needs a series of data fed to it to sum and divide. If that series of data is addressed by index, then axis=0.
- df.dropna(): when checking a column for NaN, we feed the check algorithm a series of data that is addressed by index. However, since the function's verb is "drop", the drop action is ultimately guided by column label. Thus axis=1.
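To make these test cases concrete, here is a minimal sketch of what each axis value does for a few of the functions listed. The frame `df`, its labels, and its values are made up purely for illustration; the comments describe only what pandas returns, not a claim about which mental model is right.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: row labels x, y, z and column labels a, b.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# pd.concat: axis=0 stacks top to bottom, axis=1 places side by side.
stacked = pd.concat([df, df], axis=0)       # 6 rows, 2 columns
side_by_side = pd.concat([df, df], axis=1)  # 3 rows, 4 columns

# df.mean: axis=0 gives one value per column, axis=1 one value per row.
col_means = df.mean(axis=0)  # result is labeled by a, b
row_means = df.mean(axis=1)  # result is labeled by x, y, z

# df.dropna: axis=0 removes rows containing NaN, axis=1 removes columns.
rows_removed = df.dropna(axis=0)  # row z is gone
cols_removed = df.dropna(axis=1)  # column a is gone

# df.drop: axis says whether the given label is a row or a column label.
without_row_x = df.drop("x", axis=0)
without_col_a = df.drop("a", axis=1)

# df.apply: with axis=0 the function receives each column as a Series,
# with axis=1 it receives each row as a Series.
col_ranges = df.apply(lambda s: s.max() - s.min(), axis=0)  # labeled by a, b
row_ranges = df.apply(lambda s: s.max() - s.min(), axis=1)  # labeled by x, y, z
```

Any unified rule would have to reproduce all of these results from the same statement.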
[Possible Duplication] The existing Stack Overflow question tries to explain each function individually, whereas this question tries to unify all of those explanations into one concise, philosophical understanding that hopefully covers all functions.