Confusing use of axis while selecting rows in pandas

Question

I have gone through the question "Ambiguity in Pandas Dataframe / Numpy Array 'axis' definition", but it still does not resolve my confusion about the use of axis in pandas.

Let's say I have a dataframe df containing columns 'col1' and 'col2'. I want to keep rows where 'col1' is 1 and 'col2' is 0. If I run

df[df.apply(lambda x:(x['col1']==1) and (x['col2']==0))]

I get an error.

I have to pass axis=1

df[df.apply(lambda x:(x['col1']==1) and (x['col2']==0), axis=1)]

for it to work. That does not make sense to me. The function passed to apply is being applied to every row, so according to my mental model, axis should be 0.

How do I make sense of axis=1 in this example?

Why apply and not `df[df['col1'].eq(1) & df['col2'].eq(0)]`? - Quang Hoang, Jul 16 '20 at 15:28
df.apply(func, axis=0), or simply df.apply(func), applies the function to each column of data, starting with the first column and then moving to column 2. df.apply(func, axis=1) applies the function to each row of data, one row at a time. Using df.apply(func, axis=1) should be the last resort and should be avoided at all cost. The good thing about pandas is that most operations are done with index alignment, so there shouldn't be a need to use apply(axis=1). - Scott Boston, Jul 16 '20 at 15:35
@QuangHoang Thanks, I wasn't aware of this syntax. The question nonetheless holds for a more complicated function. - elexhobby, Jul 17 '20 at 16:25
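
To make the comments above concrete, here is a minimal sketch using a made-up three-row frame (the data and the variable names are assumptions for illustration, not from the question). It records what the function actually receives under each axis value, then builds the row mask both ways. One way to read the result: axis=1 means each Series handed to the function runs along the column labels, which is why x['col1'] can be looked up there, whereas under axis=0 the function receives a whole column indexed by row labels.

```python
import pandas as pd

# Hypothetical sample data, chosen so that only the first row matches.
df = pd.DataFrame({"col1": [1, 1, 0], "col2": [0, 1, 0]})

# axis=0 (the default): the function receives one column at a time.
# x.name is the column label, and x is indexed by the row labels.
cols_seen = df.apply(lambda x: x.name, axis=0)

# axis=1: the function receives one row at a time.
# x.name is the row label, and x is indexed by the column labels,
# so x["col1"] and x["col2"] are valid lookups here.
rows_seen = df.apply(lambda x: x.name, axis=1)

# Row-wise mask, as in the question.
mask_apply = df.apply(lambda x: (x["col1"] == 1) and (x["col2"] == 0), axis=1)

# Vectorized mask suggested in the first comment; no apply needed.
mask_vec = df["col1"].eq(1) & df["col2"].eq(0)

kept = df[mask_vec]

print(list(cols_seen))   # column labels seen under axis=0
print(list(rows_seen))   # row labels seen under axis=1
print(kept)
```

Both masks select the same rows on this sample; the vectorized form is generally the one to prefer when the condition can be written with column operations.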

