Why is .loc not used in 2nd criteria?
df = pd.DataFrame({
'col_1':[0,3,0,7,1,0],
'col_2':[0,3,6,9,2,4],
'col3':list('aaabbb')
})
No, you are wrong, it working in both.
print (df.loc[df['col_1']==0])
col_1 col_2 col3
0 0 0 a
2 0 6 a
5 0 4 b
print (df.loc[(df['col_1']==0) & (df['col_2']>0)])
col_1 col_2 col3
2 0 6 a
5 0 4 b
print (df[df['col_1']==0])
col_1 col_2 col3
0 0 0 a
2 0 6 a
5 0 4 b
print (df[(df['col_1']==0) & (df['col_2']>0)])
col_1 col_2 col3
2 0 6 a
5 0 4 b
Reason for using is if need also filter columns names, e.g. col_1
:
print (df.loc[df['col_1']==0, 'col_2'])
0 0
2 6
5 4
Name: col_2, dtype: int64
print (df.loc[(df['col_1']==0) & (df['col_2']>0), 'col_2'])
2 6
5 4
Name: col_2, dtype: int64
If need filter 2 or more columns use list, e.g for col_1,col3
use:
print (df.loc[df['col_1']==0, ['col_1','col3']])
col_1 col3
0 0 a
2 0 a
5 0 b
print (df.loc[(df['col_1']==0) & (df['col_2']>0), ['col_1','col3']])
col_1 col3
2 0 a
5 0 b
If omit loc
it failed:
df[df['col_1']==0, 'col_1']
df[(df['col_1']==0) & (df['col_2']>0), 'col_1']
TypeError
Also, why can't we use and in the second code, i.e.
df[(df['col_1']==0) and (df['col_2']>0)]
becasue and
is for processing by scalars, in pandas are used &
for bitwise AND
- &
. More info is here.