I'm trying to find a very good, comprehensive answer to the above question.
Let's take the following dataframe:
df = pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D'])
df['Category'] = pd.Series(['Cat1', 'Cat2', 'Cat1', 'Cat2'])
df:
A B C D Category
0 0.057456 0.462712 -1.245539 0.095947 Cat1
1 0.741688 0.353600 1.130754 -1.692440 Cat2
2 0.638623 -0.109566 0.118509 -0.692661 Cat1
3 -0.905806 0.284818 -0.094696 -1.325617 Cat2
I understand the basic nomenclature of using df.A
or df['A']
to select column A
, and df[['A', 'B']]
to select both columns A
and B
.
And if I want to pass in boolean conditions I can do things like df[df.A > 0]
to return rows where df.A
is greater than 0. And I can include multiple conditions like df[(df.A > 0) & (df.Category == 'Cat1')]
to include multiple conditions.
One thing I am not seeing though, is how I could select a subset of the dataframe columns with the same two boolean conditions used above.
Two approaches I have tried without success:
df['B'].where(df.A > 0)
This returns all of the rows in the index, whereas I'd only like the ones that return True
for said condition.
I've also tried using query
without success because it can't be used on a Series.
So df['A'].query(some condition)
returns an error message.
Ideally the solution I'm looking for has the following conditions:
- Can be used both on single columns or multiple columns. Ie, you can choose both
df['A']
anddf['B']
with it. - Can use multiple boolean conditions if you want to, not just one.
- Only returns the rows that evaluate to
True
for the condition, not all the rows. - Ideally is syntactically simple, but if that's not possible then so be it.
Thank you.