I have a very particular use case where pipeline users are allowed to pass in string expressions that get evaluated by a pipeline via DataFrame.query()
. There are obviously far better ways to determine column existence in pandas, however using .query()
is my current constraint.
Ideally I'd like to have a query that accepts a single column name and return a dataframe with either 1 column if it exists and no columns if it does not.
Input DataFrame:
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
index | a | b |
---|---|---|
0 | 1 | 4 |
1 | 2 | 5 |
2 | 3 | 6 |
Desired return value when looking for a column that exists:
looking_for = "a"
df.query("@looking_for in columns")
index | a |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
Desired return value when looking for a column does not exist:
looking_for = "c"
df.query("@looking_for in columns")
index |
---|
0 |
1 |
2 |
What I've tried:
This is easy when using the dataframe directly, here is one way. However, after reading pandas query docs and fiddling around I have yet to find a way to do this from the .query()
method.
df.loc[:, df.columns.isin(["c"])]
index |
---|
0 |
1 |
2 |