
I have a very particular use case where pipeline users are allowed to pass in string expressions that get evaluated by a pipeline via DataFrame.query(). There are obviously far better ways to determine column existence in pandas, however using .query() is my current constraint.

Ideally I'd like a query that accepts a single column name and returns a DataFrame with one column if that column exists and no columns if it does not.

Input DataFrame:

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
index a b
0 1 4
1 2 5
2 3 6

Desired return value when looking for a column that exists:

looking_for = "a"
df.query("@looking_for in columns")
index a
0 1
1 2
2 3

Desired return value when looking for a column that does not exist:

looking_for = "c"
df.query("@looking_for in columns")
index
0
1
2

What I've tried:

This is easy when using the DataFrame directly; one way is shown below. However, after reading the pandas query docs and fiddling around, I have yet to find a way to do this from the .query() method.

df.loc[:, df.columns.isin(["c"])]
index
0
1
2
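For reference, the direct (non-query) approach above can be wrapped in a small helper. This is only a minimal sketch: `select_existing` is a hypothetical name chosen for illustration, not a pandas API.

```python
import pandas as pd


def select_existing(frame, name):
    """Return `frame` restricted to column `name`, or no columns if it is absent."""
    # Boolean mask over the column labels; every entry is False when `name` is missing.
    mask = frame.columns.isin([name])
    return frame.loc[:, mask]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
found = select_existing(df, "a")    # one column, index preserved
missing = select_existing(df, "c")  # zero columns, index preserved
```

In both cases the row index is kept, which matches the desired outputs shown earlier.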
aylr

2 Answers


query only works with filtering operations. If you're constrained to do this by building string expressions only, you can use df.eval (close sister to df.query):

if df.eval("@looking_for in @df.columns.tolist()"):
    print(df.eval("@df[@looking_for]"))

You could also use the top-level pd.eval function directly (pd.eval("df[looking_for]")). More on eval in this post by me.
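To illustrate the point that query only filters, here is a small sketch using the question's DataFrame. The variable name `filtered` is just for illustration. The expression selects rows, and every column comes back regardless of which ones the expression mentions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# query() evaluates a boolean expression and keeps the rows where it is True.
# It has no effect on which columns are returned.
filtered = df.query("a > 1")
```

Here `filtered` keeps both `a` and `b`, which is why a column-existence check does not map naturally onto query.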


Without the if check, eval can raise a KeyError when the column is missing. Alternatively, you can wrap the call in try-except, which is a bit shorter.

try:
    print(df.eval("@df[@looking_for]"))
except KeyError:
    # column not present
    pass
cs95

As commented, I'm not sure why you are insisting on using query, which is not the best in this case. There are several options:

Option 1: filter:

looking_for = 'c'
df.filter(regex = rf'^{looking_for}$')

Option 2: reindex:

df.reindex([looking_for], axis=1)
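The two options behave differently for a missing column, which the sketch below makes visible. The use of `re.escape` and the `items=` variant are additions for illustration, not part of the options above: escaping guards against column names containing regex metacharacters, and `filter(items=...)` avoids the regex entirely.

```python
import re

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
looking_for = "c"  # a column that does not exist in df

# Option 1: filter keeps only matching labels, so a missing column gives zero columns.
via_filter = df.filter(regex=rf"^{re.escape(looking_for)}$")

# Variant (an assumption, not from the answer): exact-label matching without a regex.
via_items = df.filter(items=[looking_for])

# Option 2: reindex creates the requested label, filled with NaN, when it is missing.
via_reindex = df.reindex([looking_for], axis=1)
```

So filter yields the empty-column result the question asks for, while reindex returns a single all-NaN column named after the missing label.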
Quang Hoang
  • This is the best advice for this question - "don't use query()". However for unnamed reasons OP appears to be aware of better approaches (they're even linked in the question) but declines to use them. Not sure what their actual use case is but I imagine that would be useful context. – cs95 Dec 21 '20 at 19:11