9

I am working with a data frame that has a structure something like the following:

In[75]: df.head(2)
Out[75]: 
  statusdata             participant_id association  latency response  \
0   complete  CLIENT-TEST-1476362617727       seeya      715  dislike   
1   complete  CLIENT-TEST-1476362617727      welome      800     like   

   stimuli elementdata statusmetadata demo$gender  demo$question2  \
0  Sample B    semi_imp       complete        male              23   
1  Sample C    semi_imp       complete      female              23   

I want to be able to run a query string against the column demo$gender.

I.e.,

df.query("demo$gender=='male'")

But this has a problem with the $ sign. If I replace the $ sign with another delimiter (like -), the problem persists. Can I fix up my query string to avoid this problem? I would prefer not to rename the columns, as they correspond tightly with other parts of my application.

I really want to stick with a query string, as it is supplied by another component of our tech stack, and creating a parser would be a heavy lift for what seems like a simple problem.

Thanks in advance.

Joe

3 Answers

15

With recent versions of pandas, you can escape a column name that contains special characters with backticks (`):

df.query("`demo$gender` == 'male'")

Another possibility is to clean the column names as an earlier step in your process, replacing special characters with something more appropriate.

For instance:

(df
 .rename(columns=lambda value: value.replace('$', '_'))
 .query("demo_gender == 'male'")
)
dvillaj
    this answer should be way higher now - presumably pandas added the backtick syntax later. This is the correct answer in 2022; I confirmed that backtick works for escaping columns with special chars – Tommy Apr 05 '22 at 12:15
  • Thanks. I have changed the accepted answer! – Joe Apr 05 '22 at 13:57
7

For the interested, here is a simple procedure I used to accomplish the task:

# Identify invalid column names (those that are not valid Python identifiers)
invalid_column_names = [c for c in df.columns if not c.isidentifier()]

# Make replacements in the query and keep track of them
# NOTE: This method fails if the frame has columns called REPL_0 etc.
replacements = dict()
for i, cn in enumerate(invalid_column_names):
    r = 'REPL_' + str(i)
    query = query.replace(cn, r)
    replacements[cn] = r

inv_replacements = {v: k for k, v in replacements.items()}

df = df.rename(columns=replacements)      # Rename the columns
df = df.query(query)                      # Carry out the query
df = df.rename(columns=inv_replacements)  # Translate the column names back

This amounts to identifying the invalid column names, transforming the query, and renaming the columns. Finally, we perform the query and then translate the column names back.

Credit to @chrisb for their answer, which pointed me in the right direction.
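
For convenience, the same steps can be wrapped into a helper that chains the renames so the original frame is left untouched (a sketch; the function name query_with_invalid_columns is mine, not part of the original answer):

def query_with_invalid_columns(df, query):
    """Run df.query() on a frame whose column names are not valid identifiers."""
    invalid = [c for c in df.columns if not c.isidentifier()]
    # NOTE: as above, this fails if the frame already has columns named REPL_0 etc.
    replacements = {cn: 'REPL_' + str(i) for i, cn in enumerate(invalid)}
    for cn, r in replacements.items():
        query = query.replace(cn, r)
    inv_replacements = {v: k for k, v in replacements.items()}
    # Chaining rename -> query -> rename leaves the caller's frame unmodified
    return (df.rename(columns=replacements)
              .query(query)
              .rename(columns=inv_replacements))

# Usage (with the frame from the question):
# males = query_with_invalid_columns(df, "demo$gender == 'male'")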

Joe
    This should have been the accepted solution. BTW, since you can chain commands, you could also write `df.rename(...).query(...).rename(....)` to avoid changing the original df at all. – 576i Oct 17 '18 at 14:40
    This should no longer be the accepted solution; the backtick answer now works (in 2022) – Tommy Apr 05 '22 at 12:15
4

The current implementation of query requires the string to be a valid Python expression, so column names must be valid Python identifiers. Your two options are renaming the column or using a plain boolean filter, like this:

df[df['demo$gender'] == 'male']
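
If you need more than one condition, the boolean-filter approach composes with the bitwise operators & and | (a small sketch, not from the original answer; each condition needs its own parentheses):

# Combine conditions on the example frame with bitwise operators
males_fast_responses = df[(df['demo$gender'] == 'male') & (df['latency'] < 750)]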
chrisb