61

I have a dataframe with spaces in column names. I am trying to use query method to get the results. It is working fine with 'c' column but getting error for 'a b'

import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')

For this I am getting this error:

a b ==5
  ^
SyntaxError: invalid syntax

I don't want to fill up space with other characters like '_' etc.

There is one hack using pandasql to put variable name inside brackets example: [a b]

jpp
  • 159,742
  • 34
  • 281
  • 339
Bhushan Pant
  • 1,445
  • 2
  • 13
  • 29

6 Answers6

92

From pandas 0.25 onward you will be able to escape column names with backticks so you can do

a.query('`a b` == 5') 
cs95
  • 379,657
  • 97
  • 704
  • 746
Jarno
  • 6,243
  • 3
  • 42
  • 57
35

Pandas 0.25+

As described here:

DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)

So you can use:

a.query('`a b`==5')

Pandas pre-0.25

You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.

Instead, you can use pd.DataFrame.loc:

df = df.loc[df['a b'] == 5]

Since you are only filtering rows, you can omit .loc accessor altogether:

df = df[df['a b'] == 5]
jpp
  • 159,742
  • 34
  • 281
  • 339
2

It is not possible yet. Check github issue #6508:

Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning its meant to parse like a query language, and not a fully general interface.

Reason is for query need string to be a valid python expression, so column names must be valid python identifiers.

Solution is boolean indexing:

df = df[df['a b'] == 5]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

I am afraid that the query method does not accept column name with empty space. In any case you can query the dataframe in this way:

import pandas as pd
a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
a[a['a b']==1]
DTT
  • 39
  • 5
1

Easier way:

df[
    (df['column name 1'] == 10)
    & (df['column name 2'] > 4)
]
carsalves
  • 11
  • 3
0

Instead of using the pandas.query function I would create a condition in this case to lookup values and where the condition is True. For example:

import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,5]
a["c"] = [5,6,7,8]
#a.query('a b==5') Remove the query because it cannot lookup columns with spaces in the name.

condition = a['a b'] == 5
print(a['a b'][condition])
output:

    3    5

We see that at index 3 your condition evaluates to True (if you want the specific index and not Series of Boolean values).

stephen_mugisha
  • 889
  • 1
  • 8
  • 18
Simeon Ikudabo
  • 2,152
  • 1
  • 10
  • 27