
I'm working in Python Pandas with a dataframe whose column names got prefixed with "Content.". I can access a given column with df['Content.xyz']. However, when I try to perform queries on it, e.g. df.query("Content.xyz not in @mylist"), it throws an error saying that Content is not a member of the dataframe.

How can I perform a query or other similar operations on a column that has a period in its name?

Also, some of the series names have spaces in them. I'm assuming the solution for a column name with a period would be similar to a solution for a name containing a space.

accdias
Michael James

2 Answers


From the .query() docs:

New in version 0.25.0.

You can refer to column names that contain spaces by surrounding them in backticks.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

So that answers the second part of your question: you can put backticks around the column name to escape whitespace in it.
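For illustration, a minimal sketch of the space case, assuming pandas 0.25 or later; the column name "Content xyz", the sample values, and mylist are hypothetical stand-ins for the asker's data:

```python
import pandas as pd

# Hypothetical data: a column whose name contains a space
df = pd.DataFrame({"Content xyz": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
mylist = [2, 4]

# Backticks let query() refer to the column despite the space;
# @mylist refers to the local Python variable, not a column
result = df.query("`Content xyz` not in @mylist")
print(result)
```

Only the rows whose "Content xyz" value is not in mylist (1 and 3) are kept.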

Unfortunately, this only works for spaces right now, not yet for dots or other special characters. It is currently an open issue that is being worked on (https://github.com/pandas-dev/pandas/issues/27017) and might be fixed in an upcoming release.
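The linked issue was resolved in pandas 1.0.0, where backtick quoting was extended to names containing other special characters, dots included. Assuming pandas 1.0 or later, a minimal sketch of the query from the question (the data and mylist are hypothetical):

```python
import pandas as pd

# Hypothetical data mirroring the question: a column name containing a dot
df = pd.DataFrame({"Content.xyz": [1, 2, 3, 4]})
mylist = [2, 4]

# Unquoted, query() reads Content.xyz as attribute access on an undefined
# name "Content" and raises; quoted in backticks (pandas >= 1.0) it resolves
result = df.query("`Content.xyz` not in @mylist")
print(result)
```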

gosuto

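You cannot use the df.Content.xyz attribute notation to access the column, because Python reads it as df.Content followed by .xyz, and there is no column called Content. You can only reference such columns with bracket indexing, df['Content.xyz']: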

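```python
import pandas as pd

# Build a one-column frame whose column name contains a dot
df = pd.DataFrame([1, 2], columns=['Content.xyz'])
print(df['Content.xyz'])
```

```
0    1
1    2
```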
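The same bracket indexing also covers the filter from the question without query(). A minimal sketch, swapping in Series.isin for the "not in" test; it should work on any pandas version and for names with dots or spaces (the data and mylist are hypothetical):

```python
import pandas as pd

# Hypothetical data mirroring the question
df = pd.DataFrame({"Content.xyz": [1, 2, 3, 4]})
mylist = [2, 4]

# Bracket indexing accepts any column name; isin() marks the values that
# are in mylist and ~ negates the mask to get "not in"
result = df[~df["Content.xyz"].isin(mylist)]
print(result)
```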
Brandon