
I was looking at the pandas DataFrame eval method (docs), which I find to be nice syntactic sugar and which could also help improve performance.

This is the example from the docs:

from numpy.random import randn
import pandas as pd

df = pd.DataFrame(randn(10, 2), columns=list('ab'))
df.eval('a + b')

How can I use eval when there is a space in my column names? Example:

df = pd.DataFrame(randn(10, 2), columns=["Col 1", "Col 2"])

I tried this:

df.eval('"Col 1" + "Col 2"')

but this gives an error:

TypeError: data type "Col 1" not understood
FLab
  • Since it is not a built-in, *DataFrame* needs to be qualified per the rules of Python. You might have meant `pd.DataFrame` or `from pandas import DataFrame`? – Parfait Jul 27 '17 at 14:21
  • @cs95 how can this be a duplicate of a question asked one year after this one? – FLab May 20 '19 at 08:35
  • Duplicates don't have to be asked in chronological order. I closed this because there's an answer there explaining how you can use backticks to support spaces in 0.25. – cs95 May 20 '19 at 12:06
  • thanks for the clarification – FLab May 20 '19 at 14:26

3 Answers

pd.eval('df["Col 1"] + df["Col 2"]')

This keeps the argument to eval as a string, but it is less clean than the example without spaces in the column names.

example:

print(df)

      Col 1     Col 2
0 -0.206838 -1.007173
1 -0.762453  1.178220
2 -0.431943 -0.804775
3  0.830659 -0.244472
4  0.111637  0.943254
5  0.206615  0.436250
6 -0.568307 -0.680140
7 -0.127645 -0.098351
8  0.185413 -1.224999
9  0.767931  1.512654

print(pd.eval('df["Col 1"] + df["Col 2"]'))

0   -1.214011
1    0.415768
2   -1.236718
3    0.586188
4    1.054891
5    0.642865
6   -1.248447
7   -0.225995
8   -1.039586
9    2.280585
dtype: float64

EDIT

After some investigation, it looks like the above method works in either Python 2.7 or 3.6 if you are using the python engine:

pd.eval('df["Col 1"] + df["Col 2"]', engine='python')

However, this does not give you the performance advantage that the numexpr engine can provide. In Python 2.7, this method works:

pd.eval('df["Col 1"] + df["Col 2"]', engine='numexpr')  

but in Python 3.6 you get the error ValueError: unknown type str160.

My guess is that this is because pandas passes a unicode string to numexpr in 3.6 but a bytestring in 2.7. I'm guessing that this problem is related to this issue and maybe this one as well.
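
Since the behaviour depends on which engine is actually used, it can help to check whether numexpr is usable on a given machine. The following is only a minimal sketch: it relies on pandas raising ImportError when engine='numexpr' is requested but numexpr is not available, and the variable name engine_used is introduced here purely for illustration. It uses the space-free columns from the docs example so that the expression itself is valid under both engines.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=list("ab"))

# Ask for numexpr explicitly; pandas raises ImportError if it is not available.
try:
    result = df.eval("a + b", engine="numexpr")
    engine_used = "numexpr"  # hypothetical name, for illustration only
except ImportError:
    # Fall back to the pure-Python engine (no numexpr speed-up).
    result = df.eval("a + b", engine="python")
    engine_used = "python"

print("engine used:", engine_used)
```

If this prints python, an expression that appears to work on one machine may simply not be exercising numexpr at all.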

bunji
  • did you test it? It gives me `ValueError: unknown type str160` (Pandas 0.20.1). What is your Pandas version? – MaxU - stand with Ukraine Jul 27 '17 at 12:52
  • @MaxU, I believe so... (see the example). Are there further tests you would recommend? – bunji Jul 27 '17 at 12:54
  • Could you specify your Pandas version? I'd like to test it myself... – MaxU - stand with Ukraine Jul 27 '17 at 12:58
  • @MaxU It works for me in two versions: 0.19.2 (on python 2.7) and 0.20.3 (on python 3.6). Also, I'm using numpy versions 1.11.1 and 1.12.0 respectively – bunji Jul 27 '17 at 13:05
  • @MaxU Use `engine='python'`. – Stop harming Monica Jul 27 '17 at 13:08
  • @bunji, I got it! Most probably you don't have `numexpr` installed, so `pd.eval` falls back to the `python` engine, as Goyo just wrote – MaxU - stand with Ukraine Jul 27 '17 at 13:14
  • @Goyo, yeah, i think this is it. If `numexpr` is not installed, `pd.eval()` falls back to `engine='python'` – MaxU - stand with Ukraine Jul 27 '17 at 13:14
  • @MaxU yup. I just tried it on another machine with numexpr installed and got the same error. Good catch. This is really too bad since numexpr is what's giving the performance advantage in the first place. – bunji Jul 27 '17 at 13:16
  • @MaxU It looks like the error is caused because pandas passes a unicode string to numexpr. So, interestingly, with python 2.7 this works using the numexpr engine since it is passing bytestrings which numexpr can handle. – bunji Jul 27 '17 at 13:25
  • @bunji, this is an interesting finding! Can you extend your answer? This might help people having similar problem... – MaxU - stand with Ukraine Jul 27 '17 at 13:27
  • @bunji, moreover - if we replace spaces with underscores in column names - `df.eval("Col_1 + Col_2")` works in Python 2.7 and it does NOT work in Python 3.6... – MaxU - stand with Ukraine Jul 27 '17 at 13:30
  • @MaxU Ok, that is weird. `df.eval('Col_1 + Col_2', engine='numexpr')` works for me in both 3.6 and 2.7 – bunji Jul 27 '17 at 13:40
  • Might be too late to the party, but according to the pandas docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.eval.html), engine='python' seems to be a bad choice. As per the docs: "level python. This engine is generally not that useful." I was looking it up for a different issue and came across this question. – Parachute May 04 '18 at 14:58
  • For those that actually want to know the answer, look in the question that shows the "duplicate" question: use backticks. What should work is ``df.eval('`col 1` + `col 2`')`` – PrinsEdje80 Apr 22 '20 at 07:10
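
The backtick approach mentioned in the comments can be sketched as follows. This assumes pandas 0.25 or later, where backtick-quoted column names are supported in eval; the variable names total and expected are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=["Col 1", "Col 2"])

# Backticks quote column names that are not valid Python identifiers
# (assumes pandas >= 0.25).
total = df.eval("`Col 1` + `Col 2`")

# Plain column arithmetic, computed without eval, for comparison.
expected = df["Col 1"] + df["Col 2"]
print(np.allclose(total, expected))
```

This keeps the expression as a string, so it stays close to the docs example while leaving the column names untouched.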

You can do this using:

df.eval(df["Col 1"] + df["Col 2"])

But that is kind of going against the purpose of the eval function.

Alternatively, you can rename your columns in order to make them compatible with the eval syntax:

df.columns = df.columns.map(lambda x: x.replace(' ', '_'))
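
As a rough end-to-end sketch of the renaming approach: the version below works on a renamed copy (the name renamed is just for illustration) so the original column names are preserved, which is a small variation on the in-place assignment above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=["Col 1", "Col 2"])

# Build a copy whose column names are valid identifiers;
# df itself keeps its original names.
renamed = df.rename(columns=lambda name: name.replace(" ", "_"))

# The renamed columns can now be referenced directly in the expression.
total = renamed.eval("Col_1 + Col_2")
print(total.head())
```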
Thundzz

Thank you, @Thundzz.

    df.columns = df.columns.map(lambda x: x.replace(' ', '_'))

this snippet works well!

yts61
  • This doesn't actually answer the question; instead, it ignores the question and resolves it by interspacing underscores. – ifly6 Jul 23 '19 at 17:56