Data access with Pandas eval() is orders of magnitude slower than regular pandas

Question

Why does pandas data access appear to run roughly 1000 times slower when it is executed inside pd.eval()?

Example:

import pandas as pd
import numpy as np
import time

df = pd.DataFrame(np.array([[1, 2, 3]]), columns=['a', 'b', 'c'])

Without eval():

tic = time.perf_counter()

# Plain column access, repeated 100 times
for _ in range(100):
    df['a']

toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
# Executed in 0.0002 seconds

Using pd.eval:

tic = time.perf_counter()

# The same column access, parsed and evaluated by pd.eval on every iteration
for _ in range(100):
    pd.eval("df['a']")

toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
# Executed in 0.9919 seconds

Using loc or iloc instead of [] makes no difference.

The gap remains large when the expression accesses more columns (plain pandas vs pd.eval):

df['a'] + df['b']: 0.0115 seconds vs 1.8803 seconds
df['a'] + df['b'] + df['c']: 0.0259 seconds vs 3.0085 seconds

Using eval() directly on the DataFrame, e.g. df.eval('a = b + c'), is only 3-4 times slower than the non-eval version (as opposed to 100-1000 times slower), but the original question remains.

Please use textual code rather than pictures that cannot be copied-pasted or edited. What about using `eval` and not `pd.eval`? — Jérôme Richard, Aug 05 '20 at 20:09
related question: https://stackoverflow.com/questions/38725355/when-to-use-dataframe-eval-versus-pandas-eval-or-python-eval — C8H10N4O2, Aug 05 '20 at 21:06
