
Why does pandas data access seem to run about 1000 times slower when it's executed inside pd.eval()?

Example:

import pandas as pd
import numpy as np
import time

df = pd.DataFrame(np.array([[1, 2, 3]]), columns=['a', 'b', 'c'])

Without eval():

tic = time.perf_counter()

for i in range(100):
    df['a']
    
toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
#Executed in 0.0002 seconds

Using pd.eval:

tic = time.perf_counter()

for i in range(100):
    pd.eval("df['a']")
    
toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
#Executed in 0.9919 seconds

Using loc and iloc doesn't make a difference.

The performance is also largely the same when accessing more elements.

  • df['a'] + df['b'] - 0.0115 seconds vs 1.8803 seconds
  • df['a'] + df['b'] + df['c'] - 0.0259 seconds vs 3.0085 seconds

Using eval() directly on the dataframe, e.g. df.eval('a = b + c'), makes it only 3-4 times slower than the non-eval version (as opposed to 100-1000 times slower), but the original question remains.

typhon04
  • Please use textual code rather than pictures that cannot be copied-pasted or edited. What about using `eval` and not `pd.eval`? – Jérôme Richard Aug 05 '20 at 20:09
  • related question: https://stackoverflow.com/questions/38725355/when-to-use-dataframe-eval-versus-pandas-eval-or-python-eval – C8H10N4O2 Aug 05 '20 at 21:06

0 Answers