Why does pandas data access seem to be about 1000 times slower when it's executed inside pd.eval()?
Example:
import pandas as pd
import numpy as np
import time
df = pd.DataFrame(np.array([[1, 2, 3]]), columns=['a', 'b', 'c'])
Without eval():
tic = time.perf_counter()
for i in range(100):
    df['a']
toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
#Executed in 0.0002 seconds
Using pd.eval:
tic = time.perf_counter()
for i in range(100):
    pd.eval("df['a']")
toc = time.perf_counter()
print(f"Executed in {toc - tic:0.4f} seconds")
#Executed in 0.9919 seconds
Using .loc or .iloc instead of [] doesn't make a difference.
The slowdown persists when more columns are involved (time without pd.eval vs. time with pd.eval):
- df['a'] + df['b']: 0.0115 seconds vs 1.8803 seconds
- df['a'] + df['b'] + df['c']: 0.0259 seconds vs 3.0085 seconds
Using eval() directly on the DataFrame, e.g. df.eval('a = b + c'), makes it only 3-4 times slower than the non-eval version (as opposed to 100-1000 times slower), but the original question remains.
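For the DataFrame.eval case, a sketch of a like-for-like comparison, assuming the relevant plain-pandas counterpart is DataFrame.assign: with the default inplace=False, df.eval('a = b + c') returns a new frame rather than modifying df, and assign does the same, so both versions do comparable work. The helper names with_df_eval, without_eval and compare_df_eval are hypothetical.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3]]), columns=["a", "b", "c"])


def with_df_eval(frame: pd.DataFrame) -> pd.DataFrame:
    # inplace=False (the default) returns a new frame with column 'a' replaced.
    return frame.eval("a = b + c")


def without_eval(frame: pd.DataFrame) -> pd.DataFrame:
    # Plain-pandas equivalent that also returns a new frame.
    return frame.assign(a=frame["b"] + frame["c"])


def compare_df_eval(n: int = 100) -> tuple[float, float]:
    """Return (plain_seconds, df_eval_seconds) for n calls of each version."""
    plain = timeit.timeit(lambda: without_eval(df), number=n)
    evaluated = timeit.timeit(lambda: with_df_eval(df), number=n)
    return plain, evaluated


if __name__ == "__main__":
    plain, evaluated = compare_df_eval()
    print(f"assign {plain:.4f}s  df.eval {evaluated:.4f}s  "
          f"ratio {evaluated / plain:.1f}x")
```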