Did Python or pandas cache results?

Question

This is one of the strangest things I have noticed while working with a pandas DataFrame: there is a drastic reduction in the time taken to create a DataFrame between the 1st and 2nd run of the same code.

import time

import numpy as np
import pandas as pd

L = list('ABCDEFGH')*20000
min_length = 10000
# L repeats the same 8 letters, so the dict ends up with 8 keys
data_dict = {k: np.random.randint(10, size=min_length) for k in L}
start = time.time()
df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
print('loop time : ', time.time() - start)

Time for the 1st run:

loop time : 0.05926999

When I re-run the above code:

loop time : 0.00090622

Can anybody explain what just happened?
Did pandas or Python cache results?
If you use timeit in IPython, you will get a result like this

I think you should also tag this with iPython. It's interesting because I haven't actually seen that message on `timeit` before so I'm not sure which part it would cache (and no idea how you would trace it), but I can reproduce your result. — roganjosh, May 10 '17 at 17:59
Could it just be the first time the .pyc file hasn't been compiled, and every time after it has? What if you create a 10x loop, but run the code only once? — elPastor, May 11 '17 at 11:49
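For readers without IPython, the timeit check mentioned in the comments can be roughly approximated with the standard-library `timeit` module. This is a minimal sketch, not the asker's actual session: the names `build_frame` and `durations` are made up for illustration, and the data is built from the 8 distinct keys directly. Passing `number=1` times each call on its own, so a slow first call is not averaged away.

```python
import timeit

import numpy as np
import pandas as pd

min_length = 10000
# Same shape of data as in the question: 8 columns of random integers.
data_dict = {k: np.random.randint(10, size=min_length) for k in "ABCDEFGH"}


def build_frame():
    """Hypothetical helper: build the DataFrame the way the question does."""
    return pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})


# repeat=10, number=1 gives ten separate measurements of a single call each.
durations = timeit.repeat(build_frame, repeat=10, number=1)

for i, seconds in enumerate(durations, start=1):
    print(f"run {i}: {seconds:.6f} s")
```

The printed values depend on the machine and on whether pandas has already been exercised in the same session, so no particular numbers should be expected.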

score 0 · Answer 1 · answered May 11 '17 at 12:45

My point was that the first run may be spending extra time compiling the code to a .pyc file. I'm really no expert, and this really isn't an answer, but more of a troubleshooting step.

Try running this and see if the first iteration is materially longer than the subsequent iterations.

import time

import numpy as np
import pandas as pd

L = list('ABCDEFGH')*20000
min_length = 10000
data_dict = {k: np.random.randint(10, size=min_length) for k in L}

# Time the same construction 10 times in a single run of the script
for i in range(10):
    start = time.time()
    df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
    print('loop time : ', time.time() - start)
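A variation on the loop above can help probe the caching idea itself. The sketch below is an assumption-laden experiment, not a confirmed explanation: it generates fresh random arrays on every pass, so a cache keyed on the input values could not be reused. If later passes are still much faster than the first, that would point toward one-time setup costs rather than cached results. The names `timed_build` and `timings` are hypothetical.

```python
import time

import numpy as np
import pandas as pd

min_length = 10000
keys = list("ABCDEFGH")


def timed_build(data):
    """Hypothetical helper: return (seconds, DataFrame) for one construction."""
    start = time.perf_counter()
    df = pd.DataFrame({k: v[:min_length] for k, v in data.items()})
    return time.perf_counter() - start, df


timings = []
for i in range(10):
    # New random arrays on every pass, created outside the timed region.
    fresh = {k: np.random.randint(10, size=min_length) for k in keys}
    seconds, df = timed_build(fresh)
    timings.append(seconds)
    print(f"run {i + 1}: {seconds:.6f} s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short durations.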

elPastor


I just ran the above code. Here is my output: `loop time : 0.03665304183959961 loop time : 0.0012547969818115234 loop time : 0.0006182193756103516 . loop time : 0.0004937648773193359 loop time : 0.0005068778991699219 loop time : 0.0005292892456054688` – John May 11 '17 at 17:40
The time keeps improving and then saturates at some value. – John May 11 '17 at 17:46
Wish I could help, but the reasoning behind that is way over my pay-grade! Good luck in your search. – elPastor May 11 '17 at 23:24
