One of the strangest things I noticed while working with pandas DataFrames: the time to create a DataFrame drops drastically between the 1st and 2nd run of the same code.

import time

import numpy as np
import pandas as pd

L = list('ABCDEFGH') * 20000  # repeated keys: data_dict below ends up with only 8 entries
min_length = 10000
data_dict = {k: np.random.randint(10, size=min_length) for k in L}
start = time.time()
df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
print('loop time : ', time.time() - start)

Time for the 1st run:

loop time : 0.05926999

When I re-run the above code:

loop time : 0.00090622
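One way to repeat the measurement several times in a single process is the standard library's `timeit.repeat` (the module IPython's `%timeit` builds on), with `number=1` so each recorded time is exactly one DataFrame construction. This is a sketch under a few assumptions: `slowest_to_fastest` is a hypothetical helper name, the 8 distinct column names are used directly (the repeated keys in `L` collapse to the same 8 columns anyway), and the factor of 4 for flagging a large slowest/fastest gap is an assumed threshold modelled on IPython's caching hint, not a documented contract.

```python
import timeit

import numpy as np
import pandas as pd

# Same 8 columns the question ends up with, without 160,000 randint calls.
data_dict = {k: np.random.randint(10, size=10000) for k in 'ABCDEFGH'}


def slowest_to_fastest(stmt, repeat=7):
    """Time `stmt` once per repetition and return (times, slowest / fastest)."""
    # number=1 means every entry in `times` is the duration of a single call.
    times = timeit.repeat(stmt, repeat=repeat, number=1)
    return times, max(times) / min(times)


times, ratio = slowest_to_fastest(
    lambda: pd.DataFrame({k: v[:10000] for k, v in data_dict.items()})
)
print('per-run times:', ['%.6f' % t for t in times])
if ratio > 4:  # assumed threshold for "an intermediate result may be cached"
    print(f'slowest run took {ratio:.1f}x longer than the fastest')
```

The first entry of `times` corresponds to the 1st run above, so the list shows directly whether only the first construction is slow.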

Can anybody explain what just happened?
Did pandas or Python cache the results?
If you `timeit` it in IPython, you get a result like this:

asked by John
  • Is your program stored in a file? – JohanL May 10 '17 at 17:39
  • Yes! Does it make any difference? – John May 10 '17 at 17:40
  • I think you should also tag this with IPython. It's interesting because I haven't actually seen that message on `timeit` before, so I'm not sure which part it would cache (and no idea how you would trace it), but I can reproduce your result. – roganjosh May 10 '17 at 17:59
  • Could it just be that the first time the .pyc file hasn't been compiled, and every time after it has? What if you create a 10x loop, but run the code only once? – elPastor May 11 '17 at 11:49
  • @pshep123 I didn't get you. Will you please explain? – John May 11 '17 at 12:19

1 Answer

My point was that the first run may be slower because it takes time to compile the code to a .pyc file. I'm really no expert, and this isn't really an answer, more of a troubleshooting step.

Try running this and see if the first iteration is materially longer than the subsequent iterations.

import time

import numpy as np
import pandas as pd

L = list('ABCDEFGH') * 20000
min_length = 10000
data_dict = {k: np.random.randint(10, size=min_length) for k in L}

# Time the same construction 10 times in one process.
for i in range(10):
    start = time.time()
    df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
    print('loop time : ', time.time() - start)
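To put a number on "improving and then saturating", one option is to keep the per-iteration times instead of only printing them, then compare the first run with the median of the later ones. This is a sketch built on the loop above, with assumptions: `run_times` and `steady` are hypothetical names, `time.perf_counter()` replaces `time.time()` for finer resolution, and the 8 distinct column names are used directly, which yields the same 8 columns as the repeated list.

```python
import statistics
import time

import numpy as np
import pandas as pd

min_length = 10000
data_dict = {k: np.random.randint(10, size=min_length) for k in 'ABCDEFGH'}

run_times = []
for i in range(10):
    start = time.perf_counter()
    df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
    run_times.append(time.perf_counter() - start)

first = run_times[0]
# Median of runs 2..10: less sensitive to a single outlier than the mean.
steady = statistics.median(run_times[1:])
print(f'first run           : {first:.6f} s')
print(f'median of later runs: {steady:.6f} s ({first / steady:.1f}x)')
```

A large ratio on the last line is the "first run is materially longer" case the loop is meant to detect.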
answered by elPastor
  • I just ran the above code. Here is my output: `loop time : 0.03665304183959961 loop time : 0.0012547969818115234 loop time : 0.0006182193756103516 . loop time : 0.0004937648773193359 loop time : 0.0005068778991699219 loop time : 0.0005292892456054688` – John May 11 '17 at 17:40
  • The time keeps improving and then saturates at some value. – John May 11 '17 at 17:46
  • Wish I could help, but the reasoning behind that is way over my pay-grade! Good luck in your search. – elPastor May 11 '17 at 23:24