One of the strangest things I noticed while working with pandas DataFrames: there is a drastic reduction in the time it takes to create a DataFrame between the 1st and 2nd run of the same code.
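import time
import numpy as np
import pandas as pd
# 160,000 labels but only 8 distinct ones, so data_dict ends up with 8 keys
L = list('ABCDEFGH')*20000
min_length = 10000
data_dict = {k: np.random.randint(10, size=min_length) for k in L}
start = time.time()
df = pd.DataFrame({k:v[:min_length] for k,v in data_dict.items()})
print('loop time : ', time.time() - start)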
Time for the 1st run:
loop time : 0.05926999
When I re-run the above code:
loop time : 0.00090622
Can anybody explain what just happened?
Did pandas or Python cache the results?
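One way to probe this is to repeat the same construction several times inside a single process and compare the per-run timings. The following is a minimal sketch, not a diagnosis: the helper name `time_dataframe_creation` and the `n_runs` parameter are made up for illustration, and `time.perf_counter` is swapped in for `time.time` because it is better suited to short intervals. If only the first run is slow, that would be consistent with a one-time warm-up cost rather than cached results, though the sketch alone does not prove the cause.

```python
import time

import numpy as np
import pandas as pd


def time_dataframe_creation(n_runs=5, min_length=10000):
    """Time n_runs identical DataFrame constructions (hypothetical helper).

    Returns the list of elapsed seconds and the last DataFrame built.
    """
    keys = list('ABCDEFGH')
    data_dict = {k: np.random.randint(10, size=min_length) for k in keys}
    timings = []
    df = None
    for _ in range(n_runs):
        start = time.perf_counter()
        # Same construction as in the question, rebuilt from scratch each run
        df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
        timings.append(time.perf_counter() - start)
    return timings, df


timings, df = time_dataframe_creation()
for i, t in enumerate(timings, start=1):
    print(f'run {i}: {t:.6f} s')
```

Running this in a fresh interpreter makes it easier to see whether the slowdown is limited to the very first construction.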
If you time it with timeit in IPython, you will get a result like this: