A large pandas DataFrame (where every entry is a float, on the order of 30,000 rows and tens of columns) can be created from a dictionary in a short amount of time by calling:
import pandas as pd
df = pd.DataFrame(my_dict)
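For concreteness, here is a minimal sketch of the setup described above. The column names and the dict-of-lists layout of `my_dict` are my assumptions, since the post does not show how the dictionary is built:

```python
import time

import numpy as np
import pandas as pd

# Assumed structure for my_dict: column name -> list of floats,
# 30,000 rows by 20 columns, matching the sizes quoted above.
np.random.seed(0)
my_dict = {"col_{}".format(i): np.random.rand(30000).tolist()
           for i in range(20)}

start = time.perf_counter()
df = pd.DataFrame(my_dict)
print("construction took {:.3f} s".format(time.perf_counter() - start))
print(df.shape)  # (30000, 20)
```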
This df object is created very quickly (about 0.05 seconds).
Additionally, saving and reloading the DataFrame with to_pickle and read_pickle is fast:
df.to_pickle(save_path) # takes ~2.5 seconds
reloaded_df = pd.read_pickle(save_path) # takes 0.1 seconds
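The round trip can be sketched end to end as follows. Here `save_path` is a temporary file of my choosing, not the path from the original setup, and the DataFrame contents are synthetic:

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({"col_{}".format(i): np.random.rand(30000)
                   for i in range(20)})

# Hypothetical save location (the original post does not show one).
save_path = os.path.join(tempfile.mkdtemp(), "df.pkl")

t0 = time.perf_counter()
df.to_pickle(save_path)
print("to_pickle took {:.3f} s".format(time.perf_counter() - t0))

t0 = time.perf_counter()
reloaded_df = pd.read_pickle(save_path)
print("read_pickle took {:.3f} s".format(time.perf_counter() - t0))

# The reloaded frame is value-identical to the original.
print(reloaded_df.equals(df))  # True
```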
However, when I try to do any operations on reloaded_df, it takes an unreasonable amount of time and memory. For example, calling:
reloaded_df.head() # Takes many minutes to run and uses a lot of RAM.
Why is reloading the DataFrame so quick, but operating on it so slow? And what would be a work-around so that calling reloaded_df.head() returns quickly after reloading the DataFrame?
The question How to store a dataframe using Pandas does not address my question, because it does not discuss the delay in using the DataFrame after reloading it from a pickle file.
I am using Python 3.5, pandas 0.22, and Windows 10.