At my company the monthly sales data is stored in a folder as CSV files. To speed up the reading process in Python, I am converting the CSV files to pickle files (the conversion step is sketched below the main code). Right now I have the following code to read all the individual pickle files and concatenate them into one DataFrame:
import glob
import pandas as pd

# enter the path of the folder containing the pickle files
path = "link to the folder"
# find all pickle files
all_files = glob.glob(path + "/*.pkl")
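# read each pickle file, tag its rows with the source filename, and concatenate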
df = pd.concat(
(pd.read_pickle(file).assign(filename=file) for file in all_files),
ignore_index=True,
)
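For context, the conversion step I run beforehand looks roughly like this (a minimal sketch; the folder path is a placeholder, and I read each CSV with default arguments):

import glob
import os
import pandas as pd

path = "link to the folder"  # same folder as above

# convert every CSV in the folder to a pickle file with the same base name
for csv_file in glob.glob(path + "/*.csv"):
    df_month = pd.read_csv(csv_file)
    df_month.to_pickle(os.path.splitext(csv_file)[0] + ".pkl")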
I have 38 individual pickle files, totalling 95 MB. That doesn't seem like a lot to me, but it still takes 56 s to load all the data into the DataFrame.
Is there anything that can speed up this process? Many thanks in advance!
Best, Kav