I read a bunch of pickle files with the code below. I want to loop through them and get the length of each file, i.e. how many records it contains.
Two issues:
- pd.concat will combine all my DataFrames into one, which takes a long time. Is there any way to just read the length of each file?
- If pd.concat is the way to go, how can I get the length of each file once they are all in one DataFrame? I guess the problem is identifying where each file starts and stops. I suspect I could add a column holding each filename and count on that.
What I've tried:
import pandas as pd
import glob, os
files = glob.glob(r'O:\Stack\Over\Flow\*.pkl')  # raw string so the backslashes are not read as escape sequences
df = pd.concat([pd.read_pickle(fp, compression='xz').assign(New=os.path.basename(fp)) for fp in files])
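To make the two ideas above concrete, here is a rough sketch of what I have in mind. The helper names count_records and count_records_from_concat are made up for illustration, and I am assuming every file unpickles to a DataFrame and that the filename column is called New, as in my attempt above. The first helper skips the concat entirely. The second counts rows per filename in the frame that is already concatenated.

```python
import glob
import os

import pandas as pd


def count_records(pattern, compression="xz"):
    """Return {file name: number of rows} for every pickle matching pattern.

    Hypothetical helper: each file is still unpickled in full, but only one
    DataFrame is held at a time and nothing is concatenated.
    """
    counts = {}
    for fp in glob.glob(pattern):
        frame = pd.read_pickle(fp, compression=compression)
        counts[os.path.basename(fp)] = len(frame)
    return counts


def count_records_from_concat(df, column="New"):
    """Return {file name: number of rows} from an already-concatenated frame.

    Hypothetical helper: assumes each row was tagged with its source file
    name via .assign(New=...) before pd.concat.
    """
    return df[column].value_counts().to_dict()
```

Usage would be something like counts = count_records(r'O:\Stack\Over\Flow\*.pkl') for the first idea, or count_records_from_concat(df) on the df built above for the second. I have not timed either against my real files, so I don't know how much the first one actually saves.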
Any help would be appreciated.