I have a 14-million-row CSV file with a date column (not the first column) that I want to filter and split the data by.
Currently, I am loading it into a pandas DataFrame to do this:
import pandas as pd

df = pd.read_csv(filepath, dtype=str)
for date in df['dates'].unique():
    subset = df[df['dates'] == date]
    subset.to_csv(date + dest_path)  # one output file per date
Is there a faster way to do this?
"Filter out rows from CSV before loading to pandas dataframe" gives an interesting solution, but unfortunately the column I want to split by is not the first column.
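To illustrate the kind of approach I have in mind, here is a rough sketch of streaming the file with the csv module and appending each row to a per-date file, which should work regardless of the column's position. The column name 'dates' and the output directory out_dir are assumptions on my part, and I am not sure whether this is actually faster than pandas:

import csv
import os

# Sketch only: stream the file once, write each row to a file named after its date.
# Assumes the header names the date column 'dates' and that out_dir already exists.
writers = {}  # date -> (file handle, csv writer)
with open(filepath, newline='') as src:
    reader = csv.reader(src)
    header = next(reader)
    date_idx = header.index('dates')  # works for any column position
    for row in reader:
        date = row[date_idx]
        if date not in writers:
            f = open(os.path.join(out_dir, date + '.csv'), 'w', newline='')
            w = csv.writer(f)
            w.writerow(header)  # repeat the header in every output file
            writers[date] = (f, w)
        writers[date][1].writerow(row)
for f, _ in writers.values():
    f.close()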
EDIT:
I purely need to split the CSV file into one file per date. The resulting CSV files are passed on to another team. I need all the columns, I do not want to change any data, and I do not need to do any groupby.