I am using a huge dataset with 5 columns and more than 90 million rows. The code works fine on part of the data, but when I run it on the whole dataset I get a MemoryError. I read about generators, but they seem very complex to me. Can I get an explanation based on this code?
import pandas as pd
df = pd.read_csv('D:.../test.csv', names=["id_easy", "ordinal", "timestamp", "latitude", "longitude"])
df = df[:-1]
df.loc[:,'timestamp'] = pd.to_datetime(df.loc[:,'timestamp'])
pd.set_option('float_format', '{:f}'.format)
df['epoch'] = df.loc[:, 'timestamp'].astype('int64')//1e9
# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is its replacement
df['day_of_week'] = pd.to_datetime(df['epoch'], unit="s").dt.day_name()
del df['timestamp']
for day in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']:
    day_df = df.loc[df['day_of_week'] == day]
    day_df.to_csv(f'{day}.csv', index=False)
The error appears in the last operation, the for loop.
Sample data:
d4ace40905729245a5a0bc3fb748d2b3 1 2016-06-01T08:18:46.000Z 22.9484 56.7728
d4ace40905729245a5a0bc3fb748d2b3 2 2016-06-01T08:28:05.000Z 22.9503 56.7748
UPDATED
I did this:
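# df_chunk is not defined in this snippet; presumably it is the iterator
# returned by pd.read_csv(..., chunksize=...)
chunk_list = []
for chunk in df_chunk:
    chunk_list.append(chunk)
df_concat = pd.concat(chunk_list)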
I have no idea how to proceed from here. How do I apply the rest of the code to the chunks?
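For reference, here is a rough sketch of one direction this might go, not a confirmed solution. Concatenating all chunks rebuilds the full DataFrame in memory, so the idea is to apply the transformation to each chunk and append the result to the per-day files straight away. The function name `split_csv_by_weekday`, the `out_dir` parameter, and the chunk size of 1,000,000 rows are hypothetical choices made for illustration. The sketch also assumes the file is comma-separated, stores `epoch` as integer seconds rather than floats, and does not reproduce the `df = df[:-1]` step that drops the last row.

```python
import os
import pandas as pd

COLUMNS = ["id_easy", "ordinal", "timestamp", "latitude", "longitude"]
EPOCH_START = pd.Timestamp("1970-01-01", tz="UTC")


def split_csv_by_weekday(src_path, out_dir=".", chunksize=1_000_000):
    """Stream src_path chunk by chunk and write rows to <out_dir>/<weekday>.csv.

    Hypothetical helper: only one chunk is held in memory at a time.
    Returns the sorted list of weekday names that were written.
    """
    written = set()  # weekdays whose output file has already been started
    reader = pd.read_csv(src_path, names=COLUMNS, chunksize=chunksize)
    for chunk in reader:
        # Parse the ISO timestamps as timezone-aware UTC datetimes.
        ts = pd.to_datetime(chunk["timestamp"], utc=True)
        # Whole seconds since the Unix epoch, independent of datetime resolution.
        chunk["epoch"] = (ts - EPOCH_START) // pd.Timedelta(seconds=1)
        chunk["day_of_week"] = ts.dt.day_name()
        chunk = chunk.drop(columns="timestamp")

        # Write each weekday's rows from this chunk to its own file.
        for day, day_df in chunk.groupby("day_of_week"):
            out_path = os.path.join(out_dir, f"{day}.csv")
            first_write = day not in written
            day_df.to_csv(
                out_path,
                mode="w" if first_write else "a",  # overwrite once, then append
                header=first_write,                # header only on first write
                index=False,
            )
            written.add(day)
    return sorted(written)
```

It would be called as `split_csv_by_weekday('D:.../test.csv')`. Each chunk is discarded after it is written, so peak memory should depend on `chunksize` rather than on the 90 million rows.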