I'm working with 8 different CSV files. As a first step, I cleaned each file as follows:
1) observations file
import pandas as pd

obs = pd.read_csv('....csv', sep = ";")
# keep only the temperature and time columns, then drop missing rows
obs = obs.drop(['date', 'humidity', 'precipitation', 'station'], axis=1).dropna()
# parse the timestamps and format the merge key as "day-month-year hour"
obs['time'] = obs['time'].astype('datetime64[ns]')
obs['time'] = obs['time'].apply(lambda x: x.strftime('%d-%m-%Y %H'))
obs.columns = ['temperature_obs', 'time']
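For reference, the two timestamp lines above can also be written with the vectorised .dt accessor instead of a row-wise apply; this produces the same "day-month-year hour" strings:

# equivalent to astype('datetime64[ns]') followed by apply(strftime)
obs['time'] = pd.to_datetime(obs['time']).dt.strftime('%d-%m-%Y %H')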
2) prevision files
prev = pd.read_csv('....csv',sep = ";")
prev = prev.drop(['cloud_cover', 'date', 'humidity', 'latitude_r', 'longitude_r', 'pressure', 'wind', 'wind_dir'],axis=1).dropna()
prev['time'] = prev['time'].astype('datetime64[ns]')
prev['time'] = prev['time'].apply(lambda x: x.strftime('%d-%m-%Y %H'))
prev.columns = ['temperature_prev','time']
prev2 = pd.read_csv('....csv',sep = ";")
prev2 = prev2.drop(['cloud_cover', 'date', 'humidity', 'latitude_r', 'longitude_r', 'pressure', 'wind', 'wind_dir'],axis=1).dropna()
prev2['time'] = prev2['time'].astype('datetime64[ns]')
prev2['time'] = prev2['time'].apply(lambda x: x.strftime('%d-%m-%Y %H'))
prev2.columns = ['temperature_prev2','time']
...
the same steps for the 5 other prevision files (a deduplicated sketch follows below)
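Since all the prevision files go through identical steps, they could be cleaned with one helper in a loop; a minimal sketch, assuming each file's temperature column is named temperature, with placeholder file names standing in for my real (elided) paths:

import pandas as pd

DROP_COLS = ['cloud_cover', 'date', 'humidity', 'latitude_r',
             'longitude_r', 'pressure', 'wind', 'wind_dir']

def clean_prev(path, temp_col):
    # read one prevision file, drop unused columns and missing rows,
    # and normalise the merge key to "day-month-year hour" strings
    df = pd.read_csv(path, sep=';')
    df = df.drop(DROP_COLS, axis=1).dropna()
    df['time'] = pd.to_datetime(df['time']).dt.strftime('%d-%m-%Y %H')
    # rename the temperature column so each frame stays distinguishable
    return df.rename(columns={'temperature': temp_col})

# placeholder names standing in for the 7 real files
paths = ['prev1.csv', 'prev2.csv', 'prev3.csv']
prev_frames = [clean_prev(p, f'temperature_prev{i}')
               for i, p in enumerate(paths, start=1)]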
As a second step, I merged all these files on the key "time" (in the "day-month-year hour" format), left-joining onto the obs file like this:
prevs = pd.merge(obs, prev[['time', 'temperature_prev']], how='left', on='time')
prevs = pd.merge(prevs, prev2[['time', 'temperature_prev2']], how='left', on='time')
... and so on
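With the cleaned frames collected in a list, the chain of merges could also be collapsed with functools.reduce (a sketch reusing the hypothetical prev_frames from above):

from functools import reduce

# left-join every prevision frame onto obs via the shared "time" key
prevs = reduce(lambda left, right: left.merge(right, how='left', on='time'),
               prev_frames, obs)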
The final merged file has approximately 42 million rows. Producing it takes very long, and now each time I try to run it, the Python process crashes.
I would like to know whether there are ways to optimize my code so that it runs faster and finishes without crashing.