I am merging two dataframes, both have a Timestamp format DD.MM.YYYY. But when I merge them, some rows have format MM.DD.YYYY and others DD.MM.YYYY.
The dataframes I'm using are being exported from csv files and are really big, but here's an example:
df_dts1:
dict = {'date' : ['01.07.2022 00:01:00', '24.07.2022 22:46:00', '09.08.2022 15:14:00', '06.09.2022 18:45:00'],
'CST_dts_eCO2_FC_out_SGP30' : [469, 1167, 5638, 651],
'CST_dts_IAQ_FC_in_BME680' : [196, 39, 25, 186],
'CST_dts_eCO2_FC_out_BME680' : [891, 12690, 6238, 1701]}
df_dts1 = pd.DataFrame(dict)
display(df_dts1)
date | CST_dts_eCO2_FC_out_SGP30 | CST_dts_IAQ_FC_in_BME680 | CST_dts_eCO2_FC_out_BME680 |
---|---|---|---|
01.07.2022 00:01:00 | 469 | 196 | 891 |
24.07.2022 22:46:00 | 1167 | 39 | 12690 |
09.08.2022 15:14:00 | 5638 | 25 | 6238 |
06.09.2022 18:45:00 | 651 | 186 | 1701 |
df_OxyMode:
dict = {'LOCALTIME' : ['01.07.2022 02:00:04', '20.08.2022 02:00:09', '02.09.2022 02:00:39', '01.10.2022 01:59:56'],
'x_LowOxyRegeneration_SPS' : [0.0, 0.0, 1.0, 0.0]}
df_OxyMode = pd.DataFrame(dict)
display(df_OxyMode)
LOCALTIME | x_LowOxyRegeneration_SPS |
---|---|
01.07.2022 02:00:04 | 0.0 |
20.08.2022 02:00:09 | 0.0 |
02.09.2022 02:00:39 | 1.0 |
01.10.2022 01:59:56 | 0.0 |
I am currently trying this:
frames = [df_dts1, df_OxyMode]
# rename columns to have same name
df_dts1.rename(columns={'date':'Timestamp'}, inplace=True)
df_OxyMode.rename(columns={'LOCALTIME':'Timestamp'}, inplace=True)
# datetime format to avoid error (incompatible merge keys [0] dtype('<M8[ns]') and dtype('O'), must be the same type)
df_dts1['Timestamp'] = pd.to_datetime(df_dts1['Timestamp'])
df_OxyMode['Timestamp'] = pd.to_datetime(df_OxyMode['Timestamp'])
# sorting to avoid error (ValueError: left keys must be sorted)
df_dts1 = df_dts1.sort_values(by='Timestamp', ascending=True)
df_OxyMode = df_OxyMode.sort_values(by='Timestamp', ascending=True)
# merge dfs
df_ausw = pd.merge_asof(df_dts1, df_OxyMode, on='Timestamp').assign(Timestamp = lambda x: x['Timestamp'].dt.strftime('%d-%m-%Y %H:%M'))
df_ausw = df_ausw.sort_values(by='Timestamp', ascending=True)
display(df_ausw)
but the format of the Timestamp column is inconsistent (some rows MM.DD.YYYY and others DD.MM.YYYY) and looks like this: merged df
At first I had it without the
.assign(Timestamp = lambda x: x['Timestamp'].dt.strftime('%d-%m-%Y %H:%M'))
but when I plotted the data it looked wrong: merged df without .assign(Timestamp = lambda... instead of: df before merging
Does someone have any idea how I can solve this?