0

I am merging two dataframes, both have a Timestamp format DD.MM.YYYY. But when I merge them, some rows have format MM.DD.YYYY and others DD.MM.YYYY.

The dataframes I'm using are being exported from csv files and are really big, but here's an example:

df_dts1:

dict = {'date' : ['01.07.2022 00:01:00', '24.07.2022 22:46:00', '09.08.2022 15:14:00', '06.09.2022 18:45:00'],
'CST_dts_eCO2_FC_out_SGP30' : [469, 1167, 5638, 651],
'CST_dts_IAQ_FC_in_BME680' : [196, 39, 25, 186],
'CST_dts_eCO2_FC_out_BME680' : [891, 12690, 6238, 1701]}

df_dts1 = pd.DataFrame(dict)
display(df_dts1)
date CST_dts_eCO2_FC_out_SGP30 CST_dts_IAQ_FC_in_BME680 CST_dts_eCO2_FC_out_BME680
01.07.2022 00:01:00 469 196 891
24.07.2022 22:46:00 1167 39 12690
09.08.2022 15:14:00 5638 25 6238
06.09.2022 18:45:00 651 186 1701

df_OxyMode:

dict = {'LOCALTIME' : ['01.07.2022 02:00:04', '20.08.2022 02:00:09', '02.09.2022 02:00:39', '01.10.2022 01:59:56'],
'x_LowOxyRegeneration_SPS' : [0.0, 0.0, 1.0, 0.0]}

df_OxyMode = pd.DataFrame(dict)
display(df_OxyMode)
LOCALTIME x_LowOxyRegeneration_SPS
01.07.2022 02:00:04 0.0
20.08.2022 02:00:09 0.0
02.09.2022 02:00:39 1.0
01.10.2022 01:59:56 0.0

I am currently trying this:

frames = [df_dts1, df_OxyMode]

# rename columns to have same name
df_dts1.rename(columns={'date':'Timestamp'}, inplace=True)
df_OxyMode.rename(columns={'LOCALTIME':'Timestamp'}, inplace=True)

# datetime format to avoid error (incompatible merge keys [0] dtype('<M8[ns]') and dtype('O'), must be the same type)
df_dts1['Timestamp'] = pd.to_datetime(df_dts1['Timestamp'])
df_OxyMode['Timestamp'] = pd.to_datetime(df_OxyMode['Timestamp'])

# sorting to avoid error (ValueError: left keys must be sorted)
df_dts1 = df_dts1.sort_values(by='Timestamp', ascending=True)
df_OxyMode = df_OxyMode.sort_values(by='Timestamp', ascending=True)

# merge dfs 
df_ausw = pd.merge_asof(df_dts1, df_OxyMode, on='Timestamp').assign(Timestamp = lambda x: x['Timestamp'].dt.strftime('%d-%m-%Y %H:%M'))
df_ausw = df_ausw.sort_values(by='Timestamp', ascending=True)

display(df_ausw)

but the format of the Timestamp column is inconsistent (some rows MM.DD.YYYY and others DD.MM.YYYY) and looks like this: merged df

At first I had it without the

.assign(Timestamp = lambda x: x['Timestamp'].dt.strftime('%d-%m-%Y %H:%M'))

but when I plotted the data it looked wrong: merged df without .assign(Timestamp = lambda... instead of: df before merging

Does someone have any idea how I can solve this?

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
snkm
  • 3
  • 3
  • Force the format in `pd.to_datetime` with `format='%Y-%m-%d'` – mozway Dec 29 '22 at 14:08
  • thanks for your suggestion @mozway, but sadly it didn't change anything :( – snkm Dec 29 '22 at 14:28
  • Then please provide a reproducible example. – mozway Dec 29 '22 at 14:28
  • If you do not specify a format for `pd.to_datetime`, it will assume the month is listed first. That does not seem to be true for your input. So use e.g. `pd.to_datetime(df_dts1['Timestamp'], dayfirst=True)`. – FObersteiner Dec 30 '22 at 08:41
  • @mozway I tried again and it worked :) I just had to put also the time format. Thanks! – snkm Dec 30 '22 at 08:50
  • @FObersteiner I also tried your suggestion and it worked instantely, thank you! – snkm Dec 30 '22 at 08:52
  • I also had the problem that the dates where not being sorted out correctly. But the problem got solved by adding `df_asw.sort_index(inplace=True)` – snkm Dec 30 '22 at 08:55

0 Answers0