2

based on my previous question

Join two dataframe and expand the dataframe with assign the ID

I have two dataframe looks like this:

 df1
 ID         Time
 ID_E   9:07:00
 ID_E   9:15:00
 ID_B   9:09:00
 ID_B   9:14:00

 df2 
 Time        X        Y
 9:07:00    767597  9427273
 9:08:00    767597  9427284
 9:09:00    767608  9427284
 9:10:00    767608  9427295
 9:11:00    767619  9427306
 9:12:00    767623  9427311
 9:13:00    767628  9427319
 9:14:00    767634  9427327
 9:15:00    767640  9427335

I want to insert a dataframe into another dataframe with Time key with ID appear to follow the Time. However, the results got error right keys must be sorted this because the time in df1 not sorted. How to fix this condition?

I used this code:

df1 = pd.read_csv(data1.csv)
df1.columns = df1.columns.str.strip()
df2 = pd.read_csv(data2.csv)
df2.columns = df2.columns.str.strip()
df1['Time'] = pd.to_datetime(df1['Time'])
df2['Time'] = pd.to_datetime(df2['Time'])
df = pd.merge_asof(df2, df1, on='Time').assign(Time = lambda x: x['Time'].dt.strftime('%H:%M'))
df

My expected result:

  ID    Time         X        Y
  ID_E  9:07:00 767597  9427273
  ID_E  9:08:00 767597  9427284
  ID_E  9:09:00 767608  9427284
  ID_E  9:10:00 767608  9427295
  ID_E  9:11:00 767619  9427306
  ID_E  9:12:00 767623  9427311
  ID_E  9:13:00 767628  9427319
  ID_E  9:14:00 767634  9427327
  ID_E  9:15:00 767640  9427335
  ID_B  9:09:00 767608  9427284
  ID_B  9:10:00 767608  9427295
  ID_B  9:11:00 767619  9427306
  ID_B  9:12:00 767623  9427311
  ID_B  9:13:00 767628  9427319
  ID_B  9:14:00 767634  9427327

thank you for helping me.

Arief Hidayat
  • 937
  • 1
  • 8
  • 19

1 Answers1

1

You get:

right keys must be sorted

Here are right keys column df1.Time, so necessary sorting by this column before merge_asof:

df1 = df1.sort_values('Time')

If error also need sorting by df2.Time:

left keys must be sorted

df2 = df2.sort_values('Time')

All together:

df1 = pd.read_csv(data1.csv)
df1.columns = df1.columns.str.strip()
df2 = pd.read_csv(data2.csv)
df2.columns = df2.columns.str.strip()
df1['Time'] = pd.to_datetime(df1['Time'])
df2['Time'] = pd.to_datetime(df2['Time'])
df1 = df1.sort_values('Time')

df = pd.merge_asof(df2, df1, on='Time').assign(Time = lambda x: x['Time'].dt.strftime('%H:%M'))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for your answer. I get another error `('Unknown string format:', '0 days 09:07:00.000000000')`. Maybe because my data became `timedelta`. how about that? – Arief Hidayat Apr 07 '19 at 05:48
  • @jezrael-this code `df1['Time'] = pd.to_datetime(df1['Time'])` – Arief Hidayat Apr 07 '19 at 05:52
  • @Arief - So be free omit `df1['Time'] = pd.to_datetime(df1['Time']) df2['Time'] = pd.to_datetime(df2['Time'])` and also `.assign(Time = lambda x: x['Time'].dt.strftime('%H:%M'))`, it should working well, because timedeltas. – jezrael Apr 07 '19 at 05:53
  • @jezrael-maybe this is a stupid question again. how to convert from timedeltas data example `0 days 09:07:00.000000000' to `09:07:00`. – Arief Hidayat Apr 07 '19 at 06:16
  • if i omit that code. I got this `'NoneType' object is not callable` – Arief Hidayat Apr 07 '19 at 06:17
  • 1
    @Arief - Not stupid question. But I have idea - first how working `df1['Time'] = pd.to_timedelta(df1['Time']) df2['Time'] = pd.to_timedelta(df2['Time']) df1 = df1.sort_values('Time') df = pd.merge_asof(df2, df1, on='Time')` ? – jezrael Apr 07 '19 at 06:19
  • @Arief - then in output if need convert timedeltas to strings use [this solution](https://stackoverflow.com/a/51102096/2901002) – jezrael Apr 07 '19 at 06:21