2

I have two data frames created by someone that I need to merge_asof on datetime and id. Left data frame was created like this:

import pandas as pd
import pytz
from datetime import datetime
from datetime import timezone

dates = [datetime(2020, 1, 2, 8, 0, 0, 824000),
         datetime(2020, 1, 8, 6, 2, 52, 833000),
         datetime(2020, 1, 9, 22, 41, 18, 858000),
         datetime(2020, 1, 16, 8, 0, 1, 404000),
         datetime(2020, 1, 22, 8, 0, 1, 560000),
         datetime(2020, 1, 23, 8, 0, 1, 493000)
        ]
timezone = pytz.timezone('US/Eastern')
dates_localized = [timezone.localize(d) for d in dates ]
ids = [1,1,1,2,2,2]
headlines = ['abc','def','jkl', 'mno','pqr', 'stx']
left = pd.DataFrame({'date':dates_localized, 'id':ids, 'headlines':headlines})
print(left)

                              date  id headlines
0 2020-01-02 08:00:00.824000-05:00   1       abc
1 2020-01-08 06:02:52.833000-05:00   1       def
2 2020-01-09 22:41:18.858000-05:00   1       jkl
3 2020-01-16 08:00:01.404000-05:00   2       mno
4 2020-01-22 08:00:01.560000-05:00   2       pqr
5 2020-01-23 08:00:01.493000-05:00   2       stx

Right data frame was created similar to this:

index = pd.DatetimeIndex(['2020-01-02 07:30:00.070041845',
               '2020-01-08 05:30:00.167110660',
               '2020-01-09 09:30:00.185073458',
               '2020-01-16 09:30:00.190448059',
               '2020-01-22 07:30:00.286648287',
               '2020-01-22 06:30:00.376308078'])

right = pd.DataFrame({'id':[1,1,1,2,2,2], 'value':[1,0,0,1,1,0]})
right = right.set_index(index)
right.index.name = 'date'
print(right)

                               id  value
date                                    
2020-01-02 07:30:00.070041845   1      1
2020-01-08 05:30:00.167110660   1      0
2020-01-09 09:30:00.185073458   1      0
2020-01-16 09:30:00.190448059   2      1
2020-01-22 07:30:00.286648287   2      1
2020-01-22 06:30:00.376308078   2      0

Merge:

df = pd.merge_asof(left, right, on='date', by='id')

results in error:

MergeError: incompatible merge keys [1] datetime64[ns, US/Eastern] and dtype('<M8[ns]'), must be the same type

Any ideas how to convert times to one type that merge_asof could be done?

dokondr
  • 3,389
  • 12
  • 38
  • 62

1 Answers1

3

One idea is use DataFrame.tz_localize for set timezones to Datetimeindex:

df = pd.merge_asof(left, right.tz_localize('US/Eastern').sort_index(), on='date', by='id')
print (df)
                              date  id headlines  value
0 2020-01-02 08:00:00.824000-05:00   1       abc    1.0
1 2020-01-08 06:02:52.833000-05:00   1       def    0.0
2 2020-01-09 22:41:18.858000-05:00   1       jkl    0.0
3 2020-01-16 08:00:01.404000-05:00   2       mno    NaN
4 2020-01-22 08:00:01.560000-05:00   2       pqr    1.0
5 2020-01-23 08:00:01.493000-05:00   2       stx    1.0

EDIT: If necessary set timezone to date column:

left['date'] = left['date'].dt.tz_localize('US/Eastern')
df = pd.merge_asof(left, right.sort_index(), on='date', by='id')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • On real data I get this error: `MergeError: incompatible merge keys [1] dtype('O') and datetime64[ns, US/Eastern], must be the same type`. Maybe this is because of: left and right dataframe have different type of `date` (column / index) : `(datetime.datetime, pandas.core.indexes.datetimes.DatetimeIndex)` – dokondr Apr 22 '20 at 15:45
  • @dokondr - hmmm, it seems there is list of `datetime.datetime` in index, not datetimeindex, so error `dtype('O')` is possible use `df1.index = pd.to_datetime(df1.index)` ? – jezrael Apr 22 '20 at 15:47
  • In df1 (left) `date` is not an index, it is a column with values of `datatime.datatime` type. Converting `df1[date] = pd.to_datetime(df1[date])` requries `utc=True` as an argument, which makes wrong date and time, I think. – dokondr Apr 22 '20 at 16:04
  • Merge then results in error `incompatible merge keys [1] datetime64[ns, UTC] and datetime64[ns, US/Eastern], must be the same type` – dokondr Apr 22 '20 at 16:07
  • @dokondr - hmmm, is possible convert column to timezones, like edited question? – jezrael Apr 22 '20 at 16:11
  • 1
    I tried, and get ` Not naive datetime (tzinfo is already set)` Maybe make it `naive` first and then set timezone? – dokondr Apr 22 '20 at 16:14
  • 1
    @dokondr - Use [this](https://stackoverflow.com/a/34687479/2901002), only if need convert column is necessary `.dt` like `left['date'] = left['date'].dt.tz_localize(None).dt.tz_localize('US/Eastern')` – jezrael Apr 22 '20 at 16:19