I'm trying to join two pandas.DataFrames
on a datetime64[ns, UTC]
field and it's failing with a ValueError
(described below) that is not intuitive to me. Consider the example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> s_1 = pd.Series(np.random.randn(2,), index=['1981-12-10', '1984-09-14'])
>>> s_1.index = pd.to_datetime(s_1.index, utc=True)
>>> df_1 = pd.DataFrame(s_1, columns=['s_1']).assign(date=s_1.index)
>>> df_1.dtypes
s_1 float64
date datetime64[ns, UTC]
dtype: object
>>>
>>> d = {
... 'v': np.random.randn(2,),
... 'close': ['1981-12-10', '1984-09-14']
>>> }
>>> df_2 = pd.DataFrame(data=d)
>>> df_2.close = pd.to_datetime(df_2.close, utc=True)
>>> df_2['date'] = df_2.close.apply(lambda x: x.replace(hour=0, minute=0, second=0))
>>> df_2.dtypes
v float64
close datetime64[ns, UTC]
date datetime64[ns, UTC]
dtype: object
>>>
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and int64 columns. If you wish to proceed you should use pd.concat
Clearly the date
field is not an int64
. The documentation for join says "Index should be similar to one of the columns in this one." so I set the index for df_2
to the date
field and tried again:
>>> df_2.set_index('date', drop=False, inplace=True)
>>> df_1.dtypes
s_1 float64
date datetime64[ns, UTC]
dtype: object
>>> df_1.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', freq=None)
>>>
>>> df_2.dtypes
v float64
close datetime64[ns, UTC]
date datetime64[ns, UTC]
dtype: object
>>> df_2.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', name='date', freq=None)
>>>
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and datetime64[ns] columns. If you wish to proceed you should use pd.concat
Before you suggest I follow the friendly instructions and use pd.concat
, I cannot: this is not my code ;)