I'm trying to impute some missing df['Roll_time'] values in my dataset. I have avg_time_diff, which is a Timedelta (the mean of a timedelta64[ns] column), and df['Notif_time'], which holds datetime.time objects. For each row that is missing 'Roll_time', I want to impute the sum of avg_time_diff and 'Notif_time'.

So far I have this:

avg_time_diff = df['Time_diff'].mean()
df['Time_diff'].fillna(avg_time_diff, inplace=True)

df['Roll_time'].fillna(avg_time_diff + df['Notif_time'])

I get this error when I run the code:

TypeError: unsupported operand type(s) for +: 'Timedelta' and 'datetime.time'
FObersteiner
  • you need to convert `Timedelta` to `datetime.time()` or vice versa for compatible datatypes – Epsi95 Feb 04 '21 at 07:30
  • @Epsi95: to me, the arithmetic only makes sense if datetime.time is converted to timedelta – FObersteiner Feb 04 '21 at 07:40
  • @MrFuppes that is true, I replied in a generic way like compatible data types, but I should be more careful. Thank you – Epsi95 Feb 04 '21 at 07:54
  • @Epsi95: no worries, this is specific to pandas and Python datetime anyway. All these classes handling date and time in different ways are pretty confusing I think (especially if you're new to Python/pandas). – FObersteiner Feb 04 '21 at 08:02

1 Answer

You'll need to convert the datetime.time objects to timedelta as well so that the arithmetic works.

Example:

import datetime
import pandas as pd

# some dummy data:
df = pd.DataFrame({'Time_diff': [pd.Timedelta(hours=1), pd.Timedelta(hours=2), pd.NaT, pd.Timedelta(hours=4)],
                   'Notif_time': [datetime.time(1,2,3), datetime.time(2,3,4), datetime.time(4,5,6), datetime.time(7,8,9)]})

# Time_diff column and avg_time_diff are of dtype Timedelta...
avg_time_diff = df['Time_diff'].mean() 
df['Time_diff'] = df['Time_diff'].fillna(avg_time_diff)

# need to cast Notif_time to Timedelta as well so that the arithmetic works out:
df['Roll_time'] = avg_time_diff + pd.to_timedelta(df['Notif_time'].astype(str))

# df['Roll_time']
# 0   0 days 03:22:03
# 1   0 days 04:23:04
# 2   0 days 06:25:06
# 3   0 days 09:28:09
# Name: Roll_time, dtype: timedelta64[ns]
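
Note that the assignment above writes Roll_time for every row. If the column already exists and only the gaps should be imputed, as in the question, the same sum can be passed to fillna instead. This is a minimal sketch, assuming Roll_time is stored as timedelta64[ns]; the sample values and the fixed avg_time_diff are made up for illustration:

```python
import datetime
import pandas as pd

# hypothetical data: Roll_time already exists, with one missing value
df = pd.DataFrame({
    "Notif_time": [datetime.time(1, 2, 3), datetime.time(2, 3, 4), datetime.time(4, 5, 6)],
    "Roll_time": pd.to_timedelta(["02:00:00", None, "05:00:00"]),
})

# assumed value; in the question this comes from df['Time_diff'].mean()
avg_time_diff = pd.Timedelta(hours=1)

# cast Notif_time to Timedelta, then fill only the rows where Roll_time is missing
notif_td = pd.to_timedelta(df["Notif_time"].astype(str))
df["Roll_time"] = df["Roll_time"].fillna(avg_time_diff + notif_td)
```

fillna aligns the replacement Series on the index, so rows that already have a Roll_time keep their value. The result has to be assigned back (or the call made in place); the bare fillna call in the question would not change df even without the TypeError.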

If you want the output to be of dtype datetime (with all the formatting options etc.), you can get that by adding a date:

# to get from timedelta to datetime, you can add the timedelta column to today's date:
df['roll_datetime'] = pd.Timestamp('now').floor('d') + df['Roll_time']

# df['roll_datetime']
# 0   2021-02-04 03:22:03
# 1   2021-02-04 04:23:04
# 2   2021-02-04 06:25:06
# 3   2021-02-04 09:28:09
# Name: roll_datetime, dtype: datetime64[ns]
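
If only the time of day is needed in the end, the datetime column can be reduced again with the .dt accessor. A small sketch, assuming the summed timedeltas stay below 24 hours (otherwise the date rolls over and the time wraps around); the fixed date here is just a placeholder:

```python
import datetime
import pandas as pd

# timedelta values as produced by the addition above (sample values)
roll_time = pd.Series(pd.to_timedelta(["03:22:03", "04:23:04"]))

# anchor to an arbitrary date to get datetime64[ns]
roll_datetime = pd.Timestamp("2021-02-04") + roll_time

# back to datetime.time objects (object dtype) ...
as_time = roll_datetime.dt.time
# ... or to formatted strings
as_text = roll_datetime.dt.strftime("%H:%M:%S")
```

The .dt.time result is an object column again, so further arithmetic would need the same conversion to Timedelta as before.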

Further reading: Format timedelta to string

FObersteiner