I have a dataset like this

import pandas as pd

data = pd.DataFrame({
    'order_date-time': ['2017-09-13 08:59:02', '2017-06-28 11:52:20', '2018-05-18 10:25:53',
                        '2017-08-01 18:38:42', '2017-08-10 21:48:40', '2017-07-27 15:11:51',
                        '2018-03-18 21:00:44', '2017-08-05 16:59:05', '2017-08-05 16:59:05',
                        '2017-06-05 12:22:19'],
    'delivery_date_time': ['2017-09-20 23:43:48', '2017-07-13 20:39:29', '2018-06-04 18:34:26',
                           '2017-08-09 21:26:33', '2017-08-24 20:04:21', '2017-08-31 20:19:52',
                           '2018-03-28 21:57:44', '2017-08-14 18:13:03', '2017-08-14 18:13:03',
                           '2017-06-26 13:52:03']})

I need to calculate the delivery delay for this data. First I converted both columns to datetime:

data['order_date-time']=pd.to_datetime(data['order_date-time'])
data['delivery_date_time']=pd.to_datetime(data['delivery_date_time'])

Then I calculated the delay:

data['delivery delay']= data['delivery_date_time']-data['order_date-time']

The new column looks like this in the output:

7 days 14:44:46
15 days 08:47:09
...

How can I change these column values to int values like 7, 15, ... without "days" and the time?

FObersteiner
nemo92world

1 Answer

Subtracting two datetime columns from each other gives you a column of dtype timedelta. You can call the days attribute of a timedelta column with the dt accessor:

data['delivery delay'].dt.days

0     7
1    15
2    17
3     8
...
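To keep the result as a plain integer column, something like the following minimal sketch may help. It uses only two rows from the question's data, and the column name delay_days is just an illustrative choice, not something from the original post. It also shows that .dt.days floors rather than rounds, which matters if a delay can ever be negative.

```python
import pandas as pd

# Two rows taken from the question's sample data.
data = pd.DataFrame({
    'order_date-time': ['2017-09-13 08:59:02', '2017-06-28 11:52:20'],
    'delivery_date_time': ['2017-09-20 23:43:48', '2017-07-13 20:39:29'],
})
data['order_date-time'] = pd.to_datetime(data['order_date-time'])
data['delivery_date_time'] = pd.to_datetime(data['delivery_date_time'])

delay = data['delivery_date_time'] - data['order_date-time']

# Whole days only; the hours/minutes/seconds remainder is dropped.
# 'delay_days' is a hypothetical column name chosen for this example.
data['delay_days'] = delay.dt.days
print(data['delay_days'].tolist())  # [7, 15]

# The days component is floored: minus one hour is "-1 days +23:00:00".
negative = pd.Series([pd.Timedelta(hours=-1)])
print(negative.dt.days.tolist())  # [-1]
```

If the column may contain missing values (NaT), the result will likely come back as floats with NaN instead of integers, so it is worth checking the dtype before relying on it.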

...or, if you need fractional days, call the total_seconds method and divide by the number of seconds in a day (86400):

data['delivery delay'].dt.total_seconds()/86400

0     7.614421
1    15.366076
2    17.339271
3     8.116563
...
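As an alternative sketch, dividing the timedelta column by a one-day pd.Timedelta should give the same fractional days without the hard-coded 86400. The series below is rebuilt from the first two delays shown in the question, and the variable names are illustrative only.

```python
import pandas as pd

# The first two delays from the question, rebuilt as a timedelta Series.
delay = pd.Series(pd.to_timedelta(['7 days 14:44:46', '15 days 08:47:09']))

# Dividing a timedelta by a timedelta yields plain floats (here: days).
frac_days = delay / pd.Timedelta(days=1)

# Same values as the total_seconds() route from the answer.
via_seconds = delay.dt.total_seconds() / 86400

print(frac_days.round(6).tolist())  # [7.614421, 15.366076]
```

Swapping the divisor for pd.Timedelta(hours=1) would give the delay in hours in the same way.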

Unfortunately, you can't format a timedelta to a string the way you can a datetime (there is no strftime equivalent); see also "Format timedelta to string".
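If a custom text representation is needed anyway, one possible workaround is to assemble it by hand from the .dt.components table. This is only a sketch: the "d HH:MM" layout and the variable names are assumptions made for illustration.

```python
import pandas as pd

delay = pd.Series(pd.to_timedelta(['7 days 14:44:46', '15 days 08:47:09']))

# .dt.components splits each timedelta into integer columns
# (days, hours, minutes, seconds, ...).
parts = delay.dt.components

# Build a label such as "7 d 14:44"; the layout is an arbitrary choice.
labels = (parts['days'].astype(str) + ' d '
          + parts['hours'].astype(str).str.zfill(2) + ':'
          + parts['minutes'].astype(str).str.zfill(2))

print(labels.tolist())  # ['7 d 14:44', '15 d 08:47']
```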

FObersteiner