0

I have a dataset like this:

data = pd.DataFrame({'order_date-time':['2017-09-13 08:59:02', '2017-06-28 11:52:20', '2018-05-18 10:25:53', '2017-08-01 18:38:42', '2017-08-10 21:48:40','2017-07-27 15:11:51',
                                   '2018-03-18 21:00:44','2017-08-05 16:59:05', '2017-08-05 16:59:05','2017-06-05 12:22:19'],
                'delivery_date_time':['2017-09-20 23:43:48', '2017-07-13 20:39:29','2018-06-04 18:34:26','2017-08-09 21:26:33','2017-08-24 20:04:21','2017-08-31 20:19:52',
                                      '2018-03-28 21:57:44','2017-08-14 18:13:03','2017-08-14 18:13:03','2017-06-26 13:52:03']})

I want to calculate the time differences between these dates as the number of days and add it to the table as the delivery delay column. But I need to include both day and time for this calculation for example, if the difference is 7 days 14:44:46 we can round this to 7 days.

nemo92world
  • 101
  • 8

2 Answers2

2
from datetime import datetime

datetime.strptime(date_string, format)

you could use this to convert the string to DateTime format and put it in variable and then calculate it

Visit https://www.journaldev.com/23365/python-string-to-datetime-strptime/

M.Mevlevi
  • 339
  • 3
  • 16
  • actually, I tried this code datetime.strptime('order_date-time', '%Y-%m-%d %H:%M:%S') but I got the error time data 'order_date-time' does not match format '%Y-%m-%d %H:%M:%S' – nemo92world Jan 30 '21 at 11:32
0

Python's datetime library is good to work with individual timestamps. If you have your data in a pandas DataFrame as in your case, however, you should use pandas datetime functionality.

To convert a column with timestamps from stings to proper datetime format you can use pandas.to_datetime():

data['order_date_time'] = pd.to_datetime(data['order_date_time'], format="%Y-%m-%d %H:%M:%S")
data['delivery_date_time'] = pd.to_datetime(data['delivery_date_time'], format="%Y-%m-%d %H:%M:%S")

The format argument is optional, but I think it is a good idea to always use it to make sure your datetime format is not "interpreted" incorrectly. It also makes the process much faster on large data-sets.

Once you have the columns in a datetime format you can simply calculate the timedelta between them:

data['delay'] = data['delivery_date_time'] - data['order_date_time']

And then finally, if you want to round this timedelta, then pandas has again the right method for this:

data['approx_delay'] = data['delay'].dt.round('d')

where the extra dt gives access to datetime specific methods, the round function takes a frequency as arguments, and in this case that frequency has been set to a day using 'd'

robbo
  • 525
  • 3
  • 11