0

I've been working on this data set and I want to create a column for the difference between 2 dates.

df["ended_at"] = pd.to_datetime(df["ended_at"])
df["started_at"] = pd.to_datetime(df["started_at"])
df.insert(4,"trip_duration",df["ended_at"] - df["started_at"])

Results in this

I tried to get rid of the "0 days" using :

df["trip_duration"] = pd.to_datetime(df["trip_duration"], format = '%H:%M:%S').dt.time

But it results in the following TypeError :
TypeError: dtype timedelta64[ns] cannot be converted to datetime64[ns]

Can you help me get rid of the "0 days" part and only keep HH:MM:SS

Rosalyo
  • 37
  • 5

1 Answers1

0

Here is one way to do it:

import pandas as pd
df = pd.DataFrame({
    "started_at" : ['2021-04-01 00:03:18', '2021-04-01 00:03:47'],
    "ended_at" : ['2021-04-01 00:30:59', '2021-04-01 00:23:26']
})
df["ended_at"] = pd.to_datetime(df["ended_at"])
df["started_at"] = pd.to_datetime(df["started_at"])
df["trip_duration"] = (df["ended_at"] - df["started_at"]).astype(str).str.extract('.*?days.*?([0-9].*)')
print(df)

... gives:

           started_at            ended_at trip_duration
0 2021-04-01 00:03:18 2021-04-01 00:30:59      00:27:41
1 2021-04-01 00:03:47 2021-04-01 00:23:26      00:19:39
constantstranger
  • 9,176
  • 2
  • 5
  • 19