
We have two columns in a data frame: start_time, end_time (both type object)

import pandas as pd

data = {
    "passenger": [913383, 442365, 983560, 163350],
    "start_time": ["0:00:00", "0:01:17", "0:00:24", "0:00:26"],
    "end_time": ["0:00:17", "0:01:32", "0:03:20", "0:01:38"],
}

# Load data into a DataFrame object:
df = pd.DataFrame(data)

We are looking to create a third column with the time difference (end_time-start_time) in hh:mm:ss format:

df["time_difference"] = df["end_time"] - df["start_time"]

If we convert the object columns with pd.to_datetime, the result includes the full date as well, which is not what we want.

Any assistance would be appreciated.

Jack Ryan
  • check this link https://stackoverflow.com/questions/22923775/calculate-time-difference-between-two-pandas-columns-in-hours-and-minutes – Rajkumar Hajgude Jan 04 '23 at 20:47

1 Answer


I do not do much time manipulation, but I hope this helps. There may be a more performant way to do this, but here is the quick and dirty version I came up with.

import pandas as pd

data = {
    "passenger": [913383, 442365, 983560, 163350],
    "start_time": ["0:00:00", "0:01:17", "0:00:24", "0:00:26"],
    "end_time": ["0:00:17", "0:01:32", "0:03:20", "0:01:38"],
}

# Create a dataframe
df = pd.DataFrame(data)

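# Convert the H:M:S strings to whole seconds
# (just stripping the colons would read "0:01:17" as 117 rather than 77 seconds)
str_cols = ["start_time", "end_time"]
df[str_cols] = df[str_cols].apply(
    lambda col: pd.to_timedelta(col).dt.total_seconds().astype(int)
)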

# Calculate the difference between start and end
df["difference"] = df["end_time"] - df["start_time"]

# Manipulate columns back to original H:M:S format
df["start_time"] = pd.to_datetime(df["start_time"], unit="s").dt.strftime("%H:%M:%S")
df["end_time"] = pd.to_datetime(df["end_time"], unit="s").dt.strftime("%H:%M:%S")
df["difference_formatted"] = pd.to_datetime(df["difference"], unit="s").dt.strftime(
    "%H:%M:%S"
)

# Remove the unneeded int column
df = df.drop(columns=["difference"])

# Print types
print(df.dtypes)
print(df)

Gives the following output:

[Image: output of print(df.dtypes) and print(df)]
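As a possible alternative, here is a minimal sketch that parses the strings as durations with pd.to_timedelta instead of going through to_datetime, so no date part ever shows up. The helper name format_hms is made up for this example, and it assumes end_time is never earlier than start_time (negative durations are not handled).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "passenger": [913383, 442365, 983560, 163350],
        "start_time": ["0:00:00", "0:01:17", "0:00:24", "0:00:26"],
        "end_time": ["0:00:17", "0:01:32", "0:03:20", "0:01:38"],
    }
)

# Parse the H:M:S strings as durations rather than as dates
delta = pd.to_timedelta(df["end_time"]) - pd.to_timedelta(df["start_time"])


def format_hms(td):
    """Hypothetical helper: render a non-negative Timedelta as h:mm:ss."""
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Store the difference as a string in the same format as the input columns
df["time_difference"] = delta.apply(format_hms)
print(df)
```

The original start_time and end_time columns stay untouched as strings. If you need to do further arithmetic, keep delta around as a timedelta column and only format it for display.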

swolfe2