I have a DataFrame with several timestamps per user ID, and I want to calculate the duration in seconds between consecutive timestamps for each user. I used this code to import my CSV files:
import pandas as pd
import glob
path = r'C:\Users\...\Desktop'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, encoding='ISO-8859-1')
    li.append(df)
df = pd.concat(li, axis=0, ignore_index=True)
df.head()
ID timestamp
1828765 31-05-2021 22:27:03
1828765 31-05-2021 22:27:12
1828765 31-05-2021 22:27:13
1828765 31-05-2021 22:27:34
2056557 21-07-2021 10:27:12
2056557 21-07-2021 10:27:20
2056557 21-07-2021 10:27:22
I want to get something like this:
ID timestamp duration(s)
1828765 31-05-2021 22:27:03 NaN
1828765 31-05-2021 22:27:12 9
1828765 31-05-2021 22:27:13 1
1828765 31-05-2021 22:27:34 21
2056557 21-07-2021 10:27:12 NaN
2056557 21-07-2021 10:27:20 8
2056557 21-07-2021 10:27:22 2
I tried this code, but it doesn't work for me:
import datetime
df['timestamp'] = pd.to_datetime(df['timestamp'], format = "%d-%m-%Y %H:%M:%S")
df['time_diff'] = 0
for i in range(df.shape[0] - 1):
    df['time_diff'][i+1] = (datetime.datetime.min + (df['timestamp'][i+1] - df['timestamp'][i])).time()
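
One possible way to get the per-user durations without a loop is to group by ID and take the difference between consecutive timestamps. This is only a sketch: it assumes the rows are already in chronological order within each ID (otherwise sort by ID and timestamp first), and the sample frame is rebuilt by hand from the rows shown above instead of being read from the CSV files.

```python
import pandas as pd

# Sample data copied from the rows in the question (not read from CSV).
df = pd.DataFrame({
    "ID": [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    "timestamp": [
        "31-05-2021 22:27:03",
        "31-05-2021 22:27:12",
        "31-05-2021 22:27:13",
        "31-05-2021 22:27:34",
        "21-07-2021 10:27:12",
        "21-07-2021 10:27:20",
        "21-07-2021 10:27:22",
    ],
})

# Parse the day-first strings into datetime64 values.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# Difference to the previous row within the same ID, converted to seconds.
# The first row of each ID has no previous row, so it becomes NaN.
df["duration(s)"] = df.groupby("ID")["timestamp"].diff().dt.total_seconds()

print(df)
```

Because the difference is computed inside each group, the first timestamp of user 2056557 is not compared with the last timestamp of user 1828765, which is what a plain row-by-row loop over the whole frame would do.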