Min, Max, Mean Duration Time csv

Question

I'm trying to subtract two columns in a CSV to create a 3rd column "Duration" End-Time - Start_time

Each row corresponds to a User Id as well.

I can create a csv file with just the Duration column but i rather redirect it back to the original csv.

The format of these times for e.g is like 2016-11-12 01:25:24+00 - 2016-11-12 01:25:20+00

So far I have done this

start_stop_sessions = pd.read_csv("start_stop_sessions.csv", parse_dates
['time_x', 'time_y'])

start_stop_sessions['time_delta'] = start_stop_sessions.time_y.values -
start_stop_sessions.time_x.values

Duration = (start_stop_sessions.time_delta)
print (Duration)
sys.stdout = open('Duration.csv', 'w')

Durationlist = ("Duration.csv") 
max_value = max(Durationlist)
min_value = min(Durationlist)

Am I doing this right?

TEST DATA

time_x, anonymous_id, time_y

2016-11-20 18:35:57+00, 1, 2016-11-20 19:03:31+00

2016-11-21 19:33:06+, 2, 2016-11-21 19:45:47+00

2016-11-21 19:22:52+00, 3, 2016-11-21 19:26:02+00

1) I would need to create a 4th column Duration

2) List of MIN, MAX, AVG for this duration column

Have you checked [this](http://stackoverflow.com/questions/3146571/read-and-write-on-same-csv-file)? — Sangbok Lee, Mar 27 '17 at 04:52

jezrael · Answer 1 · 2017-03-27T05:47:58.957

2

I think you need to_csv for write file to csv:

df = pd.read_csv("start_stop_sessions.csv", parse_dates=['time_x','time_y'])

df['Duration'] = df['time_y'] - df['time_x']
#same as
#df['Duration'] = df['time_y'].sub(df['time_x'])
print (df)
               time_x  anonymous_id              time_y  Duration
0 2016-11-20 18:35:57             1 2016-11-20 19:03:31  00:27:34
1 2016-11-21 19:33:06             2 2016-11-21 19:45:47  00:12:41
2 2016-11-21 19:22:52             3 2016-11-21 19:26:02  00:03:10

df.to_csv('start_stop_sessions.csv', index=False)

Then get min, max and mean of column Duration - output is timedelta:

print (df['Duration'].min())
0 days 00:03:10

print (df['Duration'].max())
0 days 00:27:34

print (df['Duration'].mean())
0 days 00:14:28.333333

Also if need convert timedelta to seconds need total_seconds:

df['Duration'] = (df['time_y'] - df['time_x']).dt.total_seconds()
print (df)
               time_x  anonymous_id              time_y  Duration
0 2016-11-20 18:35:57             1 2016-11-20 19:03:31    1654.0
1 2016-11-21 19:33:06             2 2016-11-21 19:45:47     761.0
2 2016-11-21 19:22:52             3 2016-11-21 19:26:02     190.0

df.to_csv('start_stop_sessions.csv', index=False)

print (df['Duration'].min())
190.0
print (df['Duration'].max())
1654.0
print (df['Duration'].mean())
868.3333333333334

edited Mar 27 '17 at 05:47

answered Mar 27 '17 at 05:09

jezrael

822,522
95
1,334
1,252

The output I get for subtracting the first row is like 890000000000, what about using time_delta? – Silas Mar 27 '17 at 05:17
What is your pandas version? `print (pd.show_versions())` – jezrael Mar 27 '17 at 05:18
1

I think there is problem. Last version is 0.19.2. Is possible upgrade? – jezrael Mar 27 '17 at 05:29
It works but is there a way to cap off the end after seconds 0 days 00:14:50.000000000 – Silas Mar 27 '17 at 05:44
I add solution if need output in seconds. – jezrael Mar 27 '17 at 05:48
How do I remove the trailing 000's after seconds? 00:14:50.000000000 – Silas Mar 27 '17 at 07:37
You can try rounding - `df['Duration'] = ((df['time_y'] - df['time_x'])).dt.round('S')`, but untested, because i have no trailing `0` – jezrael Mar 27 '17 at 07:44
Hmm, it doesn't change anything, would datetime.timedelta work? – Silas Mar 27 '17 at 07:53
It seems there are some `nanoseconds`, what return `df['Duration'] = ((df['time_y'] - df['time_x'])).astype(str)` ? All values have trailing `0` ? – jezrael Mar 27 '17 at 08:00
And then you can split by `.` and remove trailing `0` - `df['Duration'] = ((df['time_y'] - df['time_x'])).astype(str).str.split('.').str[0]` - output is not timedelta, but string. – jezrael Mar 27 '17 at 08:06
That solves the trailing 00's but then affects the max, mean and avg. Since it expects a numeric – Silas Mar 27 '17 at 08:08
You are right, is necessary convert to timedelta - I hope no trailing `0` - `df['Duration'] = pd.to_timedelta(((df['time_y'] - df['time_x'])).astype(str).str.split('.').str[0])` – jezrael Mar 27 '17 at 08:10

Min, Max, Mean Duration Time csv

1 Answers1