How to get elapsed time on chunks csv python

Question

I'm reading a CSV file. Each row has several values, but I'm only interested in the first and second ones, which have the following format:

2015-11-02 10:07:33,2015-11-02 10:07:52

I need to get the elapsed time between both. My code is:

file = pd.read_csv('file.csv', header=None, skiprows=1, index_col=False,
               chunksize=1000000, usecols=[1, 2], names=['ts', 'te'], na_values=['n/a','N/A','nan','NaN'],
              dtype={'ts':datetime, 'te':datetime})

for chunk in file:
    chunk['duration'] = chunk['te']-chunk['ts']

But I get the following error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

What can I do? Thank you very much.

Update: My problem is solved, but now I need to cast the elapsed time in every chunk to a float. Thank you.

Since Python uses 0-based indexing, to read the first and second columns, use `usecols=[0, 1]`. — unutbu, Jun 13 '16 at 10:23
Please don't edit a question into a new question, and especially don't extend an answered question into a new one. — Ilja Everilä, Jun 13 '16 at 11:09

luka5z · Answer 1 · 2016-06-13T11:06:32.740

Parse the date and time columns with pandas by passing `parse_dates` to `read_csv` instead of `dtype`:

from pprint import pprint
import pandas as pd

file = pd.read_csv('file.csv', header=None, skiprows=1, index_col=False,
                   chunksize=1000000,
                   na_values=['n/a','N/A','nan','NaN'],
                   usecols=[0, 1], names=['ts', 'te'],
                   parse_dates=['ts', 'te'])

for chunk in file:
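    # Dividing by a one-second Timedelta converts the duration to float seconds.
    # (pd.np was removed in pandas 2.0, so pd.Timedelta is used here instead.)
    chunk['duration'] = (chunk['te'] - chunk['ts']) / pd.Timedelta(seconds=1)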
    pprint(chunk)

Example

file.csv file:

start-date,end-date,text
2015-11-02 10:07:33,2015-11-02 10:07:52,foo
2015-11-02 11:07:33,2015-11-02 11:08:52,bar

outputs:

                   ts                  te  duration
0 2015-11-02 10:07:33 2015-11-02 10:07:52      19.0
1 2015-11-02 11:07:33 2015-11-02 11:08:52      79.0
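
The loop above only prints each chunk, so the computed durations are lost once the loop moves on. A minimal sketch of one way to keep them, assuming the result is small enough to hold in memory: collect the processed chunks in a list and concatenate them at the end. The helper name `durations_in_seconds`, the in-memory `CSV_TEXT` sample, and the tiny `chunksize` are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample file.csv shown above.
CSV_TEXT = """start-date,end-date,text
2015-11-02 10:07:33,2015-11-02 10:07:52,foo
2015-11-02 11:07:33,2015-11-02 11:08:52,bar
"""


def durations_in_seconds(source, chunksize=1):
    """Read the first two columns in chunks and add a float 'duration' column."""
    reader = pd.read_csv(source, header=None, skiprows=1,
                         usecols=[0, 1], names=['ts', 'te'],
                         parse_dates=['ts', 'te'], chunksize=chunksize)
    parts = []
    for chunk in reader:
        # Timedelta Series divided by a one-second Timedelta gives float seconds.
        chunk['duration'] = (chunk['te'] - chunk['ts']) / pd.Timedelta(seconds=1)
        parts.append(chunk)
    # Stitch the per-chunk results back into one DataFrame.
    return pd.concat(parts)


result = durations_in_seconds(io.StringIO(CSV_TEXT))
print(result)
```

For a file too large to concatenate, the same loop could instead write each processed chunk out or update a running aggregate.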

First of all thanks for your answer. It works for me. But now I have a second problem, how can I cast the elapsed time (it's a timedelta64[ns]) to a float? Thank you. — elena.bdc, Jun 13 '16 at 10:59
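
As a hedged illustration of the cast asked about in this comment: a timedelta column can be turned into floats either with the `.dt.total_seconds()` accessor or by dividing by a unit `pd.Timedelta`. The sample values below are made up to match the example output; the variable names are assumptions.

```python
import pandas as pd

# Illustrative timedelta Series (19 seconds and 79 seconds).
delta = pd.Series(pd.to_timedelta(['0 days 00:00:19', '0 days 00:01:19']))

# Option 1: total_seconds() returns the duration as float seconds.
seconds = delta.dt.total_seconds()

# Option 2: divide by a unit Timedelta to pick another unit, here minutes.
minutes = delta / pd.Timedelta(minutes=1)

print(seconds.tolist())
print(minutes.tolist())
```

Both results are plain float64 Series, so they can be used directly in arithmetic or written back to CSV.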
