Reading a csv with a timestamp column, with pandas

Question

When doing:

import pandas
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', 
                                names=['DateTime', 'X'], header=None, sep=';')

with this data.csv file:

1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12

(the 1st colum is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 second with x.resample('15S'):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex

It's like the "datetime" information has not been parsed:

                 X
DateTime      
1.449054e+09  15.31                
1.449054e+09  16.19
...

How to import a .CSV with date stored as timestamp with pandas module?

Then once I will be able to import the CSV, how to access to the lines for which date > 2015-12-02 12:02:18 ?

I think this question is a duplicate of http://stackoverflow.com/questions/12251483/idiomatic-way-to-parse-posix-timestamps-in-pandas . — Wirawan Purwanto, Jul 01 '16 at 15:10

score 43 · Answer 1 · answered Dec 06 '15 at 21:01

Use to_datetime and pass unit='s' to parse the units as unix timestamps, this will be much faster:

In [7]:
pd.to_datetime(df.index, unit='s')

Out[7]:
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
               '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
               '2015-12-02 11:02:19.250000'],
              dtype='datetime64[ns]', name=0, freq=None)

Timings:

In [9]:

import time
%%timeit
import time
def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv(io.StringIO(t), parse_dates=[0],  sep=';', 
                 date_parser=date_parser, 
                 index_col='DateTime', 
                 names=['DateTime', 'X'], header=None)
100 loops, best of 3: 4.07 ms per loop

and

In [12]:
%%timeit
t="""1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
df.index = pd.to_datetime(df.index, unit='s')
100 loops, best of 3: 1.69 ms per loop

So using to_datetime is over 2x faster on this small dataset, I expect this to scale much better than the other methods

I'm not sure why, but with unit='s' pandas loses microsecond precision (pandas 0.18.1). Passing `df.ts*1000, unit='ms'` helps. — Mikhail Korobov, Jun 30 '16 at 18:44
@MikhailKorobov you'll have to post raw and code that demonstrates this as I can't comment otherwise — EdChum, Jun 30 '16 at 19:28

Budo Zindovic · Accepted Answer · 2015-12-06T23:44:17.630

33

My solution was similar to Mike's:

import pandas
import datetime
def dateparse (time_in_secs):    
    return datetime.datetime.fromtimestamp(float(time_in_secs))

x = pandas.read_csv('data.csv',delimiter=';', parse_dates=True,date_parser=dateparse, index_col='DateTime', names=['DateTime', 'X'], header=None)

out = x.truncate(before=datetime.datetime(2015,12,2,12,2,18))

edited Dec 06 '15 at 23:44

answered Dec 06 '15 at 20:52

Budo Zindovic

805
5
11

Thanks a lot! Do you have an example of how to access to the lines of `x` for which date is > 2015-12-02 12:02:18 ? (i.e. filter by date) – Basj Dec 06 '15 at 21:04
The solution is rather easy with Pandas. I've edited the solution. – Budo Zindovic Dec 06 '15 at 23:43
do you know why I'm unable to have it as suggested [here](http://stackoverflow.com/a/22898920/1422096)? I should be able to do `x.ix['2015-12-02 12:02:18':'2015-12-31 23:59:59']` or `x.loc[...]`, why doesn't it work as suggested there? Is it because the datetime column is not the index? How to make it "the index" then? – Basj Dec 07 '15 at 00:07
I'm referencing this as I found it the most helpful resource in explaining the problem and solution: http://stackoverflow.com/questions/21269399/datetime-dtypes-in-pandas-read-csv – Reddspark May 11 '17 at 13:27
EdChum's answer below is over 3x faster on large CSV files. – Sabar Feb 13 '20 at 22:28

score 5 · Answer 3 · answered Dec 06 '15 at 20:50

You can parse the date yourself:

import time
import pandas as pd

def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv('data.csv', parse_dates=[0],  sep=';', 
                 date_parser=date_parser, 
                 index_col='DateTime', 
                 names=['DateTime', 'X'], header=None)

The result:

>>> df
                        X
DateTime                  
2015-12-02 12:02:16  15.31
2015-12-02 12:02:17  16.19
2015-12-02 12:02:18  19.22
2015-12-02 12:02:18  15.12
2015-12-02 12:02:19  13.12

Thanks a lot! Then (for the 2nd part of question), how to access to the subpart of `df` for which date is > 2015-12-02 12:02:18 ? (i.e. to filter) — Basj, Dec 06 '15 at 20:55

score 5 · Answer 4 · edited Nov 19 '21 at 11:56

5

The simplest way to do this:

df = pd.read_csv(f, parse_dates=['datecolumn', 'datecolumn1'], infer_datetime_format=True)

edited Nov 19 '21 at 11:56

RiveN

2,595
11
13
26

answered Nov 19 '21 at 10:17

Vetri

51
1
1

Thanks Vetri and @RiveN, but I can't apply this to the sample data given in my question. I tried all possible variants: `df = pd.read_csv(f, parse_dates=['dt'], names=['dt', 'X'], infer_datetime_format=True, sep=';', header=None)` but it does not work. Can you please add the code to be used for a CSV file with no header containing the lines in the question? Thanks! – Basj Nov 19 '21 at 12:39

score 0 · Answer 5 · answered Feb 08 '23 at 14:26

A slickened one-line version of @EdChum's solution worked for my dataset:

x = pd.read_csv('data.csv', 
                 parse_dates=True, 
                 date_parser=pd.to_datetime,
                 index_col='DateTime', 
                 names=['DateTime', 'X'], 
                 header=None, 
                 sep=';')

Reading a csv with a timestamp column, with pandas

5 Answers5

Linked