
I'm trying to load time series data from some files. The data has this format

04/02/2015 19:07:53.951,3195,1751,-44,-25

I'm using this code to load the whole file as a numpy object.

 content = np.loadtxt(filename, dtype={'names': ('timestamp', 'tick', 'ch', 'NodeI', 'Base'),
                                      'formats': ('datetime64[us]', 'i4', 'i4', 'i4', 'i4')}, delimiter=',', skiprows=27)

but I get an error with the datetime format:

ValueError: Error parsing datetime string "04/02/2015 19:07:53.951" at position 2

Is there an easy way to define the datetime format I'm reading? The files contain a lot of data, so I'm trying not to walk each file more than once.

Pablo V.
    My suggesion: Read the timestamp in a string data type and do the conversion afterwards for each entry. – jofel Jan 18 '16 at 17:34

2 Answers


Use the `converters` argument to apply a converter function to the first column:

import datetime
import numpy as np

def parsetime(v):
    return np.datetime64(
        datetime.datetime.strptime(v, '%d/%m/%Y %H:%M:%S.%f')
    )

content = np.loadtxt(
    filename, 
    dtype={
        'names': ('timestamp', 'tick', 'ch', 'NodeI', 'Base'),
        'formats': ('datetime64[us]', 'i4', 'i4', 'i4', 'i4')
    }, 
    delimiter=',', 
    skiprows=27,
    converters={0: parsetime},
)

I assume your data file is using D/M/Y; adjust the format string accordingly if it is actually M/D/Y.
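The format string can be checked against the sample row's timestamp by exercising the converter on its own, before running it over a large file (a minimal sketch of the same `parsetime` function):

```python
import datetime
import numpy as np

def parsetime(v):
    # '%d/%m/%Y %H:%M:%S.%f' assumes day-first order, as in the answer
    return np.datetime64(
        datetime.datetime.strptime(v, '%d/%m/%Y %H:%M:%S.%f')
    )

ts = parsetime('04/02/2015 19:07:53.951')
print(ts)  # 2015-02-04T19:07:53.951000 (February 4th under day-first)
```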

Paulo Scardine
  • I realize the data saved this way is an array, but every row is a "void" type instead of ndarray, so I can't do `data[:,2]`, for example, to obtain a whole column – Pablo V. Jan 18 '16 at 19:48
  • Is that the syntax to get all values from the 3rd column? In order to get the whole column, for example, the 3rd column (index 2), I would do `content['ch']`... Sorry, I'm a web developer, not a data scientist, I don't really use numpy. – Paulo Scardine Jan 19 '16 at 02:05
  • Good point, it could work that way (the other way works with the pandas solution), so I'll test the efficiency of both – Pablo V. Jan 19 '16 at 17:11
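As the comments note, a structured array is indexed by field name rather than 2-D slicing. A small sketch with a hypothetical two-row array using the same dtype as the answer:

```python
import numpy as np

# same dtype as in the answer above
dt = np.dtype({'names': ('timestamp', 'tick', 'ch', 'NodeI', 'Base'),
               'formats': ('datetime64[us]', 'i4', 'i4', 'i4', 'i4')})

# hypothetical sample rows standing in for the loaded file
content = np.array([('2015-02-04T19:07:53.951', 3195, 1751, -44, -25),
                    ('2015-02-04T19:07:54.051', 3196, 1752, -43, -24)],
                   dtype=dt)

# field access replaces column slicing on structured arrays
print(content['ch'])         # [1751 1752]
print(content['ch'].mean())  # 1751.5
```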

I'd suggest the pandas library and `read_csv`: you can use `parse_dates` to select the column and set `infer_datetime_format` to have it converted to a datetime type:

import pandas as pd

a = pd.read_csv('nu.txt', parse_dates=[0], infer_datetime_format=True,
                sep=',', header=None)

a.iloc[:,0]
# assumes a file with four identical rows and no header

0   2015-04-02 19:07:53.951
1   2015-04-02 19:07:53.951
2   2015-04-02 19:07:53.951
3   2015-04-02 19:07:53.951
Name: 0, dtype: datetime64[ns]
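Note that the inferred order in the sample output above is month-first (April 2nd). If the files are actually day-first, `read_csv` accepts `dayfirst=True`; a sketch using `io.StringIO` in place of the file:

```python
import io
import pandas as pd

# two copies of the sample row stand in for a real file
sample = "04/02/2015 19:07:53.951,3195,1751,-44,-25\n"
a = pd.read_csv(io.StringIO(sample * 2), parse_dates=[0],
                dayfirst=True, sep=',', header=None)

print(a.iloc[0, 0])  # 2015-02-04 19:07:53.951 (February 4th)
```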

Also, it's easy to convert to numpy, if needed:

b = np.array(a)
array([[Timestamp('2015-04-02 19:07:53.951000'), 3195L, 1751L, -44L, -25L],
       [Timestamp('2015-04-02 19:07:53.951000'), 3195L, 1751L, -44L, -25L],
       [Timestamp('2015-04-02 19:07:53.951000'), 3195L, 1751L, -44L, -25L],
       [Timestamp('2015-04-02 19:07:53.951000'), 3195L, 1751L, -44L, -25L]], dtype=object)
Lee
  • It works pretty well, even if it feels quite slow. I'll do some more tests. Thanks anyway. – Pablo V. Jan 18 '16 at 19:51
  • Doesn't `pd.read_csv` create Pandas datetime objects which have `ns` resolution? What if I want dates spanning to `9999-12-31`? – ifly6 Apr 26 '19 at 14:11
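On the `ns`-resolution question in the last comment: pandas `Timestamp` is indeed backed by `datetime64[ns]`, whose representable range runs roughly from 1677 to 2262, so dates out to `9999-12-31` do not fit. A quick check:

```python
import pandas as pd

# the bounds of the nanosecond-resolution Timestamp type
print(pd.Timestamp.min)  # year 1677
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```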