1

After reading Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python and Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec

I still am not able to parse dates with separated columns for year, month, day and hour. My data looks like this (zeroth column is ID, first is year, second is month, third is day, fourth is hour and fifth is value)

50136   2011    1   1   21  9792    
50136   2011    1   1   22  9794    
50136   2011    1   1   23  9796    
50136   2011    1   1   0   9798    
50136   2011    1   1   1   9799    
50136   2011    1   1   2   9802

I've tried following: df = pd.read_csv(file, parse_dates = {'date': [1, 2, 3, 4]}, , index_col='date'), but then I get index not as timestamp but as unicode(?)

In  [17]: print df.head()
Out [17]:
                 0     5
date                    
2011 1 1 21  50136  9792
2011 1 1 22  50136  9794
2011 1 1 23  50136  9796
2011 1 1 0   50136  9798
2011 1 1 1   50136  9799

In  [18]: print df.index
Out [18]:
Index([u'2011 1 1 21', u'2011 1 1 22', u'2011 1 1 23', u'2011 1 1 0', u'2011 1 1 1', u'2011 1 1 2'], dtype=object)

I'm obviously doing something wrong, but I can't figure it out. Any advise is really appreciated.

Community
  • 1
  • 1
Mattijn
  • 12,975
  • 15
  • 45
  • 68

1 Answers1

11

If the regular methods dont work you can always fallback on writing your own parser. Make a function which accepts the columns from parse_dates and returns a datetime and add that functions with date_parser.

So something like:

df = pd.read_csv(file, header=None, index_col='datetime', 
                 parse_dates={'datetime': [1,2,3,4]}, 
                 date_parser=lambda x: pd.datetime.strptime(x, '%Y %m %d %H'))

Returns:

                         0     5
datetime                        
2011-01-01 21:00:00  50136  9792
2011-01-01 22:00:00  50136  9794
2011-01-01 23:00:00  50136  9796
2011-01-01 00:00:00  50136  9798
2011-01-01 01:00:00  50136  9799
2011-01-01 02:00:00  50136  9802

edit:

Perhaps its more clear if you write it like a normal function instead of a lambda:

def dt_parse(date_string):

    dt = pd.datetime.strptime(date_string, '%Y %m %d %H')

    return dt
Rutger Kassies
  • 61,630
  • 17
  • 112
  • 97
  • Thanks! I definitely should look into the magic of the lambda function, because your parser works like a charm. – Mattijn Nov 25 '13 at 13:37
  • Its easier then it looks. Columns 1 till 4 basically get stitched together in a single string for each row, the lambda functions then maps that string to the format `'%Y %m %d %H'` which converts it into a proper datetime. – Rutger Kassies Nov 25 '13 at 14:37
  • 1
    `FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime module instead` – baxx Jul 18 '21 at 12:07