My sincere apologies for asking a question which has already been comprehensively resolved more than once. My problem is that I have tried to copy-paste, as is, examples from SO posts but noe work for me. I have data in the form "dd mm YYYY HH mm ss value value". I want to read this as a pandas dataframe with the first column as a datestring. Here are the examples I tried from SO posts: 1. first sample data
01 05 2013 00 00 00 26.4 16.5
02 05 2013 00 00 00 25.9 17.7
03 05 2013 00 00 00 26.6 17.4
......
The zeroes for HH mm ss were added just to conform exactly to the examples.
Secondly the packages imported
from datetime import datetime import pandas as pd import os from cStringIO import StringIO
fname = os.path.expanduser('./temps0.txt')
Now the three examples I tried
a) from SO post:
Parse dates when year month day and hour are in separate columns using pandas in python
def dt_parse(date_string): dt = pd.datetime.strptime(date_string, '%d %m %Y') return dt df = pd.read_csv(fname, header=None, index_col='datetime', parse_dates={'datetime': [0,1,2,4,5,6]}, date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y')) date_parser=dt_pasre)
and here is part of the error
File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
line 1763, in _try_convert_dates
colnames.append(str(columns[c]))
IndexError: list index out of range
b) Another try, from SO post:
https://stackoverflow.com/questions/11615504/parse-dates-when-yyyymmdd-and-hh-are-in-separate-columns-using-pandas-in-python
parse = lambda x: datetime.strptime(x, '%d%m%Y')
pd.read_table("./temps0.txt", parse_dates = [['DD MM YYYY HH mm ss']],
index_col = 0,
date_parser=parse)
and here is part of the error
File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
line 1009, in _set
self._reader.set_noconvert(names.index(x))
ValueError: 'DD MM YYYY HH mm ss' is not in list
c) Another try, from SO post:
https://stackoverflow.com/questions/17301589/parsing-dd-mm-yy-hh-mm-ss-columns-from-txt-file-using-pythons-pandas?lq=1
def date_parser(ss):
day, month, year, hour, min, sec = ss.split()
return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))
df = pd.read_csv('temps0.txt', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)
and here is part of the error
File "<stdin>", line 2, in date_parser
ValueError: too many values to unpack
Please excuse my ignorance as am trying to learn mostly through SO posts (the official documentation lacking completely in examples for clarity).