Reading data from csv into pandas when date and time are in separate columns

Question

I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtley wrong.

I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge in them into one column, "Datetime", containing datetime objects. The csv looks like this:

    Note about the data
    blank line
    Site Id,Date,Time,WTEQ.I-1...
    2069, 2008-01-19, 06:00, -99.9...
    2069, 2008-01-19, 07:00, -99.9...
    ...

I'm trying to read it using this line of code:

   read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back out to a csv, it looks exactly the same (except that the -99.9s are changed to NA, like I specified with the na_values argument). Date and time are in two separate columns. As I understand it, this should be creating a new column called Datetime that is composed of columns 1 and 2, parsed using the date_parser. I have also tried using parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried using date_parser=parse, where parse is defined as:

    parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?

It's working for me with your sample data. I had to take out the date_parser=True bit, though. — dooz, Jul 05 '13 at 16:12
Ooops, now I look at this, I think your ways should also work, which pandas version are you using? — Andy Hayden, Jul 05 '13 at 17:12
Glad I'm not crazy for thinking it seemed right :). I'm using pandas version 0.7.0-1 on Linux Mint 13 MATE edition. — seaotternerd, Jul 05 '13 at 17:29
@dooz - I tried taking that part out and it's still not working... Weird. It must be something specific to my set-up. — seaotternerd, Jul 05 '13 at 17:30
the latest stable version is 0.11, you should definitely update! (I edited my answer to reflect this.) — Andy Hayden, Jul 05 '13 at 20:57

score 4 · Accepted Answer · edited May 30 '14 at 14:21

You should update your pandas, I recommend the latest stable version for the latest features and bug fixes.

This specific feature was introduced in 0.8.0, and works on pandas version 0.11:

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
Out[11]:
             Datetime  Site Id  WTEQ.I-1
0 2008-01-19 06:00:00     2069       NaN
1 2008-01-19 07:00:00     2069       NaN

without the date_parser=True (since this should be a parsing function, see docstring).

Note that in the provided example the resulting "Datetime" column is a Series of its own and not the index values of the DataFrame. If you'd rather want to have the datetime values as index column rather than the integer value pass the index_col argument specifying the desired column, in this case 0 since the resulting "Datetime" column is the first one.

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])

Reading data from csv into pandas when date and time are in separate columns

1 Answers1

Linked