9

I looked at the answer to this question: Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python, but it doesn't seem to work for me, which makes me think I'm doing something subtley wrong.

I've got data in .csv files, which I'm trying to read using the pandas read_csv function. Date and time are in two separate columns, but I want to merge in them into one column, "Datetime", containing datetime objects. The csv looks like this:

    Note about the data
    blank line
    Site Id,Date,Time,WTEQ.I-1...
    2069, 2008-01-19, 06:00, -99.9...
    2069, 2008-01-19, 07:00, -99.9...
    ...

I'm trying to read it using this line of code:

   read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, date_parser=True, na_values=["-99.9"])

However, when I write it back out to a csv, it looks exactly the same (except that the -99.9s are changed to NA, like I specified with the na_values argument). Date and time are in two separate columns. As I understand it, this should be creating a new column called Datetime that is composed of columns 1 and 2, parsed using the date_parser. I have also tried using parse_dates={"Datetime" : ["Date","Time"]}, parse_dates=[[1,2]], and parse_dates=[["Date", "Time"]]. I have also tried using date_parser=parse, where parse is defined as:

    parse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M')

None of these has made the least bit of difference, which makes me suspect that there's some deeper problem. Any insight into what it might be?

Community
  • 1
  • 1
seaotternerd
  • 6,298
  • 2
  • 47
  • 58

1 Answers1

4

You should update your pandas, I recommend the latest stable version for the latest features and bug fixes.

This specific feature was introduced in 0.8.0, and works on pandas version 0.11:

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, na_values=["-99.9"])
Out[11]:
             Datetime  Site Id  WTEQ.I-1
0 2008-01-19 06:00:00     2069       NaN
1 2008-01-19 07:00:00     2069       NaN

without the date_parser=True (since this should be a parsing function, see docstring).

Note that in the provided example the resulting "Datetime" column is a Series of its own and not the index values of the DataFrame. If you'd rather want to have the datetime values as index column rather than the integer value pass the index_col argument specifying the desired column, in this case 0 since the resulting "Datetime" column is the first one.

In [11]: read_csv("2069_ALL_YEAR=2008.csv", skiprows=2, parse_dates={"Datetime" : [1,2]}, index_col=0, na_values=["-99.9"])
anddam
  • 1,322
  • 1
  • 12
  • 12
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535