Parsing DD MM YY HH MM SS columns from TXT file using Python's pandas

Question

Thank you all in advance for your time. I have a number of space-delimited text files in the following format:

    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

That is, DD MM YY HH MM SS followed by my result value. I am trying to read the txt file using Python's pandas. I did quite a bit of research before posting this question, so I hope I am not covering well-trodden ground.
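For reference, the six leading fields map directly onto standard `strptime` directives (`%d %m %y %H %M %S`). Here is a minimal standard-library sketch, assuming every field is zero-padded as in the sample above. Note that `%y` maps two-digit years 00 to 68 onto 2000 to 2068, so `13` becomes 2013.

```python
from datetime import datetime

line = "29 04 13 18 15 00    7.667"

# The first six whitespace-separated fields form the timestamp, the seventh is the value.
fields = line.split()
stamp = datetime.strptime(" ".join(fields[:6]), "%d %m %y %H %M %S")
value = float(fields[6])

print(stamp, value)  # 2013-04-29 18:15:00 7.667
```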

Based on trial and error and research I have come up with:

    import pandas as pd
    from cStringIO import StringIO
    def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col, second_col):
        # NOTE: _maybe_cast and lib are not defined or imported in this snippet
        day_col = _maybe_cast(day_col)
        month_col = _maybe_cast(month_col)
        year_col = _maybe_cast(year_col)
        hour_col = _maybe_cast(hour_col)
        minute_col = _maybe_cast(minute_col)
        second_col = _maybe_cast(second_col)
        return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)

    ##Read the .txt file
    data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
    data1[:10]

    Out[21]: 

    Day, Month, Year, Hour, Min, Sec, Value
    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

    data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-8ee408dc19c3> in <module>()
    ----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':   ['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError: expected read buffer, DataFrame found

At this point I am stuck. Firstly, the "expected read buffer" error confuses me. Do I need to pre-process the .txt file further to get the dates into a readable format? Note: read_table's built-in date parsing does not work on its own with this date format.
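The `TypeError` comes from `StringIO(data1)`: `StringIO` wraps a string of text, but `data1` is already a parsed `DataFrame`, so there is nothing to re-read. (Separately, `'Hour''Min'` is missing a comma, so Python joins the two literals into the single name `'HourMin'`.) One way around both problems, sketched here as an alternative rather than the accepted answer's method, is to assemble the timestamp from the columns `data1` already has, using `pd.to_datetime` on a DataFrame whose columns are named `year`, `month`, `day`, `hour`, `minute` and `second`. This assumes a modern pandas on Python 3 (hence `io.StringIO`), that every `YY` value belongs to the 2000s, and the inline `text` is a hypothetical stand-in for `0132_3.TXT`.

```python
import io

import pandas as pd

# Hypothetical stand-in for 0132_3.TXT, using the question's sample rows.
text = """29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

# Same first step as the question: one column per whitespace-separated field.
data1 = pd.read_csv(io.StringIO(text), sep=r"\s+",
                    names=["Day", "Month", "Year", "Hour", "Min", "Sec", "Value"])

# to_datetime can assemble timestamps from columns named year/month/day/hour/minute/second.
parts = pd.DataFrame({
    "year": data1["Year"] + 2000,  # assumption: every YY is in the 2000s
    "month": data1["Month"],
    "day": data1["Day"],
    "hour": data1["Hour"],
    "minute": data1["Min"],
    "second": data1["Sec"],
})
data1["datetime"] = pd.to_datetime(parts)

data2 = data1[["datetime", "Value"]]
print(data2)
```

Working on the existing DataFrame avoids a second pass over the file, at the cost of renaming the columns into the names `to_datetime` recognises.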

I am a beginner - trying to learn. Sorry if the code is wrong/basic/confusing. Would be very appreciative if someone could help. Many thanks in advance.

Andy Hayden · Accepted Answer · 2013-06-25T16:04:39.473

I think it's going to be easier just to parse the dates when reading the CSV:

    In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]])

    In [2]: df
    Out[2]:
                        0      1
    0 2013-04-29 00:00:00  7.667
    1 2013-04-29 00:00:00  7.000
    2 2013-04-29 00:00:00  7.000
    3 2013-04-29 00:00:00  7.333
    4 2013-04-29 00:00:00  7.000
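The separator `'\s+\s'` used above is a regular expression that only matches runs of two or more whitespace characters. The single spaces inside `DD MM YY HH MM SS` therefore do not split, and each line yields just two fields: the whole date string and the value. A quick check, assuming lines shaped like the question's sample (the `engine="python"` argument is an addition here; regex separators are handled by pandas' Python parser, and naming it avoids a fallback warning):

```python
import io
import re

import pandas as pd

line = "29 04 13 18 15 00    7.667"

# \s+\s needs at least two consecutive whitespace characters, so single spaces survive.
print(re.split(r"\s+\s", line))  # ['29 04 13 18 15 00', '7.667']

text = "29 04 13 18 15 00    7.667\n29 04 13 18 30 00    7.000\n"
df = pd.read_csv(io.StringIO(text), sep=r"\s+\s", header=None, engine="python")
print(df.shape)  # (2, 2)
```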

Since you're using an unusual date format, you need to specify a date parser too:

    In [11]: def date_parser(ss):
                 day, month, year, hour, min, sec = ss.split()
                 return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

    In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

    In [13]: df
    Out[13]:
                        0      1
    0 2013-04-29 18:15:00  7.667
    1 2013-04-29 18:30:00  7.000
    2 2013-04-29 18:45:00  7.000
    3 2013-04-29 19:00:00  7.333
    4 2013-04-29 19:15:00  7.000
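On newer pandas releases, the `date_parser` argument is deprecated (pandas 2.0), and later 2.x releases also deprecate combining columns through nested lists in `parse_dates`. A possible modern equivalent, offered as an alternative sketch rather than the answer's original method, keeps the same `'\s+\s'` separator but converts the date column afterwards with `pd.to_datetime` and an explicit format string. It assumes zero-padded fields; the column names `datetime` and `value` and the inline `text` standing in for `0132_3.TXT` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for 0132_3.TXT, using the question's sample rows.
text = """29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

# Two or more whitespace characters separate the date block from the value.
df = pd.read_csv(io.StringIO(text), sep=r"\s+\s", header=None,
                 names=["datetime", "value"], engine="python")

# Explicit format: day, month, two-digit year, hour, minute, second (%y maps 13 to 2013).
df["datetime"] = pd.to_datetime(df["datetime"], format="%d %m %y %H %M %S")
print(df)
```

An explicit `format` also makes the day-first order unambiguous, which removes the need for `dayfirst` or a custom parser.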

Andy, thank you very much for this. I see what you have done, and it works perfectly. (mich_1706, Jun 26 '13 at 02:14)