0

I have a DataFrame with columns for YEAR END_DAY and END_MONTH as integers and I need to set new columns for START_DAY and START_MONTH. The start dates should be one day later than the end dates in the previous row (the first row should have START_DAY and START_MONTH both equal to 1).

My problem is that I'm having trouble getting pandas to create a date column which I can then apply a timedelta to. Here's my offending line:

aws['END_DATE'] = pandas.to_datetime(aws['YEAR'],
                                     aws['END_MONTH'],
                                     aws['END_DAY'], format="%m,%d")

And the traceback:

File "C:\Python27\Lib\site-packages\pandas\tseries\tools.py", line 136, in to_datetime
    values = _convert_listlike(arg.values, box=False)
File "C:\Python27\Lib\site-packages\pandas\tseries\tools.py", line 115, in _convert_listlike
    result = tslib.array_strptime(arg, format, coerce=coerce)
File "tslib.pyx", line 1255, in pandas.tslib.array_strptime (pandas\tslib.c:20089)
TypeError: expected string or buffer

EDIT: The part of the problem above is solved by passing a string as in @Roman Pekar's answer (or the linked duplicate), but they don't address the other part of the question. I have a solution for that (below), but I would be interested in any better options.

aws['END_DATE'] = pandas.to_datetime((aws['YEAR'] * 10000 +
                                      aws['END_MONTH'] * 100 +
                                      aws['END_DAY']).astype(str),
                                     format="%Y%m%d")
aws['START_DATE'] = aws['END_DATE'] + datetime.timedelta(days=1)
aws['START_DATE'] = aws['START_DATE'].shift(1)
aws['START_DATE'][0] = datetime.datetime(2000, 1, 1)
aws['START_DAY'] = aws['START_DATE'].apply(lambda x: x.day)
aws['START_MONTH'] = aws['START_DATE'].apply(lambda x: x.month)
Jamie Bull
  • 12,889
  • 15
  • 77
  • 116
  • possible duplicate of [How to convert columns into one datetime column in pandas?](http://stackoverflow.com/questions/19350806/how-to-convert-columns-into-one-datetime-column-in-pandas) – joris Dec 09 '14 at 12:54
  • That one solves the first part of the question, though doesn't solve the offset by a row part. – Jamie Bull Dec 09 '14 at 12:56
  • 1
    Can you then update your question using that info, and specify the second part? – joris Dec 09 '14 at 13:00

1 Answers1

0

as far as I know, to_datetime expects string and you have ints. You can create new column like:

aws['END_DATE']  = aws.apply(lambda x: datetime(x['YEAR'], x['END_MONTH'], x['END_DAY']), axis=1)
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197