I have a DataFrame with columns for YEAR
END_DAY
and END_MONTH
as integers and I need to set new columns for START_DAY
and START_MONTH
. The start dates should be one day later than the end dates in the previous row (the first row should have START_DAY
and START_MONTH
both equal to 1).
My problem is that I'm having trouble getting pandas to create a date column which I can then apply a timedelta to. Here's my offending line:
aws['END_DATE'] = pandas.to_datetime(aws['YEAR'],
aws['END_MONTH'],
aws['END_DAY'], format="%m,%d")
And the traceback:
File "C:\Python27\Lib\site-packages\pandas\tseries\tools.py", line 136, in to_datetime
values = _convert_listlike(arg.values, box=False)
File "C:\Python27\Lib\site-packages\pandas\tseries\tools.py", line 115, in _convert_listlike
result = tslib.array_strptime(arg, format, coerce=coerce)
File "tslib.pyx", line 1255, in pandas.tslib.array_strptime (pandas\tslib.c:20089)
TypeError: expected string or buffer
EDIT: The part of the problem above is solved by passing a string as in @Roman Pekar's answer (or the linked duplicate), but they don't address the other part of the question. I have a solution for that (below), but I would be interested in any better options.
aws['END_DATE'] = pandas.to_datetime((aws['YEAR'] * 10000 +
aws['END_MONTH'] * 100 +
aws['END_DAY']).astype(str),
format="%Y%m%d")
aws['START_DATE'] = aws['END_DATE'] + datetime.timedelta(days=1)
aws['START_DATE'] = aws['START_DATE'].shift(1)
aws['START_DATE'][0] = datetime.datetime(2000, 1, 1)
aws['START_DAY'] = aws['START_DATE'].apply(lambda x: x.day)
aws['START_MONTH'] = aws['START_DATE'].apply(lambda x: x.month)