I have a list of data that has been read from MongoDB. A subset of the data can be found in this gist. I am creating a DataFrame from this list, using the Date fields to create a DatetimeIndex. The dates were originally recorded in my local timezone, but in Mongo they have no timezone information attached, so I correct for DST as advised here.
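import pandas as pd
from pandas import DataFrame
from dateutil import tz
# data is the list from the gist
dates = [x['Date'] for x in data]
idx = pd.DatetimeIndex(dates, freq='D')
# the stored values are naive, so mark them as UTC first
idx = idx.tz_localize(tz=tz.tzutc())
# then convert to local time and truncate to midnight
idx = idx.tz_convert(tz='Europe/Dublin')
idx = idx.normalize()
frame = DataFrame(data, index=idx)
# the dates now live in the index, so drop the column
frame = frame.drop('Date', axis=1)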
Everything seems to work fine, and my frame looks like this:
Events ID
2008-03-31 00:00:00+01:00 0.0 116927302
2008-03-30 00:00:00+00:00 2401.0 116927302
2008-03-31 00:00:00+01:00 0.0 116927307
2008-03-30 00:00:00+00:00 0.0 116927307
2008-03-31 00:00:00+01:00 0.0 121126919
2008-03-30 00:00:00+00:00 1019.0 121126919
2008-03-30 00:00:00+00:00 0.0 121126922
2008-03-31 00:00:00+01:00 0.0 121126922
2008-03-30 00:00:00+00:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677416
2008-03-31 00:00:00+01:00 0.0 131677416
Now I want to use both the original DatetimeIndex and the ID column to create a MultiIndex as shown here. When I try this, however, I get an error that wasn't raised when originally creating the DatetimeIndex:
frame.set_index([frame.ID, idx])
NonExistentTimeError: 2008-03-30 01:00:00
If I just do frame.set_index(idx) without the MultiIndex, it raises no error.
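For context on the timestamp in the error message: as far as I can tell, 2008-03-30 01:00:00 is the wall-clock hour that was skipped when clocks went forward in Dublin. The snippet below is a minimal sketch that is independent of my data and only illustrates that gap; the variable names are made up for the example, and it does not reproduce the MultiIndex problem itself.

```python
from datetime import timedelta

import pandas as pd

TZ = 'Europe/Dublin'

# Clocks in Dublin went forward at 01:00 UTC on 2008-03-30.
before = pd.Timestamp('2008-03-30 00:59', tz='UTC').tz_convert(TZ)
after = pd.Timestamp('2008-03-30 01:00', tz='UTC').tz_convert(TZ)
# One minute apart in UTC, but the local wall clock jumps from 00:59 to 02:00.
local_gap = (before.hour, before.minute), (after.hour, after.minute)

# Midnight on either side of the change carries a different UTC offset,
# which matches the +00:00 / +01:00 suffixes in the frame above.
offset_30 = pd.Timestamp('2008-03-30', tz=TZ).utcoffset()
offset_31 = pd.Timestamp('2008-03-31', tz=TZ).utcoffset()

# Localizing the naive wall time 01:00 directly should fail, because that
# wall-clock time never occurred. The exception class differs between pandas
# versions, so it is caught broadly here (an assumption of this sketch).
try:
    pd.Timestamp('2008-03-30 01:00:00').tz_localize(TZ)
    localize_failed = False
except Exception as exc:
    localize_failed = True
    error_name = type(exc).__name__
```

So my guess is that somewhere inside set_index the index values are being re-localized as naive local times rather than carried over as they are, but I have not confirmed this.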
Versions
- Python 2.7.11
- Pandas 0.18.0