I am having some difficulty working with times/timezones. I have raw JSON data of the form
{
"Date": "28 Sep 2009 00:00:00",
....
}
This data is then loaded into MongoDB, and this string representation of the date is transformed into a JavaScript Date object. This conversion to UTC time results in the following date
{
"_id": ObjectId("577a788f4439e17afd4e21f7"),
"Date": ISODate("2009-09-27T23:00:00Z")
}
It "looks" as though the date has actually been moved forward a day, I'm assuming (perhaps incorrectly) that this is because my machine is set to Irish Standard Time.
I then read this data from MongoDB and use it to create a pandas DatetimeIndex
idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')
which gives me
which is incorrect since the time has not been converted back correctly from UTC to local time. So I followed the solution given in this answer
idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz=tz.tzlocal())
frame = DataFrame(test_docs, index=idx)
frame = frame.drop('Date', 1)
which gives me the right day back
I then normalize the DatetimeIndex so the hours are removed, allowing me to group all entries by day.
frame.groupby(idx).sum()
At this point, however, something strange happens. The dates end up getting grouped as follows
but this doesn't reflect the dates in the frame
Can anyone shed some light on where I might be going wrong?
Response to @ptrj
Explicitly using my timezone as a string
idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz='Europe/Dublin')
idx = idx.normalize()
frame = DataFrame(test_docs, index=idx)
...
...
aggregate = frame.groupby(idx).sum()
aggregate.plot()
this doesn't work for me, it results in the following plot
For some reason the groupby is not properly grouping for 2014, as shown below
If instead, I use
idx = idx.tz_convert(tz.gettz('Europe/Dublin'))
I get the same problem
Convert to an object
idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz=tz.tzlocal())
idx = idx.normalize()
frame = DataFrame(test_docs, index=idx)
aggregate = frame.groupby(idx.astype(object)).sum()
This approach seems to work correctly for me