2

I have recently faced a similar problem (answered here) whereby conversion of a date to a pandas DatetimeIndex and subsequent groupby using those dates led to an error where the date appeared as 1970-01-01 00:00:00+00:00.

I'm facing this problem in a different context now, and the previous solution isn't helping me.

I have a frame like this

import pandas as pd
from dateutil import tz

data = { 'Events' : range(1, 5 + 1 ,1), 'ID' : [1, 1, 1, 1, 1]}
idx = pd.date_range(start='2008-01-01', end='2008-01-05', freq='D', tz=tz.tzlocal())
frame = pd.DataFrame(data, index=idx)



                           Events  ID
2008-01-01 00:00:00+00:00       1   1
2008-01-02 00:00:00+00:00       2   1
2008-01-03 00:00:00+00:00       3   1
2008-01-04 00:00:00+00:00       4   1
2008-01-05 00:00:00+00:00       5   1

and I want to change the index from just the date, to a MultiIndex of [date, ID], but in doing so that "1970 bug" appears

frame.set_index([frame.ID, frame.index])

                              Events  ID
ID                                      
1  2008-01-01 00:00:00+00:00       1   1
   1970-01-01 00:00:00+00:00       2   1
   1970-01-01 00:00:00+00:00       3   1
   1970-01-01 00:00:00+00:00       4   1
   1970-01-01 00:00:00+00:00       5   1

Versions

  • Python 2.7.11
  • Pandas 0.18.0
Community
  • 1
  • 1
Philip O'Brien
  • 4,146
  • 10
  • 46
  • 96
  • 4
    Please post your dataframes and code as text, not as images. – Alicia Garcia-Raboso Jul 20 '16 at 14:40
  • I don't know how to cleanly copy/paste the DataFrame. Can I ask what the difference is? I could print the DataFrame in the notebook, and then copy/paste that output, but it would look much uglier surely? – Philip O'Brien Jul 20 '16 at 15:23
  • 2
    If you post images, anybody trying to help you would have to manually type everything. Please take the time to do it yourself. – Alicia Garcia-Raboso Jul 20 '16 at 15:28
  • But I don't understand what I could provide in my question that would need to be manually typed? I can't provide the code because it is read from a Mongo collection (and is sensitive) so I was using the images to illustrate the structure of the DataFrame. I guess I could write the lines `frame` and `frame.set_index([frame.ID, frame.index])` as text, but the DataFrame image itself is surely more readable in it's current form? – Philip O'Brien Jul 20 '16 at 15:31
  • 3
    In order to help you, people need to be able to reproduce your problem. You should make that easy by including sample code and data that people can copy and paste and run themselves, and it will demonstrate the problem. – BrenBarn Jul 20 '16 at 15:32
  • 1
    OK I get you, instead of an image of the sensitive data, I should create dummy data of the same form, and include that code. I'll do it now. Thanks for explaining – Philip O'Brien Jul 20 '16 at 15:33
  • 1
    question updated accordingly – Philip O'Brien Jul 20 '16 at 15:52
  • What version of pandas is this? – Alex Jul 20 '16 at 16:04
  • 3
    How are you importing tz? – Merlin Jul 20 '16 at 16:04
  • I recall my days working with mongodb to python...something about converting the javascript datetime to unix time before bringing it into python.. Try that. – Merlin Jul 20 '16 at 16:26
  • I'am unable to reproduce your problem everything works fine and dates are correct. – shivsn Jul 20 '16 at 16:41
  • @Merlin using `from dateutil import tz`. – shivsn Jul 20 '16 at 16:58
  • @Merlin but it works for me I'am using `python-2.7.10,pandas-0.16.2`. – shivsn Jul 20 '16 at 17:06

2 Answers2

1

The accepted answer of your other question works for me (Python 3.5.2, Pandas 0.18.1):

print(frame.set_index([frame.ID, frame.index]))

#                               Events  ID
# ID                                      
# 1  2008-01-01 00:00:00-05:00       1   1
#    1970-01-01 00:00:00-05:00       2   1
#    1970-01-01 00:00:00-05:00       3   1
#    1970-01-01 00:00:00-05:00       4   1
#    1970-01-01 00:00:00-05:00       5   1

frame.index = frame.index.tz_convert(tz='EST')
print(frame.set_index([frame.ID, frame.index]))

#                               Events  ID
# ID                                      
# 1  2008-01-01 00:00:00-05:00       1   1
#    2008-01-02 00:00:00-05:00       2   1
#    2008-01-03 00:00:00-05:00       3   1
#    2008-01-04 00:00:00-05:00       4   1
#    2008-01-05 00:00:00-05:00       5   1

(My local time is different from yours.)

Alicia Garcia-Raboso
  • 13,193
  • 1
  • 43
  • 48
1
frame = frame.reset_index()
frame = frame.set_index([frame.ID, frame.index])
print frame

                         index  Events  ID
ID                                        
1  0 2008-01-01 00:00:00-05:00       1   1
   1 2008-01-02 00:00:00-05:00       2   1
   2 2008-01-03 00:00:00-05:00       3   1
   3 2008-01-04 00:00:00-05:00       4   1
   4 2008-01-05 00:00:00-05:00       5   1


print frame.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5 entries, (1, 0) to (1, 4)
Data columns (total 4 columns):
level_0    5 non-null int64
index      5 non-null datetime64[ns, tzlocal()]
Events     5 non-null int64
ID         5 non-null int64
dtypes: datetime64[ns, tzlocal()](1), int64(3)
memory usage: 200.0+ bytes
Merlin
  • 24,552
  • 41
  • 131
  • 206
  • This works for me too, and seems to me a better workaround --- no need to fiddle with timezones --- until pandas 0.19 comes out. @Philip O'Brien: if it works for you too, I think you should accept this answer instead of mine. – Alicia Garcia-Raboso Jul 20 '16 at 20:40