1

I have a set of data that looks like this (3 columns). The date and time are in 1 column and the timezone is in another column.

location,time,zone
EASTERN HILLSBOROUGH,1/27/2015 12:00,EST-5
EASTERN HILLSBOROUGH,1/24/2015 7:00,EST-5
EASTERN HILLSBOROUGH,1/27/2015 6:00,EST-5
EASTERN HILLSBOROUGH,2/14/2015 8:00,EST-5
EASTERN HILLSBOROUGH,2/7/2015 22:00,EST-5
EASTERN HILLSBOROUGH,2/2/2015 2:00,EST-5

I'm using pandas in order to parse the date and time with its respective timezone. In read_csv I can do parse_dates = [[1,2]] which, according to the docs, combines the columns into 1 and parses them.

So now the new data looks like this (2 columns)

location,time_zone
EASTERN HILLSBOROUGH,1/27/2015 12:00 EST-5
EASTERN HILLSBOROUGH,1/24/2015 7:00 EST-5
EASTERN HILLSBOROUGH,1/27/2015 6:00 EST-5
EASTERN HILLSBOROUGH,2/14/2015 8:00 EST-5
EASTERN HILLSBOROUGH,2/7/2015 22:00 EST-5
EASTERN HILLSBOROUGH,2/2/2015 2:00 EST-5

However, if I type df['time_zone'].dtype I get dtype('O') which isn't a datetimelike because I can't use the dt accessor with it.

How else can I parse those two columns properly?

Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
Leb
  • 15,483
  • 10
  • 56
  • 75
  • Does this help, it looks like he had the same `dtype('O')` problem. `df['time_zone'] = pd.to_datetime(df['time_zone'])` http://stackoverflow.com/a/24446716/5889975 – steven Feb 23 '16 at 03:35
  • @steven that gives me `ValueError: Unknown string format`. pandas isn't liking the format for some reason, possible bug? `datetime` doesn't like it either, but `dateutil` works fine. I tried using that in a mapping function, but gives me same `ValueError` as above. – Leb Feb 23 '16 at 03:55
  • What result do you get from this: `type(df.time_zone.iat[0])`? That looks at a specific value rather than the entire column type. – Alexander Feb 23 '16 at 04:12
  • @Alexander it's a `str` – Leb Feb 23 '16 at 04:14
  • What are the values of df.zone.unique()? – Alexander Feb 23 '16 at 04:21
  • They are: `array(['EST-5', 'CST-6', 'MST-7', 'GST10', 'PST-8', 'AKST-9', 'HST-10', 'AST-4', 'SST-11'], dtype=object)` – Leb Feb 23 '16 at 04:22

2 Answers2

0

Not sure if this is what you want, but you could just read in (without any datetime parsing) and then use to_datetime (note that new variable time_zone is 5 hours later than time).

df['time_zone'] = pd.to_datetime( df.time + df.zone )

               location             time   zone           time_zone
0  EASTERN HILLSBOROUGH  1/27/2015 12:00  EST-5 2015-01-27 17:00:00
1  EASTERN HILLSBOROUGH   1/24/2015 7:00  EST-5 2015-01-24 12:00:00
2  EASTERN HILLSBOROUGH   1/27/2015 6:00  EST-5 2015-01-27 11:00:00
3  EASTERN HILLSBOROUGH   2/14/2015 8:00  EST-5 2015-02-14 13:00:00
4  EASTERN HILLSBOROUGH   2/7/2015 22:00  EST-5 2015-02-08 03:00:00
5  EASTERN HILLSBOROUGH    2/2/2015 2:00  EST-5 2015-02-02 07:00:00

df.info()

location     6 non-null object
time         6 non-null object
zone         6 non-null object
time_zone    6 non-null datetime64[ns]
JohnE
  • 29,156
  • 8
  • 79
  • 109
  • I've tried that, but when I do it I still get `ValueError: Unknown string format` – Leb Feb 23 '16 at 04:17
  • I dunno, I mean, I just ran that on the data you provided and it work OK for me. pandas 0.17.1 here. With the code I gave, you could check whether the problem is in the time or zone part. i.e. try `'1/1/2015 12:00' + df.zone` and `df.time + 'EST-5'` and see which part generates the error – JohnE Feb 23 '16 at 04:20
  • Same pandas version, what about python and dateutil? If you don't mind. – Leb Feb 23 '16 at 04:21
  • I'm at python 3.5.1. Not sure on dateutil but I updated anaconda very recently so probably very recent version. – JohnE Feb 23 '16 at 04:25
  • Ok it works now. Not sure why this is the case as there are no problems with the data, but adding `errors = 'coerce'` runs fine. I checked if any of my rows got `NaT` output but there was none. – Leb Feb 23 '16 at 04:31
0

Per the pytz module:

The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans.

I don't believe your timezones are standard, which makes the conversion a little more tricky. We should, however, be able to strip the timezone offset and add it to the UTC time using datetime.timedelta. This is a hack, and I wish I knew a better way.

I assume all times are recorded in their local timezones, so 1/27/2015 12:00 EST-5 would be 1/27/2015 17:00 UTC.

from pytz import utc
import datetime as dt

df = pd.read_csv('times.csv')
df['UTC_time'] = [utc.localize(t) - dt.timedelta(hours=int(h)) 
                  for t, h in zip(pd.to_datetime(df.time), 
                                  df.zone.str.extract(r'(-?\d+)'))]

>>> df
               location             time   zone                  UTC_time
0  EASTERN HILLSBOROUGH  1/27/2015 12:00  EST-5 2015-01-27 17:00:00+00:00
1  EASTERN HILLSBOROUGH   1/24/2015 7:00  EST-5 2015-01-24 12:00:00+00:00
2  EASTERN HILLSBOROUGH   1/27/2015 6:00  EST-5 2015-01-27 11:00:00+00:00
3  EASTERN HILLSBOROUGH   2/14/2015 8:00  EST-5 2015-02-14 13:00:00+00:00
4  EASTERN HILLSBOROUGH   2/7/2015 22:00  EST-5 2015-02-08 03:00:00+00:00
5  EASTERN HILLSBOROUGH    2/2/2015 2:00  EST-5 2015-02-02 07:00:00+00:00

Examining a single timestamp, you'll notice the timezone is set to UTC:

>>> df.UTC_time.iat[0]
Timestamp('2015-01-27 17:00:00+0000', tz='UTC')

>>> df.UTC_time.iat[0].tzname()
'UTC'

To display them in a different time zone:

fmt = '%Y-%m-%d %H:%M:%S %Z%z'
>>> [t.astimezone('EST').strftime(fmt) for t in df.UTC_time]
['2015-01-27 12:00:00 EST-0500',
 '2015-01-24 07:00:00 EST-0500',
 '2015-01-27 06:00:00 EST-0500',
 '2015-02-14 08:00:00 EST-0500',
 '2015-02-07 22:00:00 EST-0500',
 '2015-02-02 02:00:00 EST-0500']

Here is a test. Let's change the timezones in df and see if alternative solutions still work:

df['zone'] = ['EST-5', 'CST-6', 'MST-7', 'GST10', 'PST-8', 'AKST-9']
df['UTC_time'] = [utc.localize(t) - dt.timedelta(hours=int(h)) 
                  for t, h in zip(pd.to_datetime(df.time), 
                                  df.zone.str.extract(r'(-?\d+)'))]
>>> df
               location             time    zone                  UTC_time
0  EASTERN HILLSBOROUGH  1/27/2015 12:00   EST-5 2015-01-27 17:00:00+00:00
1  EASTERN HILLSBOROUGH   1/24/2015 7:00   CST-6 2015-01-24 13:00:00+00:00
2  EASTERN HILLSBOROUGH   1/27/2015 6:00   MST-7 2015-01-27 13:00:00+00:00
3  EASTERN HILLSBOROUGH   2/14/2015 8:00   GST10 2015-02-13 22:00:00+00:00
4  EASTERN HILLSBOROUGH   2/7/2015 22:00   PST-8 2015-02-08 06:00:00+00:00
5  EASTERN HILLSBOROUGH    2/2/2015 2:00  AKST-9 2015-02-02 11:00:00+00:00

Check the python docs for more details about working with time.

Here is a good SO article on the subject. How to make an unaware datetime timezone aware in python

And here is a link to the tz database timezones.

Community
  • 1
  • 1
Alexander
  • 105,104
  • 32
  • 201
  • 196