I have a list of strings that I am reading from a file. Each string has a time zone offset that was recorded when the data was stored.

date1 = "Mon May 05 20:00:00 EDT 2014"
date2 = "Mon Nov 18 19:00:00 EST 2013"
date3 = "Mon Nov 07 19:00:00 PST 2013"

I need to find the difference in days between each pair of strings.

from datetime import datetime
from dateutil import tz

def days_hours_minutes(td):
    return td.days, td.seconds//3600, (td.seconds//60)%60

date1 = 'Fri Dec 05 19:00:00 2014'  # it does not work with EDT, EST etc.
date2 = 'Fri Dec 03 19:00:00 2014'

fmt = "%a %b %d %H:%M:%S %Y"

str1 = datetime.strptime(date1, fmt)
str2 = datetime.strptime(date2, fmt)
td=(str1-str2)
x=days_hours_minutes(td)
print(x)
# gives (2, 0, 0)

Basically, I convert each string to a datetime object ("my_time_obj") and then take the difference in days.

However, my actual date strings contain "EDT", "EST", "IST", etc., and when I use the %Z directive I get: ValueError: time data 'Fri Dec 05 19:00:00 EST 2014' does not match format '%a %b %d %H:%M:%S %Z %Y'

From the datetime documentation, I see that I can use %Z to parse the time zone name. What am I missing? https://docs.python.org/2/library/datetime.html

ekta

2 Answers

I would go with parsing the timezone using pytz and do something like this (given that you know how your date string is built):

from datetime import datetime
from dateutil import tz
from pytz import timezone

def days_hours_minutes(td):
    return td.days, td.seconds//3600, (td.seconds//60)%60

date1_str ='Fri Dec 05 19:00:00 2014 EST'
date2_str ='Fri Dec 03 19:00:00 2014 UTC'

fmt = "%a %b %d %H:%M:%S %Y"

date1_list = date1_str.split(' ')
date2_list = date2_str.split(' ')

date1_tz = timezone(date1_list[-1]) # get only the timezone without date parts for date 1
date2_tz = timezone(date2_list[-1]) # get only the timezone without date parts for date 2
date1 = date1_tz.localize(datetime.strptime(' '.join(date1_list[:-1]), fmt)) # get only the date parts without timezone for date 1
date2 = date2_tz.localize(datetime.strptime(' '.join(date2_list[:-1]), fmt)) # get only the date parts without timezone for date 2
td=(date1-date2)
x=days_hours_minutes(td)
print(x)
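
If pytz is not available, the same split-and-attach idea can be sketched with the standard library alone (Python 3), keeping the original layout where the abbreviation sits just before the year. The `TZ_OFFSETS` table and the `parse_with_abbr` helper are illustrative names, not part of any library, and the offsets are assumptions: abbreviations such as IST are ambiguous, so fill the table in for your own data.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from abbreviation to fixed UTC offset; extend as needed.
TZ_OFFSETS = {
    "EDT": timedelta(hours=-4),
    "EST": timedelta(hours=-5),
    "PST": timedelta(hours=-8),
    "UTC": timedelta(0),
}

FMT = "%a %b %d %H:%M:%S %Y"

def parse_with_abbr(s):
    """Parse e.g. 'Mon May 05 20:00:00 EDT 2014' into an aware datetime."""
    parts = s.split()
    abbr = parts.pop(-2)  # the abbreviation is the second-to-last field
    naive = datetime.strptime(" ".join(parts), FMT)
    return naive.replace(tzinfo=timezone(TZ_OFFSETS[abbr], abbr))

d1 = parse_with_abbr("Mon May 05 20:00:00 EDT 2014")
d2 = parse_with_abbr("Mon Nov 18 19:00:00 EST 2013")
print(d1 - d2)  # 168 days, 0:00:00
```

An unknown abbreviation raises `KeyError` here, which seems preferable to silently guessing an offset.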
wilfo
  • Of course, that is the fallback hack, but I was looking for more. – ekta Feb 16 '15 at 14:15
  • In that case, it seems that `%Z` in strptime only accepts `utc`, `gmt`, and your local timezone names (according to my `/usr/lib/python2.7/_strptime.py`, line 212). – wilfo Feb 16 '15 at 14:53

Converting time strings to POSIX timestamps and finding the differences using only stdlib:

#!/usr/bin/env python
from datetime import timedelta
from email.utils import parsedate_tz, mktime_tz

dates = [
    "Mon May 05 20:00:00 EDT 2014",
    "Mon Nov 18 19:00:00 EST 2013",
    "Mon Nov 07 19:00:00 PST 2013",
]
ts = [mktime_tz(parsedate_tz(s)) for s in dates] # timestamps
differences = [timedelta(seconds=a - b) for a, b in zip(ts, ts[1:])]
print("\n".join(map(str, differences)))

Read the links above about the inherent ambiguity of the input. If you want a more robust solution, you have to use explicit pytz timezones such as 'America/New_York'; otherwise, the email module falls back on a hardcoded "timezone abbreviation to UTC offset" mapping, e.g., EDT -> -0400, EST -> -0500, PST -> -0800.

Output

168 days, 0:00:00
10 days, 21:00:00
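
To make the hardcoded mapping concrete, the sketch below prints the UTC offset in seconds that `parsedate_tz` assigns to each abbreviation used above; the last item of the returned tuple holds that offset. The `offsets` dict is just an illustrative container. Abbreviations outside the email module's small built-in table are not resolved to their real offset, so check them before trusting the result.

```python
from email.utils import parsedate_tz

offsets = {}
for abbr in ("EDT", "EST", "PST"):
    parsed = parsedate_tz("Mon May 05 20:00:00 %s 2014" % abbr)
    offsets[abbr] = parsed[-1]  # UTC offset in seconds
    print(abbr, offsets[abbr])
# EDT -14400
# EST -18000
# PST -28800
```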

differences is a list of timedelta objects. You can get whole days from the td.days attribute (for non-negative intervals), or get the value including fractions:

days = td.total_seconds() / 86400
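
A small sketch of why the "non-negative" caveat matters: timedelta normalizes negative intervals so that only the days field is negative, which makes `td.days` round toward minus infinity. The values below are made up for illustration.

```python
from datetime import timedelta

td = timedelta(days=10, hours=21)
print(td.days)                       # 10
print(td.total_seconds() / 86400)    # 10.875

neg = -timedelta(hours=3)            # stored as days=-1, seconds=75600
print(neg.days)                      # -1
print(neg.total_seconds() / 86400)   # -0.125
```

For intervals that may be negative, the fractional form (or `abs(td).days`) is usually the safer choice.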
jfs