
I have a series of time differences, such as:

7 months
11 months
1 hour, 24 minutes
10 months, 3 weeks
1 year
1 year, 1 month
8 months, 2 weeks
2 months
2 months, 4 weeks
8 months, 1 week
9 months, 3 weeks

and I want to convert them to an absolute value, such as a total number of seconds, for sorting purposes. Yes, I could write my own library, but I wanted to know if something already exists.

Google has not been helpful, because it has way too many timestamp results.

vossman77
  • Have you seen [this](http://stackoverflow.com/questions/9775743/how-can-i-parse-free-text-time-intervals-in-python-ranging-from-years-to-second)? – jsfan Feb 22 '16 at 03:01
  • Seconds, minutes, hours, days, and weeks are all pretty fixed in length, but months and years are of different sizes. – PaulMcG Feb 22 '16 at 04:32
  • Like I said, this is really hard to search for due to timestamp noise. But I do like jsbueno's solution better than the linked one. – vossman77 Feb 22 '16 at 20:46

1 Answer


Python has the "timedelta" class in the datetime module. It can't parse the quantities above, but with some minimal parsing you can create timedelta objects, which are directly comparable (and can be added to or subtracted from normal date or datetime objects):

In [1]: from datetime import timedelta
In [5]: x = timedelta(weeks=40)
In [6]: x
Out[6]: datetime.timedelta(280)

timedelta accepts weeks, days, hours, minutes and seconds (plus milliseconds and microseconds) as keyword parameters, but not months or years, since their length is not well defined. Also, the most usual way of creating a timedelta is by subtracting two date (or datetime) objects.
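
As a minimal sketch of both ways of building a timedelta, the snippet below combines several keyword arguments and then subtracts two dates. The specific values and dates are arbitrary illustration values, not taken from the question.

```python
from datetime import date, timedelta

# Keyword arguments can be combined; the result is normalized internally.
delta = timedelta(weeks=3, hours=1, minutes=24)
print(delta)                  # 21 days, 1:24:00
print(delta.total_seconds())  # 1819440.0

# Subtracting two dates also yields a timedelta.
# The dates below are arbitrary illustration values.
span = date(2016, 2, 22) - date(2015, 2, 22)
print(span.days)              # 365
```

Passing months= or years= to the constructor raises a TypeError, which is why those units need separate handling.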

The small function below takes advantage of the fact that the time units you are using are almost the same as the keyword arguments accepted by the timedelta constructor. That saves some lines when parsing your English time differences with regular expressions and creating a timedelta object from them. Months and years are approximated as 30 and 365 days:

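import re
from datetime import timedelta

def get_timedelta(line):
    # Keyword arguments for timedelta, e.g. {"days": 0, "weeks": 3}
    timespaces = {"days": 0}
    for timeunit in "year month week day hour minute second".split():
        # Require at least one digit before the unit name ("3 weeks", "1 hour")
        content = re.search(r"(\d+)\s*" + timeunit, line)
        if content:
            timespaces[timeunit + "s"] = int(content.group(1))
    # timedelta has no months or years: approximate them as 30 and 365 days
    timespaces["days"] += 30 * timespaces.pop("months", 0) + 365 * timespaces.pop("years", 0)
    return timedelta(**timespaces)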

And using the examples you provide, one has:

In [26]: lines = """7 months
11 months
1 hour, 24 minutes
10 months, 3 weeks
1 year
1 year, 1 month
8 months, 2 weeks
2 months
2 months, 4 weeks
8 months, 1 week
9 months, 3 weeks""".split("\n")

In [27]: for line in lines:
   ....:     print(get_timedelta(line))
   ....:     
210 days, 0:00:00
330 days, 0:00:00
1:24:00
321 days, 0:00:00
365 days, 0:00:00
395 days, 0:00:00
254 days, 0:00:00
60 days, 0:00:00
88 days, 0:00:00
247 days, 0:00:00
291 days, 0:00:00
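
Since the question asks for sorting and for a value in seconds, here is a rough sketch of both steps. It restates the parser so the block is self-contained (same 30-day month and 365-day year approximation), and the `samples` list is just a subset of the lines from the question.

```python
import re
from datetime import timedelta

def get_timedelta(line):
    # Same approach as the function above: months ~ 30 days, years ~ 365 days.
    units = {"days": 0}
    for unit in "year month week day hour minute second".split():
        match = re.search(r"(\d+)\s*" + unit, line)
        if match:
            units[unit + "s"] = int(match.group(1))
    units["days"] += 30 * units.pop("months", 0) + 365 * units.pop("years", 0)
    return timedelta(**units)

samples = ["7 months", "1 hour, 24 minutes", "1 year, 1 month", "2 months, 4 weeks"]

# timedelta objects compare directly, so the parser can serve as the sort key.
ordered = sorted(samples, key=get_timedelta)
print(ordered)

# total_seconds() gives the absolute value in seconds asked for in the question.
seconds = [get_timedelta(s).total_seconds() for s in ordered]
print(seconds)
```

Because of the fixed-length approximation for months and years, the ordering is only as accurate as that assumption; it is fine for ranking intervals like the ones in the question, but not for exact calendar arithmetic.
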
jsbueno
  • Works great. I had a Unicode issue, but after fixing it, it was exactly what I wanted. – vossman77 Feb 22 '16 at 03:52
  • For "unicoding issues" there are two recommendations: first, get a good grasp of what text (and Unicode) is (this article covers it: http://www.joelonsoftware.com/articles/Unicode.html); second, move on to Python 3.x. In Python 3 all of the above strings would be the equivalent of "unicode strings" in Python 2, and it would simply work. – jsbueno Feb 22 '16 at 13:26
  • had to do a decode and then an encode... response = urllib2.urlopen(url); html = response.readlines(); for line in html: sline = normalize('NFKD', line.decode("utf8")).encode('ASCII', 'ignore'); – vossman77 Feb 22 '16 at 20:41
  • We have 131,443 lines of Python 2.7 code and still support RHEL/CentOS 6, so Python 3 is not ready for prime time, perhaps soon. – vossman77 Feb 22 '16 at 20:45
  • Sorry, but from 2015 on, if you are starting a new project, you should do it in Python 3. That is a completely different issue from "when will all legacy code be ported". – jsbueno Feb 22 '16 at 22:16