6

I'm new to python and just wrote this module level function:

def _interval(patt):
    """ Converts a string pattern of the form '1y 42d 14h56m'
    to a timedelta object.
    y - years (365 days), M - months (30 days), w - weeks, d - days,
    h - hours, m - minutes, s - seconds"""

    m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

    args = {'weeks': 0.0,
            'days': 0.0,
            'hours': 0.0,
            'minutes': 0.0,
            'seconds': 0.0}

    for (n,q) in m:
        if q=='y':
            args['days'] += float(n)*365
        elif q=='M':
            args['days'] += float(n)*30
        elif q=='w':
            args['weeks'] += float(n)
        elif q=='d':
            args['days'] += float(n)
        elif q=='h':
            args['hours'] += float(n)
        elif q=='m':
            args['minutes'] += float(n)
        elif q=='s':
            args['seconds'] += float(n)

    return _dt.timedelta(**args)

My issue is with the for loop here i.e the long if elif block, and was wondering if there is a more pythonic way of doing it.
So I re-wrote the function as:

def _interval2(patt):

    m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

    args = {'weeks': 0.0,
            'days': 0.0,
            'hours': 0.0,
            'minutes': 0.0,
            'seconds': 0.0}

    argsmap = {'y': ('days', lambda x: float(x)*365),
               'M': ('days', lambda x: float(x)*30),
               'w': ('weeks', lambda x: float(x)),
               'd': ('days', lambda x: float(x)),
               'h': ('hours', lambda x: float(x)),
               'm': ('minutes', lambda x: float(x)),
               's': ('seconds', lambda x: float(x))}

    for (n,q) in m:
        args[argsmap[q][0]] += argsmap[q][1](n)

    return _dt.timedelta(**args)

I tested the execution times of both the codes using timeit module and found that the second one took about 5-6 seconds longer (for the default number of repeats).

So my question is:
1. Which code is considered more pythonic?
2. Is there still a more pythonic was of writing this function?
3. What about the trade-offs between pythonicity and other aspects (like speed in this case) of programming?

p.s. I kinda have an OCD for elegant code.

EDITED _interval2 after seeing this answer:

argsmap = {'y': ('days', 365),
           'M': ('days', 30),
           'w': ('weeks', 1),
           'd': ('days', 1),
           'h': ('hours', 1),
           'm': ('minutes', 1),
           's': ('seconds', 1)}

for (n,q) in m:
    args[argsmap[q][0]] += float(n)*argsmap[q][1]
Community
  • 1
  • 1
  • Related: http://stackoverflow.com/questions/4628122/how-to-construct-a-timedelta-object-from-a-simple-string/4628148#4628148 – ChristopheD Jan 17 '11 at 14:28
  • 2
    Why on earth do you get timedeltas in a format that claims months are 30 days, when they mostly are not? :) – Lennart Regebro Jan 17 '11 at 15:40
  • @Lennart Haha! Yes, they are redundant (even the year as 365 days, if you consider a leap year). Since these functions are about time intervals and are not used to calculate future dates, I'll just keep them as added functionality. – Kashyap Nadig Jan 17 '11 at 16:25
  • No matter that they are redundant, the problem is that they are wrong. The difference between 4th of January 2011 and 4th of March 2011 is 59 days. The difference between 4th of July 2011 and 4th of September 2011 is 62 days. Both differences are 2 months, but not according to your syntax. Calculating future dates is what timedeltas are all about... – Lennart Regebro Jan 17 '11 at 16:54
  • If you want to be OCD then please be OCD for errors. The code you've posted ignores the "3" in "5y 3", gives a "ValueError: invalid literal for float():" for "-d" and "ValueError: empty string for float()" for the reasonable looking "3.d". Also, the use of leading _ as in "_re" and "_dt" is not pythonic. – Andrew Dalke Jan 17 '11 at 17:14

3 Answers3

4

You seem to create a lot of lambdas every time you parse. You really don't need a lambda, just a multiplier. Try this:

def _factor_for(what):
    if what == 'y': return 365
    elif what == 'M': return 30
    elif what in ('w', 'd', 'h', 's', 'm'): return 1
    else raise ValueError("Invalid specifier %r" % what)

for (n,q) in m:
    args[argsmap[q][0]] += _factor_for([q][1]) * n

Don't make _factor_for a method's local function or a method, though, to speed things up.

9000
  • 39,899
  • 9
  • 66
  • 104
3

(I have not timed this, but) if you're going to use this function often it might be worth pre-compiling the regex expression.

Here's my take on your function:

re_timestr = re.compile("""
    ((?P<years>\d+)y)?\s*
    ((?P<months>\d+)M)?\s*
    ((?P<weeks>\d+)w)?\s*
    ((?P<days>\d+)d)?\s*
    ((?P<hours>\d+)h)?\s*
    ((?P<minutes>\d+)m)?\s*
    ((?P<seconds>\d+)s)?
""", re.VERBOSE)

def interval3(patt):
    p = {}
    match = re_timestr.match(patt)
    if not match: 
        raise ValueError("invalid pattern : %s" % (patt))

    for k,v in match.groupdict("0").iteritems():
        p[k] = int(v) # cast string to int

    p["days"] += p.pop("years")  * 365 # convert years to days
    p["days"] += p.pop("months") * 30  # convert months to days
    return datetime.timedelta(**p)

update

From this question, it looks like precompiling regex patterns does not bring about noticeable performance improvement since Python caches and reuses them anyway. You only save the time it takes to check the cache which, unless you are repeating it numerous times, is negligible.

update2

As you quite rightly pointed out, this solution does not support interval3("1h 30s" + "2h 10m"). However, timedelta supports arithmetic operations which means one you can still express it as interval3("1h 30s") + interval3("2h 10m").

Also, as mentioned by some of the comments on the question, you may want to avoid supporting "years" and "months" in the inputs. There's a reason why timedelta does not support those arguments; it cannot be handled correctly (and incorrect code are almost never elegant).

Here's another version, this time with support for float, negative values, and some error checking.

re_timestr = re.compile("""
    ^\s*
    ((?P<weeks>[+-]?\d+(\.\d*)?)w)?\s*
    ((?P<days>[+-]?\d+(\.\d*)?)d)?\s*
    ((?P<hours>[+-]?\d+(\.\d*)?)h)?\s*
    ((?P<minutes>[+-]?\d+(\.\d*)?)m)?\s*
    ((?P<seconds>[+-]?\d+(\.\d*)?)s)?\s*
    $
""", re.VERBOSE)

def interval4(patt):
    p = {}
    match = re_timestr.match(patt)
    if not match: 
        raise ValueError("invalid pattern : %s" % (patt))

    for k,v in match.groupdict("0").iteritems():
        p[k] = float(v) # cast string to int

    return datetime.timedelta(**p)

Example use cases:

>>> print interval4("1w 2d 3h4m")  # basic use
9 days, 3:04:00

>>> print interval4("1w") - interval4("2d 3h 4m") # timedelta arithmetic 
4 days, 20:56:00

>>> print interval4("0.3w -2.d +1.01h") # +ve and -ve floats 
3:24:36

>>> print interval4("0.3x") # reject invalid input 
Traceback (most recent call last): 
  File "date.py", line 19, in interval4
    raise ValueError("invalid pattern : %s" % (patt)) 
ValueError: invalid pattern : 0.3x

>>> print interval4("1h 2w") # order matters 
Traceback (most recent call last):   
  File "date.py", line 19, in interval4
    raise ValueError("invalid pattern : %s" % (patt)) 
ValueError: invalid pattern : 1h 2w
Community
  • 1
  • 1
Shawn Chin
  • 84,080
  • 19
  • 162
  • 191
  • I had to same idea too, but this won't parse things like "1w 1w 1w" or "1s 1y" - which seems much better than the original regex to me, but might not fit the requirements. – Jochen Ritzel Jan 17 '11 at 15:33
  • I suspect one would have a lot more to worry about if input like "1w 1w 1w" is allowed; should that be 3 weeks, or 1 week, or a typo? – Shawn Chin Jan 17 '11 at 15:49
  • Yes, I thought of this regex too. But mine kinda supports duck-typing: `"1h30m" + "2h45m"` automatically results in a `timedelta` of `"4h15m"`. – Kashyap Nadig Jan 17 '11 at 16:13
-1

Yes, there is. Use time.strptime instead:

Parse a string representing a time according to a format. The return value is a struct_time as returned by gmtime() or localtime().

Andrew Hare
  • 344,730
  • 71
  • 640
  • 635
ulidtko
  • 14,740
  • 10
  • 56
  • 88
  • `time.strptime()` is for datetime objects, not for timedelta. 2 months 10 days is not 10-February. – eumiro Jan 17 '11 at 14:28