Regex failing with text conversion to days - Python 3.10.x

Question

I have a list of time durations in text, for example, ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']

I need to build a function that takes one of these durations and returns the total number of days it represents.

The specific text could be a single day, days and hours, hours and minutes, a single set of minutes, or a day, hour, and minute.

I have tried the following:

def parse_dates(data):
    days = int(re.match(r'\d+\sDay', data)[0].split(' ')[0]) if re.match(r'\d+\sDay', data) is not None else 0
    hours = int(re.match(r'\d+\sHour', data)[0].split(' ')[0]) if re.match(r'^\d+Hour*s$', data) is not None else 0
    minutes = int(re.match(r'\d+\sMinute', data)[0].split(' ')[0]) if re.match(r'\d+\sMinute', data) is not None else 0

    days += hours / 24
    days += minutes / 1440

    return days

The provided function fails regardless of whether I use re.match() or re.search(), leading me to believe there is a problem with the expression itself.

Specifically, the hours and minutes are ALWAYS showing as 0. How can I fix my regex, or devise a better solution, to parse these strings appropriately?

will you ever get a string like `'48 hours'`? I'm just trying to understand why you're adding hours and minutes to the result. — rv.kvetch, Sep 23 '22 at 17:54
You could, sure, which is why I am trying to account for the situation(s) where you do not get a match for . And the results should be the total elapsed time -- i.e., the total amount of time in days it took for us to finish building this equipment. — artemis, Sep 23 '22 at 17:57
Does this answer your question? [Python regular expression re.match, why this code does not work?](https://stackoverflow.com/questions/14933771/python-regular-expression-re-match-why-this-code-does-not-work) — mkrieger1, Sep 23 '22 at 18:02
Yes, and the answer to that question tells you why they don't work and what you can do differently. — mkrieger1, Sep 23 '22 at 18:06
The linked question covers `match`. The only problem if `search` is used is that the test regex for hours is incorrect and doesn't match the regex used to parse hours, which is basically a typo. — outis, Sep 24 '22 at 11:31
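
The diagnosis from the comments can be checked directly. The following is a minimal sketch, assuming one of the sample strings from the question; the variable names are only illustrative. `re.match` only matches at the start of the string, and the guard pattern used for hours has no whitespace between the number and the unit, so it cannot match any of the sample inputs with either function.

```python
import re

s = '128 Days 9 Hours 43 Minutes'

# re.match only succeeds at the start of the string, so the hours are not found.
start_hours = re.match(r'\d+\sHour', s)
# re.search scans the whole string and does find them.
anywhere_hours = re.search(r'\d+\sHour', s)
# The guard from the question lacks \s and is anchored at both ends,
# so it fails even with re.search.
guard_hours = re.search(r'^\d+Hour*s$', s)

print(start_hours)              # None
print(anywhere_hours.group(0))  # 9 Hour
print(guard_hours)              # None
```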

rv.kvetch · Accepted Answer · 2022-09-23T19:07:08.507

You could try the following regex:

(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?

Explanation:

(?:...) marks a non-capturing group
(...) marks a captured group
? after a symbol or group means it is optional
\d+ means one or more digits (0123...)
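
To make the grouping concrete, here is a small sketch of what the pattern captures for the sample strings from the question. The names `pattern`, `samples`, and `captured` are only illustrative. Because every unit is optional, a unit that is absent from the string comes back as `None`.

```python
import re

# Same pattern as above: three optional units, each with one captured number.
pattern = re.compile(r'(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?')

samples = ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']
captured = {s: pattern.match(s).groups() for s in samples}

for s, groups in captured.items():
    print(f'{s!r} -> {groups}')

# '142 Days 16 Hours' -> ('142', '16', None)
# '128 Days 9 Hours 43 Minutes' -> ('128', '9', '43')
# '10 Minutes' -> (None, None, '10')
```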

Sample Python implementation:

import re

_DHM_RE = re.compile(r'(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?')
_HOURS_IN_DAY = 24
_MINUTES_IN_DAY = 60 * _HOURS_IN_DAY


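def parse_dates(s: str) -> int:
    # Every group is optional, so the pattern can match the empty string and
    # search() always finds a match at position 0; the None check is a safeguard.
    m = _DHM_RE.search(s)
    if m is None:
        return 0

    # Units missing from the string are captured as None and count as 0.
    days = int(m.group(1) or 0)
    hours = int(m.group(2) or 0)
    minutes = int(m.group(3) or 0)

    # Convert hours and minutes to fractions of a day.
    days += hours / _HOURS_IN_DAY
    days += minutes / _MINUTES_IN_DAY

    # Truncate to whole days.
    return int(days)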


strings = """\
142 Days 16 Hours
128 Days 9 Hours 43 Minutes
10 Minutes
52 Hours
""".splitlines()

for s in strings:
    d = parse_dates(s)
    print(f'{s!r} has {d} days.')
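
The function above returns whole days. If fractional days are wanted, as in the question's own function, one possible variant is sketched below. It is not part of the original answer: the name `duration_to_days` is hypothetical, and using `fullmatch` plus `datetime.timedelta` is an assumption about what behaviour is desired (rejecting unexpected text instead of returning 0).

```python
import re
from datetime import timedelta

_DHM_RE = re.compile(r'(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?')


def duration_to_days(s: str) -> float:
    """Return the duration in s as a (possibly fractional) number of days."""
    # fullmatch requires the whole string to fit the pattern, so text in an
    # unexpected format raises an error instead of silently becoming 0.
    m = _DHM_RE.fullmatch(s.strip())
    if m is None:
        raise ValueError(f'unrecognised duration: {s!r}')

    # Units missing from the string are captured as None and count as 0.
    days, hours, minutes = (int(g or 0) for g in m.groups())
    # Dividing one timedelta by another yields a float.
    return timedelta(days=days, hours=hours, minutes=minutes) / timedelta(days=1)


for s in ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes', '52 Hours']:
    print(f'{s!r} is {duration_to_days(s):.4f} days.')
```

Since every group in the pattern is optional, an empty string still matches and yields 0.0.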

score 0 · Answer 2 · answered Sep 23 '22 at 18:16

Here's a way to do it:

import re

a = ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']

def parse_dates(data):
    # Search for each unit independently; a unit that is absent counts as 0.
    x = [re.search(r'(\d+)\s' + unit, data) for unit in ['Day', 'Hour', 'Minute']]
    x = [0 if y is None else int(y.group(1)) for y in x]
    return x[0] + x[1] / 24 + x[2] / 1440

for data in a:
    print(parse_dates(data))

Output:

142.66666666666666
128.4048611111111
0.006944444444444444
