I'm wondering how to use parsedatetime for Python to return both the struct_time and the rest of the input string, with only the date/time portion removed.

Example:

import parsedatetime
p = parsedatetime.Calendar()
p.parse("Soccer with @homies at Payne Whitney at 2 pm")

returns:

(time.struct_time(tm_year=2020, tm_mon=1, tm_mday=12, tm_hour=13, tm_min=9, tm_sec=59, tm_wday=6, tm_yday=12, tm_isdst=0), 0)

but I'd also like it to return:

"Soccer with @homies at Payne Whitney"

Is there a way to do that with parsedatetime, or would it require a different Python package?

P.S.

I promise this has a practical application: we're using it to build magical.app

Hojung Kim

1 Answer

The only method of Calendar that returns that info is nlp() (which I suppose stands for Natural Language Processing). Here is a function returning all parts of the input:

import parsedatetime

calendar = parsedatetime.Calendar()

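def parse(string, source_time=None):
    """Split string into (datetime or None, status, text) parts, in input order."""
    ret = []
    # nlp() returns None when it finds no date/time expression at all
    parsed_parts = calendar.nlp(string, source_time)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            dt, status, start, stop, segment = part
            if start > last_stop:
                # plain text between the previous match and this one
                ret.append((None, 0, string[last_stop:start]))
            ret.append((dt, status, segment))
            last_stop = stop
        if len(string) > last_stop:
            # plain text after the last match
            ret.append((None, 0, string[last_stop:]))
    return ret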

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
(None, 0, ' to ')
(datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
(None, 0, '!')

The status tells you whether the associated datetime is actually a date (1), a time (2), a datetime (3) or neither (0). In the first two cases, the missing fields are taken from the source_time, or from the current time if that is None.
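To get back what the question asks for (the input with the date/time text removed), the parts whose status is 0 can simply be joined again. This is only a minimal sketch: remaining_text() is a hypothetical helper name, and the sample data is copied from the first output above so that the snippet runs without parsedatetime installed.

```python
from datetime import datetime


def remaining_text(parts):
    """Join the non-date parts (status 0) and normalize whitespace."""
    text = "".join(segment for dt, status, segment in parts if status == 0)
    # Collapse the doubled spaces left behind where a date segment was cut out
    return " ".join(text.split())


# Parts copied from the first parse() output shown above
first_parse = [
    (None, 0, 'Soccer with @homies at Payne Whitney '),
    (datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm'),
    (None, 0, '!'),
]

print(remaining_text(first_parse))
# prints: Soccer with @homies at Payne Whitney !
```

Note that connecting words such as ' starting ' or ' to ' are also status-0 parts, so for the other two example strings they would end up in the remaining text and may need extra cleanup.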

But if you examine the output closely, you will see that there is a reliability problem here. Only the third parse can be used; in the other two cases, information has been lost: "2 pm to 4 pm" is collapsed into a single datetime at 16:00, so the 2 pm start time disappears. Furthermore, I have no idea why the second and third strings are parsed differently.
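For a parse like the third one, where the date and the times come back as separate parts, the status values can be used to stitch them together. The following is a sketch under assumptions, not part of parsedatetime: merge_date_and_time() is a hypothetical helper, it assumes a date-only part (status 1) applies to every time-only part (status 2) that follows it, and it drops a date-only part that has no time after it. The sample data is copied from the third output above.

```python
from datetime import datetime


def merge_date_and_time(parts):
    """Apply the most recent date-only part to the time-only parts after it."""
    merged = []
    current_date = None
    for dt, status, segment in parts:
        if status == 1:
            # Date only: remember the date, ignore its filler time of day
            current_date = dt.date()
        elif status == 2:
            # Time only: swap in the remembered date, if there is one
            if current_date is not None:
                dt = datetime.combine(current_date, dt.time())
            merged.append(dt)
        elif status == 3:
            # Already a full datetime
            merged.append(dt)
    return merged


# Parts copied from the third parse() output shown above
third_parse = [
    (None, 0, 'Soccer with @homies at Payne Whitney '),
    (datetime(2020, 1, 15, 9, 0), 1, 'tomorrow'),
    (None, 0, ' starting '),
    (datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm'),
    (None, 0, ' to '),
    (datetime(2020, 1, 14, 17, 0), 2, '5 pm'),
    (None, 0, '!'),
]

for dt in merge_date_and_time(third_parse):
    print(dt)
# prints:
# 2020-01-15 15:00:00
# 2020-01-15 17:00:00
```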

An alternative library is dateparser. It looks more powerful, but has its own problems. The dateparser.search.search_dates() function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed datetime conveys only date information, only time information, or both. Anyway, here is a function that uses search_dates() to yield an output similar to the above, but without the status of each part:

from dateparser.search import search_dates

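def parse(string: str):
    """Split string into (datetime or None, text) parts, in input order."""
    ret = []
    # search_dates() returns None when it finds nothing
    parsed_parts = search_dates(string)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            segment, dt = part
            # search_dates() gives no offsets, so locate the matched text ourselves
            start = string.find(segment, last_stop)
            stop = start + len(segment)
            if start > last_stop:
                # plain text between the previous match and this one
                ret.append((None, string[last_stop:start]))
            ret.append((dt, segment))
            last_stop = stop
        if len(string) > last_stop:
            # plain text after the last match
            ret.append((None, string[last_stop:]))
    return ret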


for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
(None, '!')

I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing how to interpret each datetime is a problem.

Walter Tross
  • I posted a follow-up question: https://stackoverflow.com/questions/59726449/how-to-determine-whether-dateparser-search-search-dates-returns-dates-times – Walter Tross Jan 14 '20 at 01:40
  • I've done some more searching and found a couple of libraries that I think are more powerful: dateutil (https://github.com/dateutil/dateutil) and timefhuman (https://github.com/alvinwan/timefhuman). They seem to do well with most of the examples I can think of, but each solution has its own issues – Hojung Kim Jan 14 '20 at 03:29
  • sutime seems to be by far the most powerful one out of the bunch, but is actually quite difficult to configure for the first time. I'm running into a lot of errors simply following their instructions in the documentation – Hojung Kim Jan 14 '20 at 21:30
  • I'm actually having a good deal of trouble with the initial installation for sutime. The main issue arises here: `mvn dependency:copy-dependencies -DoutputDirectory=./jars` When I run that, terminal returns: `zsh: command not found: mvn` I've installed Maven and added it to path with no luck :( – Hojung Kim Jan 14 '20 at 21:48
  • Are you on a Mac? If yes, you should `brew install maven`, which should take care of paths and everything (I'm on a Mac with maven, only I have bash instead of zsh) – Walter Tross Jan 14 '20 at 21:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205970/discussion-between-hojung-kim-and-walter-tross). – Hojung Kim Jan 14 '20 at 21:59