I am building a chatbot in python. I need to extract dates from the input text from the user. Some test cases can be as follows:

1. "last week of july"
2. " in june"
3. "last month"
4. "last n days"

These are just a sample. After extracting the dates, I need to build an SQL query. I have hard-coded a few cases, but the more I think about it, the more cases come up, and hard-coding all of them would be time-consuming. Is there a built-in library that can ease my work?

Karanam Krishna
  • Basically, you are asking for a full chatbot. That is exactly the idea behind such bots: building an automatic recognizer. As a starting point, you could try this one: https://stackoverflow.com/questions/9507648/datetime-from-string-in-python-best-guessing-string-format ... – PV8 Jun 07 '19 at 07:09
  • I suspected I was asking too much. OK, cool, I will look into it and continue with what I was doing. – Karanam Krishna Jun 07 '19 at 07:13

2 Answers

You can use the dateparser library.

import dateparser

nl_dates = ["last week of july", " in june", "last month", "last n days"]

for nl_date in nl_dates:
    # parse() returns a datetime, or None when the phrase is not understood
    res = dateparser.parse(nl_date)
    if res:
        print('"{}": {}'.format(nl_date, res.date()))

" in june": 2019-06-12
"last month": 2019-05-12

This library handles 2 of your 4 examples, and for those it returns a single date rather than a range. In addition, you may find it helpful to use a NER (Named Entity Recognition) model to locate date expressions in the text; spaCy offers one:

import spacy

# The model must be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
nl_dates = ["last week of july", " in june", "last month", "last 7 days"]

for nl_date in nl_dates:
    doc = nlp(nl_date)
    # doc.ents holds the recognized entities; date expressions are labeled DATE
    for entity in doc.ents:
        print('{}: {}'.format(entity.label_, entity.text))

DATE: last week
DATE: last month
DATE: last 7 days

In general, when your chatbot cannot extract a date from the text, you may want it to ask the user explicitly for the date in a format that is easier to detect.
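
For phrases that describe a range and that neither library turns into concrete dates ("last week of july", "last n days"), a few hand-written rules can fill the gap before you fall back to asking the user. The following is only a rough sketch using the standard library. The function names, the interpretation of each phrase (for example, "last week of july" as the final 7 days of July in the current year), the table `events`, the column `event_date`, and the `?` placeholder style are all assumptions of mine, so adapt them to your schema and database driver.

```python
import calendar
import re
from datetime import date, timedelta

# Lowercase English month names mapped to 1..12 (assumes the default C locale).
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}


def resolve_range(phrase, today):
    """Return (start, end) dates, both inclusive, or None if no rule matches."""
    text = phrase.strip().lower()

    # "last 7 days": assumed to mean the 7 days ending today, today included.
    m = re.fullmatch(r"last (\d+) days", text)
    if m and int(m.group(1)) >= 1:
        n = int(m.group(1))
        return today - timedelta(days=n - 1), today

    # "last week of july": assumed to mean the final 7 days of that month
    # in the current year.
    m = re.fullmatch(r"last week of (\w+)", text)
    if m and m.group(1) in MONTHS:
        month = MONTHS[m.group(1)]
        last_day = calendar.monthrange(today.year, month)[1]
        end = date(today.year, month, last_day)
        return end - timedelta(days=6), end

    # No rule matched: the caller can ask the user for an explicit date.
    return None


def build_query(start, end):
    """Build a parameterized query; table and column names are hypothetical."""
    sql = "SELECT * FROM events WHERE event_date BETWEEN ? AND ?"
    return sql, (start.isoformat(), end.isoformat())
```

Passing the dates as parameters instead of formatting them into the SQL string keeps user input out of the query text. With `today = date(2019, 6, 12)`, `resolve_range("last 7 days", today)` gives the range 2019-06-06 to 2019-06-12.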

B.Anto

I would consider using NLP APIs such as Dialogflow or Wit.ai. Personally, I prefer Wit.ai because it can recognize both dates and date ranges, and it also accepts context, which allows you to adjust the recognition according to the user's timezone. This can save you a lot of trouble: depending on what time it is in the user's zone, "Wednesday" or "next week" can have different interpretations.
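
To illustrate why the timezone matters, here is a small standard-library sketch (it does not use the Wit.ai API). The helper `local_date` and the fixed UTC offsets are assumptions for the example; a real application would use proper timezone names so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone


def local_date(utc_instant, utc_offset_hours):
    """Return the calendar date of a UTC instant as seen at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return utc_instant.astimezone(tz).date()


# The same moment, seen by two users in different zones.
instant = datetime(2019, 6, 12, 23, 30, tzinfo=timezone.utc)

print(local_date(instant, -7))  # 2019-06-12
print(local_date(instant, 9))   # 2019-06-13
```

Because "today" already differs between the two users, any relative phrase resolved from it ("yesterday", "last 7 days", "next week") would shift by a day as well.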

Venomouse