
I am looking to identify and extract a date from a number of different strings. The dates may not be formatted the same. I have been using the datefinder package but I am having some issues saving the output.

Goal: Extract the date from a string, which may be formatted in a number of different ways (e.g. `April,22`, `4/22` or `22-Apr`). If the string contains no date, set the value to 'None'. In either case, append the result (the date or 'None') to the list of dates.

Please see the examples below.

Example 1: (This returns a date, but does not get appended to my list)


import datefinder

extracted_dates = []
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'

matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)

Example 2: (This does not return a date, and does not get appended to my list)

import datefinder

extracted_dates = []
sample_text = 'As of the date, there were 28 dogs at the kennel.'

matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)
Patriots_25
  • I am having some trouble reproducing Example 1. After running the script, `extracted_dates` contains `['2019-02-27 00:00:00', '2020-05-28 00:00:00']` – Mike Xydas May 05 '20 at 12:54
  • @MikeXydas it appears that it's reading '28 dogs' as May 28, 2020 – Patriots_25 May 05 '20 at 12:56

1 Answer


I tried the package you are using (datefinder), but I could not find a quick, general way to extract only the real date from your example.

Instead, I used the dateparser package, specifically its `search_dates` function.

Note that I only tested it briefly, and only on your two examples.

from dateparser.search import search_dates

sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
extracted_dates = []

# Returns a list of tuples of (substring containing the date, datetime.datetime object)
dates = search_dates(sample_text)

if dates is not None:
    for d in dates:
        extracted_dates.append(str(d[1]))
else:
    extracted_dates.append('None')

print(extracted_dates)
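
As a side note on why the original loop never appended 'None': `datefinder.find_dates` returns a generator, and when it yields nothing the body of the `for` loop simply never runs, so the `match == None` branch is never reached. Below is a minimal sketch of the "materialize, then check for empty" pattern. To keep it runnable without any third-party package, it uses `fake_find_dates`, a hypothetical stand-in I wrote for illustration (it is not part of datefinder), and `extract_or_none` is likewise just an illustrative helper name.

```python
from datetime import datetime


def fake_find_dates(text):
    """Hypothetical stand-in for datefinder.find_dates (assumption, not the real API).

    Like the real function, it is a generator that yields datetime objects
    and yields nothing at all when the text contains no date.
    """
    if 'February 27, 2019' in text:
        yield datetime(2019, 2, 27)


def extract_or_none(text, finder=fake_find_dates):
    """Return the found dates as strings, or ['None'] if nothing was found."""
    # Materialize the generator first, so an empty result can be detected.
    matches = list(finder(text))
    if not matches:
        return ['None']
    return [str(match) for match in matches]


samples = [
    'As of February 27, 2019 there were 28 dogs at the kennel.',
    'As of the date, there were 28 dogs at the kennel.',
]

extracted_dates = []
for sample_text in samples:
    extracted_dates.extend(extract_or_none(sample_text))

print(extracted_dates)
```

In principle the same helper should work with the real package by passing `finder=datefinder.find_dates`, but I have not tested that here, and it would not by itself fix the '28 dogs' false positive mentioned in the comments.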
Mike Xydas