I'm trying to extract the date from this string:

Publisher: Broadway Books; Anniversary, Reprint edition (October 8, 2002)

I want to get this: October 8, 2002

This is the regex I'm using. The goal is for it to work on any date in the format above. It works when I test it on https://regex101.com/, but in my code the call below returns `None`.

pattern = re.compile("(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}")
date = pattern.match(tag.get_text())
Siunami
  • "Does not work" isn't very informative. Do you get anything? If so, what? Are you sure `tag` has text that should match? Please provide the value of `tag` so we can try it. – cco Jan 06 '19 at 03:35
  • You should not, in general, use `match`. It is not doing what you think it is doing. Use `search` instead. Your fixed code works for me: `pattern.search("foo October 8, 2002 bar").group(0)` -> `'October 8, 2002'`. – DYZ Jan 06 '19 at 03:35
  • @cco returns none in the code. I've edited my question above – Siunami Jan 06 '19 at 03:47

1 Answer

You're using `re.match`, which only checks whether the pattern matches at the beginning of the string. Use `re.search` instead, which scans for a match anywhere in the string. The "search() vs. match()" section of the Python `re` documentation covers the difference in more detail.

Code:

import re

text = "Publisher: Broadway Books; Anniversary, Reprint edition (October 8, 2002)"
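# Raw strings keep \s and \d from being treated as string escape sequences.
pattern = re.compile(
    r"(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|"
    r"Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|"
    r"Dec(ember)?)\s+\d{1,2},\s+\d{4}")

print(pattern.match(text))  # prints None: match() only tries the start of the string
print(pattern.search(text))  # search() scans the whole string
print(pattern.search(text).group())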

Results:

None
<_sre.SRE_Match object; span=(57, 72), match='October 8, 2002'>
October 8, 2002
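
Since `search` also returns `None` when a string contains no date, calling `.group()` directly on its result can raise `AttributeError`. Below is a minimal sketch of one way to guard against that. The names `DATE_PATTERN` and `extract_date` are hypothetical and not part of the original code, and the non-capturing groups `(?:...)` are an optional tidy-up, not something the fix depends on.

```python
import re

# Same month alternation as above, written with non-capturing groups
# so the pattern does not create one capture group per month name.
DATE_PATTERN = re.compile(
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
    r"Dec(?:ember)?)\s+\d{1,2},\s+\d{4}"
)


def extract_date(text):
    """Return the first date found anywhere in text, or None if there is none.

    Hypothetical helper for illustration only.
    """
    found = DATE_PATTERN.search(text)  # search(), not match()
    return found.group() if found else None
```

In the question's code, this could be called as `extract_date(tag.get_text())`, assuming `tag` is the BeautifulSoup element holding the publisher line.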
iz_