0

I am using the following code to identify years in my text.

import re
match = re.match(r'.*([1-2][0-9]{3})', text)
print(match.group(1))

However, this also accepts years such as 2999 or 2078, which lie in the future and are therefore not valid.

Therefore, I would like to know how to match only years up to the present (i.e., up to 2018) in Python.

smci
EmJ
  • Add a condition check: `if int(match.group(1)) <= 2018: ...` – DYZ Jan 03 '19 at 08:23
  • If you need a purely regexp solution then you can write `r'.*(1[0-9]{3})|(200[0-9])|(201[0-8])'`, but what's the point in doing that? You can just call `int(text[-4:])` and compare it to 2018 – Denis Babochenko Jan 03 '19 at 08:25
  • What does "from recent" mean, since 1900? 2000? 100 years ago? etc. Please edit the question to clarify. – smci Jan 03 '19 at 08:27

4 Answers

2

Option 1: List them out one-by-one:

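r = re.compile(r"(?<!\d)(?:1[0-9]{3}|20[01][0-9])(?!\d)")
match = r.search(text)

This will give you years 1000 to 2019. The lookbehind `(?<!\d)` and lookahead `(?!\d)` keep the pattern from matching four digits inside a longer number.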


Option 2: Extract the number, convert to int and compare.

match = re.match(r'.*([1-2][0-9]{3})', text)
if match:
    year = int(match.group(1))  # group(1) is the captured year; group(0) is the whole match
    if 1000 <= year <= 2019:
        do_your_stuff()

You can fetch the current year programmatically:

from datetime import datetime

year = datetime.now().year
  • 1
    I presume that the year doesn't want to be past the current year? If so I would suggest getting the year from the datetime package instead of hard-coded to 2019 - https://stackoverflow.com/questions/30071886/how-to-get-current-time-in-python-and-break-up-into-year-month-day-hour-minu – tgrobinson Jan 03 '19 at 08:26
  • Hi, when I give the text as ""2018 and 2017 in" it returns None. But I want it to return [2018,2017] – EmJ Jan 03 '19 at 08:56
  • @Emi Use `re.findall`. –  Jan 03 '19 at 09:08
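As a rough sketch of the `re.findall` suggestion above, the following combines the digit-boundary pattern with an integer range check. The names `find_years`, `YEAR_PATTERN`, `earliest` and `latest` are hypothetical, introduced only for illustration, and the default lower bound of 1000 is an assumption taken from the range mentioned in the answer.

```python
import re
from datetime import datetime

# Four-digit runs starting with 1 or 2 that are not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)[12][0-9]{3}(?!\d)")


def find_years(text, earliest=1000, latest=None):
    """Return every plausible year in text, in order of appearance."""
    if latest is None:
        latest = datetime.now().year  # upper bound defaults to the current year
    candidates = (int(m) for m in YEAR_PATTERN.findall(text))
    return [y for y in candidates if earliest <= y <= latest]


print(find_years("2018 and 2017 in", latest=2018))  # [2018, 2017]
print(find_years("born 1987, due 2999, id 120185", latest=2018))  # [1987]
```

Leaving `latest` unset makes the upper bound follow the current year, so the check does not need to be edited every January.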
1

I would try to cast the given value to an integer and check whether it is greater than, e.g., 1900 and smaller than or equal to 2018/2019.
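A minimal sketch of that idea, assuming the candidate arrives as a string. The function name `is_valid_year` and the default bounds are hypothetical and only illustrate the cast-and-compare approach.

```python
def is_valid_year(value, earliest=1900, latest=2018):
    """Return True if value parses as an integer within [earliest, latest]."""
    try:
        year = int(value)
    except (TypeError, ValueError):
        # Not a number at all, so it cannot be a year.
        return False
    return earliest <= year <= latest


print(is_valid_year("2018"))  # True
print(is_valid_year("2078"))  # False
```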

Marvin Klar
1

You could pass the matched string to the code below, which compares the matched year with the current year and prints True if the matched year is less than or equal to the current year.

from datetime import datetime

matched_string = "2020"
past = datetime.strptime(matched_string, "%Y")
present = datetime.now()

print(past.year <= present.year)
Icehorn
0

A pure regular expression (matches years 0 to 2018):

(\d{1,3})|(1\d{3})|(201[0-8])|(200\d)
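Used unanchored, this alternation can also match part of a longer number. One possible way to apply it, assuming each candidate has already been isolated as its own string, is `re.fullmatch`. The names `YEAR_UP_TO_2018` and `is_year_up_to_2018` are hypothetical, and the capture groups are dropped here because only a yes/no answer is needed.

```python
import re

# Same alternatives as the pattern above, without capture groups.
# fullmatch requires the entire string to match one of the branches.
YEAR_UP_TO_2018 = re.compile(r"\d{1,3}|1\d{3}|200\d|201[0-8]")


def is_year_up_to_2018(s):
    """Return True if s consists solely of a year between 0 and 2018."""
    return YEAR_UP_TO_2018.fullmatch(s) is not None


print(is_year_up_to_2018("2018"))  # True
print(is_year_up_to_2018("2019"))  # False
```

The upper bound is baked into the pattern, so it has to be rewritten by hand whenever the latest allowed year changes.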
limen