I have written a Python script with the following function, which takes as input a file name that may contain more than one year.
CODE
import re
from datetime import datetime
def ExtractReleaseYear(title):
    rg = re.compile(r'.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
    match = rg.search(title) # Using non-greedy match on filler
    if match:
        releaseYear = match.group(1)
        try:
            if int(releaseYear) >= 1900 and int(releaseYear) <= int(datetime.now().year) and int(releaseYear) <= 2099: # Film between 1900-2099
                return releaseYear
        except ValueError:
            print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
            return ""
print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
OUTPUT
Returned: 2012 -- I'd like this to be 2009 (i.e. last occurrence of year in string)
Returned: 2012 -- This is correct! (last occurrence of year is the first one, thus right)
Returned: 2001 -- I'd like this to be 1968 (i.e. last occurrence of year in string)
ISSUE
As can be observed, the regex matches only the first occurrence of a year instead of the last. This is problematic because some titles (such as the ones included here) begin with a year.
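To make the behaviour concrete, here is a minimal sketch using a simplified version of the year pattern (the optional brackets are dropped, and the names `YEAR`, `first` and `candidates` are only for illustration). `re.search` scans left to right and stops at the leftmost match, while `re.finditer` lists every candidate with its position:

```python
import re

# Simplified year pattern: 1900-1999 or 2000-2019, without the optional brackets.
YEAR = re.compile(r'((?:19[0-9]|20[01])[0-9])')

title = '2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'

# search() returns the leftmost match, which here is the year in the title itself.
first = YEAR.search(title)
print(first.group(1), first.start())

# finditer() walks the whole string and yields every non-overlapping match.
candidates = [(m.group(1), m.start()) for m in YEAR.finditer(title)]
print(candidates)
```

The release year is present among the candidates; it is simply never reached by `search`.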
Searching for ways to get the last occurrence of the year led me to resources on negative lookahead, the last occurrence of a repeated group, and the last 4 digits in a URL, none of which got me any closer to the desired result. No existing question seems to answer this particular case.
INTENDED OUTCOME
- I would like to extract the LAST occurrence (instead of the first) of a year from the given file name and return it from the existing function, as described in the output above. While I have used online regex references, I am new to regex and would appreciate someone showing me how to implement this so that it works on the file names above. Cheers guys.
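For reference, one possible way to get that outcome is to collect every year-like token with `re.findall` and keep the last one that passes the range check. This is only a sketch under a few assumptions: the function name `extract_last_release_year` and the pattern `YEAR_RE` are hypothetical, the pattern accepts any 19xx or 20xx value that is not part of a longer number (slightly wider than the original `20[01]` restriction), and the upper bound is the current year, as in the original check.

```python
import re
from datetime import datetime

# Hypothetical pattern: a four-digit 19xx/20xx year that is not embedded
# in a longer run of digits (so the "1080" in "1080p" is never a candidate).
YEAR_RE = re.compile(r'(?<!\d)((?:19|20)\d{2})(?!\d)')

def extract_last_release_year(title):
    """Return the last plausible release year in title as a string, or ""."""
    current_year = datetime.now().year
    # findall() returns the captured years in left-to-right order.
    candidates = [y for y in YEAR_RE.findall(title)
                  if 1900 <= int(y) <= current_year]
    return candidates[-1] if candidates else ""

for name in ('2012.(2009).3D.1080p.BRRip.SBS.x264',
             'Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264',
             '2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'):
    print(extract_last_release_year(name))
```

Because surrounding brackets are not part of the captured group, a bracketed year such as `(2009)` converts to an integer without extra handling. A file name with no valid year yields an empty string.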