I have the following string:
s = '2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 keep this text n0t th1s th0ugh 1 0 2015 2025 2026'
I want to replace every part of this string that contains a digit with '', except for the standalone parts that are years in the range 1950 to 2025. The resulting string would look like this (don't worry about the extraneous whitespace):
'2014 keep this text 2015 2025 '
So, effectively I want the brute-force removal of anything and everything remotely "numerical," except for something standalone (i.e. not part of another string, and of length 4 excluding whitespace) that resembles a year.
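To make the rule concrete, here is a minimal plain-Python sketch of the filtering I have in mind. It is not the regex I'm after, just a statement of the expected behaviour. The helper name `keep_token` is made up for illustration, and it assumes that "parts" means whitespace-separated tokens.

```python
import re

def keep_token(tok):
    """Hypothetical helper: decide whether a whitespace-separated token survives."""
    # Tokens with no digit at all are always kept.
    if not re.search(r'[0-9]', tok):
        return True
    # Otherwise keep only standalone 4-digit tokens in the range 1950..2025.
    return re.fullmatch(r'[0-9]{4}', tok) is not None and 1950 <= int(tok) <= 2025

s = ('2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 '
     'keep this text n0t th1s th0ugh 1 0 2015 2025 2026')

# Joining with single spaces also collapses the extraneous whitespace.
result = ' '.join(tok for tok in s.split() if keep_token(tok))
print(result)
```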
I know I can use this to remove everything containing digits:
re.sub(r'\w*[0-9]\w*', '', s)
But that doesn't return what I want:
' keep this text '
Here's my attempt at replacing anything that doesn't match the patterns listed below:
re.sub(r'^([A-Za-z]+|19[5-9]\d|20[0-1]\d|202[0-5])', '*', s)
Which returns:
'* 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 keep this text n0t th1s th0ugh 1 0 2015 2025 2026'
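As far as I can tell, only the leading 2014 is replaced because `^` anchors the whole alternation to the start of the string (without `re.MULTILINE` it can match only at position 0). A quick check with `re.subn`, which also reports the number of substitutions, seems to confirm this. The variable names here are just for illustration.

```python
import re

s = ('2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 '
     'keep this text n0t th1s th0ugh 1 0 2015 2025 2026')

# Same pattern as in the attempt above; '^' restricts matching to the string start.
pattern = r'^([A-Za-z]+|19[5-9]\d|20[0-1]\d|202[0-5])'

# re.subn returns the new string together with the substitution count.
replaced, count = re.subn(pattern, '*', s)
print(count)
print(replaced)
```

So the pattern also matches the things I want to keep rather than the things I want to remove, which is the opposite of what I need.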
I've searched elsewhere, but wasn't able to find what I was looking for.