8

Say I have a list of sentences, many of which contain numbers (but not all):

mylist = [
"The current year is 2015 AD.",
"I have 2 dogs.",
...
]

I want to know which elements in the list contain a valid year (say, between 1000 and 3000). I know this is a regex issue, and I have found a few posts (e.g., this one) that address detecting digits in strings, but nothing on full years. Any regex wizards out there?

Nolan Conaway
  • 1
    Or [Regular expression match to test for a valid year](http://stackoverflow.com/q/4374185), but that doesn't contain the right answer: don't use regex. Then there's also [Regular expression numeric range](http://stackoverflow.com/q/1377926), which does. – jscs Nov 26 '15 at 04:24

3 Answers

14

It sounds like you are looking for a regex that finds four-digit numbers whose first digit is between 1 and 3 and whose remaining three digits are between 0 and 9, so I think you want something like this:

[1-3][0-9]{3}

If you want to accept whole strings that contain such a number, you could use:

.*([1-3][0-9]{3})
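
As a minimal sketch of how the pattern above could be applied to the list from the question: the name `YEAR_RE` and the variable `with_years` are made up for illustration, and the list holds only the two sentences the question shows. The sketch uses `re.search`, which scans the whole string, so the leading `.*` is not needed here.

```python
import re

# Pattern from the answer above: a digit 1-3 followed by three more digits,
# i.e. any number from 1000 to 3999. YEAR_RE is an illustrative name.
YEAR_RE = re.compile(r"[1-3][0-9]{3}")

# The two sentences shown in the question.
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]

# re.search looks for a match anywhere in the string, so no .* prefix is required.
with_years = [s for s in mylist if YEAR_RE.search(s)]
print(with_years)
```

Note that this keeps the answer's 1000 to 3999 range; it does not stop at 3000.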
pwilmot
  • that's the regex i was looking for! – Nolan Conaway Nov 26 '15 at 04:29
  • 4
    Note that this also allows the values 3001 to 3999 – aweis Nov 26 '15 at 04:30
  • 1
    Did you notice that the other two answers, posted five minutes before this one, contain the same pattern, @nolanconaway? – jscs Nov 26 '15 at 04:54
  • @JoshCaswell i noticed! All of these answers gave me the regex i was looking for so i just picked the one i copied it out of. – Nolan Conaway Nov 27 '15 at 22:33
  • I think re.findall(r'.*([1-3][0-9]{3})', 'September 1, 2017 - June 30, 2018') should return two different years. Am I right? – mece1390 Jan 28 '22 at 11:31
  • Easy enough to remove the 3001-3999 case: `[12][0-9]{3}|3000` But note that you don't need to use `.*` since `re.search( r'[12][0-9]{3}|3000', your_string )` will return a match anywhere in the string. – Stefan Apr 07 '23 at 15:44
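
The last two comments can be checked with a short sketch, using the date string from the comment above. The variable names `greedy` and `all_years` are illustrative only. Because `.*` is greedy, the prefixed pattern captures only the last year in the string, while the pattern without the prefix lets `re.findall` return each year.

```python
import re

text = "September 1, 2017 - June 30, 2018"

# Greedy .* swallows as much of the string as it can, so the capture
# group ends up holding only the last four-digit match.
greedy = re.findall(r".*([1-3][0-9]{3})", text)

# Without the .* prefix, findall returns every non-overlapping match.
# This pattern also limits the range to 1000-3000.
all_years = re.findall(r"[12][0-9]{3}|3000", text)

print(greedy)
print(all_years)
```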
12

Here's a simple solution:

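import re

# Sample data from the question; replace with your own list
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]
for l in mylist:
    # re.match anchors at the start, so .* skips ahead to the number
    match = re.match(r'.*([1-3][0-9]{3})', l)
    if match is not None:
        # Then it found a match!
        print(match.group(1))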

This will check each string for a 4-digit number between 1000 and 3999.

Chrispresso
3

Well, a year can so far be a lot of things. Most commonly it is 4 digits long, yes, but it is just a number. If you want all years from 1000 to 9999, you can use this regex: ([1-9][0-9]{3}), but to match the range in the question you need: ([1-2][0-9]{3}|3000)
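
Since a year is just a number, one alternative is to pull out the four-digit numbers and compare them as integers. This is only a sketch under stated assumptions: `contains_year` is a hypothetical helper, not something from the answers, and it assumes a year stands alone as a word, so a string like "2015AD" with no space would not count.

```python
import re

def contains_year(sentence, low=1000, high=3000):
    """Return True if the sentence has a standalone 4-digit number in [low, high].

    Hypothetical helper for illustration; the bounds default to the
    range given in the question.
    """
    # \b avoids matching part of a longer number such as 12345.
    candidates = re.findall(r"\b[0-9]{4}\b", sentence)
    return any(low <= int(c) <= high for c in candidates)

mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]
print([s for s in mylist if contains_year(s)])
```

Doing the range check in Python rather than in the pattern makes the bounds easy to change without rewriting the regex.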

aweis