
I have the following code in Python:

```python
import re

txt = 'Ted\'s date of birth is 5-6-2005 and he started college at 08-5-2019'

# intended to match years from 1900 to 2023
year = re.compile(r'[1900-2023]+')

res = year.findall(txt)

for i in res:
    print(i)
```

The code above prints:

```
200
0
2019
```

Since `[1900-2023]` should match anything in the range 1900 to 2023, why did it return 200 and 0, which are outside this range? Moreover, it didn't even return 2005, which is within this range.

Wiktor Stribiżew
Mir Stephen
  • *"since [1900-2023] returns any match between range of 1900 to 2023"*: no, it's wrong. `-` defines a range between characters only, not between strings or numbers. – Casimir et Hippolyte Aug 19 '19 at 12:14
  • `[1900-2023]` means any characters in the group {1, 9, 0, {0, 1, 2}, 0, 2, 3} where the inner group comes from the range `0-2`. The expression is equivalent to `[01239]`. – Wai Ha Lee Aug 19 '19 at 12:21
  • `[0-9]`: "Returns a match for any digit between 0 and 9". I saw this explanation and it was a bit vague; now you have shed more light on it. Thank you for taking the time to help me out. – Mir Stephen Aug 19 '19 at 12:32
  • You may automatically generate such ranges, there are a lot of Web sites where you may generate these regexps for free, [here is one of them](http://gamon.webfactional.com/regexnumericrangegenerator/) – Wiktor Stribiżew Mar 05 '20 at 13:49
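A quick way to check the comments above is to compare the original class with `[01239]` on the asker's string. This is a small sketch using only the standard `re` module; the `members` check over single characters is just an illustration added here.

```python
import re

txt = "Ted's date of birth is 5-6-2005 and he started college at 08-5-2019"

# Inside [...], '-' only builds a range between its two neighbouring characters,
# so [1900-2023] is the set {1, 9, 0, 0-2, 0, 2, 3} = {0, 1, 2, 3, 9}.
original = re.findall(r'[1900-2023]+', txt)
equivalent = re.findall(r'[01239]+', txt)

print(original)                # ['200', '0', '2019']
print(original == equivalent)  # True

# Test every digit (and a literal '-') against the class one character at a time.
members = [c for c in "0123456789-" if re.fullmatch(r'[1900-2023]', c)]
print(members)                 # ['0', '1', '2', '3', '9']
```

Note that the `-` itself is not a member: it was consumed as the range operator in `0-2`.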

2 Answers


As stated in other answers/comments, `[1900-2023]` does not match any number between 1900 and 2023; rather, it matches any single character that is a 0, 1, 2, 3, or 9 (the `-` forms the range `0-2`, it is not a literal hyphen). For your specific case, you could build a pattern that matches these numbers yourself:

```
19[0-9]{2}|20[01][0-9]|202[0-3]
```

Explanation:

```
19[0-9]{2}  - "19" followed by exactly 2 digits in the range 0-9 (1900-1999)
|           - OR
20[01][0-9] - "20", then a 0 or 1, then one digit in the range 0-9 (2000-2019)
|           - OR
202[0-3]    - "202" followed by one digit in the range 0-3 (2020-2023)
```
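Here is a sketch of how the alternation might be used from Python. Two things are additions, not part of the answer: the pattern is wrapped in a non-capturing group `(?:...)` so that the `\b` word boundaries apply to every alternative, and the `\b` boundaries themselves, which stop it from matching inside longer digit runs such as `12019`. The brute-force loop just confirms the pattern accepts exactly 1900 to 2023.

```python
import re

# Alternation from the answer, grouped so \b applies to all three branches.
YEAR = re.compile(r'\b(?:19[0-9]{2}|20[01][0-9]|202[0-3])\b')

txt = "Ted's date of birth is 5-6-2005 and he started college at 08-5-2019"
found = YEAR.findall(txt)
print(found)  # ['2005', '2019']

# Sanity check: among all 4-digit numbers, only 1900..2023 should be accepted.
accepted = [n for n in range(1000, 10000) if YEAR.fullmatch(str(n))]
print(accepted[0], accepted[-1], len(accepted))  # 1900 2023 124
```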
dvo

`[1900-2023]` doesn't match any number between 1900 and 2023. `[ ]` can be used for character ranges, not string/numeric ranges. So you can write `[1-9]` or `[a-f]`, but not `[10-20]` or `[aa-zz]`.

I would suggest finding any 4-digit number with the `\d{4}` regex, then converting it to `int` and checking whether it's in the range that interests you.
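A minimal sketch of that find-then-check approach. The function name `years_in_range` and its `lo`/`hi` defaults are hypothetical, and the `\b` word boundaries are an addition to the plain `\d{4}` so that a 4-digit slice of a longer number (e.g. `1234` out of `12345`) is not picked up.

```python
import re

def years_in_range(text, lo=1900, hi=2023):
    # Hypothetical helper: collect standalone 4-digit numbers, then keep
    # only those whose integer value lies in [lo, hi].
    candidates = re.findall(r'\b\d{4}\b', text)
    return [int(m) for m in candidates if lo <= int(m) <= hi]

txt = "Ted's date of birth is 5-6-2005 and he started college at 08-5-2019"
print(years_in_range(txt))                                   # [2005, 2019]
print(years_in_range("born 1899, moved 1950, code 12345"))   # [1950]
```

Doing the range test in Python keeps the regex simple, and changing the bounds is a matter of passing different arguments instead of rewriting the pattern.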

mrzasa
  • Or, more precisely, you *can* write `[10-20]`, but it means "one of '1', a character in the range '0'-'2', or '0'" (which is obviously the same as "one of '0', '1', or '2'"). – Martin Bonner supports Monica Aug 19 '19 at 12:17
  • Depending on the search text, a better regex to match might be `([12]\d{3})` (you *can* write a regex to match exactly, but it's painful). – Martin Bonner supports Monica Aug 19 '19 at 12:19
  • 1
  • `[1900-2023]+` == `([190]|[0-2]|[023])+` (`-` makes a range only between characters, the rest of the square brackets works independently as per square brackets regex rules) == `[01239]+`. – h4z3 Aug 19 '19 at 12:20
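The two claims in Martin Bonner's comments can be checked directly. This sketch shows that `[10-20]` accepts only the digits 0, 1 and 2, and that a "leading 1 or 2 plus three digits" filter (written here as `[12]\d{3}`) is only a coarse pre-filter: it accepts 1000 through 2999, so a numeric range check is still needed afterwards.

```python
import re

# [10-20] is '1', the range '0'-'2', or '0': effectively the set {0, 1, 2}.
in_10_20 = [c for c in "0123456789" if re.fullmatch(r'[10-20]', c)]
print(in_10_20)  # ['0', '1', '2']

# [12]\d{3} accepts any 4-digit number starting with 1 or 2.
coarse = [n for n in range(1000, 10000) if re.fullmatch(r'[12]\d{3}', str(n))]
print(coarse[0], coarse[-1])  # 1000 2999
```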