I'm trying to find the numbers in the range 1 to 34 in my string, but I'm not getting the expected output. Code:

re.findall("(\d[1-34])","hi34hi30hi12")

Output:

['34','12']

Where is 30 here?? Or am I doing it wrong??

dev1998
Vishal Yarabandi

2 Answers

That regex is wrong. As was said in a comment above, your regex matches a single digit \d, followed by a single character from the set {1,2,3,4}. That is the literal meaning of the character class [1-34]: the range 1-3 plus the character 4.

This one matches all the 2-digit numbers from 00 to 34:

re.findall("([0-2][0-9]|3[0-4])","hi34hi30hi12")

This expression is made of two parts. The first,

[0-2][0-9]

matches two characters: the first is a 0, a 1 or a 2, and the second is any digit. The second part is an alternative to the first (using the | operator):

3[0-4]

It matches a 3 followed by a 0, a 1, a 2, a 3 or a 4.

That expression thus matches every 2-digit number from 00 to 34. Note that this includes 00, and that 1 to 9 are only matched when written with a leading zero.
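To see the behaviour described above, here is a minimal sketch. The first two calls use the pattern from this answer; the `ONE_TO_34` pattern is a hypothetical variant (not from the original answer) that assumes you want standalone integers from 1 to 34 without a leading zero, and uses lookarounds to reject digits that are part of a longer number.

```python
import re

text = "hi34hi30hi12"

# The pattern from the answer: two-digit sequences from 00 to 34.
two_digit = re.findall(r"[0-2][0-9]|3[0-4]", text)
print(two_digit)  # ['34', '30', '12']

# Caveat: without boundaries it also matches inside longer digit runs.
inside_longer = re.findall(r"[0-2][0-9]|3[0-4]", "x100")
print(inside_longer)  # ['10']

# Hypothetical variant (an assumption, not part of the answer):
# integers 1 to 34, no leading zero, not adjacent to another digit.
ONE_TO_34 = re.compile(r"(?<!\d)(?:[12][0-9]|3[0-4]|[1-9])(?!\d)")
standalone = ONE_TO_34.findall("a5b99c100d07e34")
print(standalone)  # ['5', '34']
```

Whether the stricter variant is appropriate depends on how the numbers appear in your real data.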

A. Rama
\d[1-34] actually matches a digit, followed by either a digit in the range 1-3 or a "4".

  • "34" was matched because \d matched 3, and then the 4 inside the character class matched the 4.

  • "12" was matched because, again, \d matched 1, and then 2 was matched because it's in the range 1-3.

As mentioned in the comments, a better solution would be to match all 2-digit numbers and verify the range manually:
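The two bullets above can be checked directly. This is a small sketch showing that the class [1-34] accepts exactly the characters 1, 2, 3 and 4, which is why the 0 in "30" is rejected (the variable names are only illustrative):

```python
import re

# [1-34] is the range 1-3 plus a literal 4, i.e. the same set as [1-4].
char_class = re.compile(r"[1-34]")
accepted = [c for c in "0123456789" if char_class.fullmatch(c)]
print(accepted)  # ['1', '2', '3', '4']

# The second character of "30" is 0, which is not in the class,
# so only "34" and "12" are found.
found = re.findall(r"\d[1-34]", "hi34hi30hi12")
print(found)  # ['34', '12']
```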

>>> re.findall(r"(\d\d)", "hi34hi30hi12")
['34', '30', '12']

Now iterate over the list and verify the range.
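A minimal sketch of that last step, assuming the target range is 1 to 34 inclusive as in the question (the sample string has an extra "99" added to show a rejected value):

```python
import re

text = "hi34hi30hi12hi99"

# Step 1: grab every pair of digits as a string.
candidates = re.findall(r"\d\d", text)
print(candidates)  # ['34', '30', '12', '99']

# Step 2: convert to int and keep only values in the wanted range.
in_range = [n for n in map(int, candidates) if 1 <= n <= 34]
print(in_range)  # [34, 30, 12]
```

Keeping the range check in plain Python means the bounds can change without rewriting the regex.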

Maroun
  • Thanks a lot! How can I do my search then?? – Vishal Yarabandi Mar 15 '16 at 15:26
  • Do you mind explaining how the question marks work? – lochsh Mar 15 '16 at 15:28
  • Use `r"(?<![\d.])\d{2}(?![\d.])"` to match all 2-digit numbers, cast to int and check if the number is between the specified range. – Wiktor Stribiżew Mar 15 '16 at 15:29
  • Nope, that doesn't match 29 – A. Rama Mar 15 '16 at 15:29
  • @A.Rama correct. Will fix. – Maroun Mar 15 '16 at 15:30
  • @WiktorStribiżew why not using only `\d{2}`? – Maroun Mar 15 '16 at 15:31
  • @MarounMaroun: I love complicating things :) `\d{2}` will match any sequence of 2 digits. `(?<![\d.])\d{2}(?![\d.])` will match integers that are surrounded with non-digits. – Wiktor Stribiżew Mar 15 '16 at 15:33
  • @WiktorStribiżew I guess I got a downvote for over simplifying :) – Maroun Mar 15 '16 at 15:34
  • I don't agree, the question was about the regex and not a programmatic solution. Your `\d\d` regex is not a correct solution for the question. – A. Rama Mar 15 '16 at 15:35
  • @A.Rama: Let me remark that a regex is also a "programmatic" solution. One should know how to use it in the specific code environment. – Wiktor Stribiżew Mar 15 '16 at 15:36
  • @A.Rama OP already got a regex for that. I suggested a better solution (at least for me). I hate using regex for all tasks in the world. For God's sake, a range shouldn't be validated using a regex. – Maroun Mar 15 '16 at 15:36
  • Why use a regex when you can just parse the string yourself? Because, when parsing large amounts of data, a regex can be quite fast, powerful and expressive. Programmatic solutions or post-analysis are sometimes needed because regexes are not always able to express every possible 'word' that you want to match, but that's definitely not the case here. – A. Rama Mar 15 '16 at 15:41