
I must be missing something simple - whatever I do, I can't get my regex to match any strings:

[~] $ python2.7
Python 2.7.12 (default, Aug 13 2016, 19:37:25) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "   405489796130    "
>>> regex = "\b[0-9]{15}|[0-9]{12}\b"
>>> for str in re.findall(regex, s):
...     print(str)
... 
>>> for str in re.finditer(regex, s):
...     print(str)
... 
>>> print("Hi")
Hi
>>> 

The regex "\b[0-9]{15}|[0-9]{12}\b" should definitely match the provided string (that string contains a substring of 12 digits...).

I even put this text and the regex into https://regexr.com/ and that website's regex found the substring - why can't Python?

Wiktor Stribiżew
CaptainForge
  • I'm hardly a regex expert; I consult the [documentation](https://docs.python.org/2/howto/regex.html). – Prune Oct 26 '17 at 22:25
  • check out `print("\b")` ... note, backslash is an *escape sequence* in Python. use *raw strings* so Python doesn't interpret escape sequences: `print(r"\b")` or use double-escapes: `print("\\b")` – juanpa.arrivillaga Oct 26 '17 at 22:41
  • Please see my answer for a correct solution, mypetlion's one is not doing the right thing. I explained why. – Wiktor Stribiżew Oct 27 '17 at 18:03

2 Answers


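You have to escape your backslashes. In an ordinary Python string literal, "\b" is the backspace character (\x08), so the regex engine never receives the word-boundary token \b: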

regex = "\\b[0-9]{15}|[0-9]{12}\\b"
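
To see why the original pattern never matched, it can help to look at what Python actually stores in each string. The following is a minimal sketch; the variable names `plain`, `raw`, and `escaped` are only illustrative and do not appear in the question.

```python
import re

s = "   405489796130    "

# Ordinary literal: Python turns "\b" into the backspace character (\x08)
# before the re module ever sees the pattern.
plain = "\b[0-9]{15}|[0-9]{12}\b"

# Raw literal and doubled backslashes both keep a real backslash followed by "b".
raw = r"\b[0-9]{15}|[0-9]{12}\b"
escaped = "\\b[0-9]{15}|[0-9]{12}\\b"

print(repr(plain[0]))        # '\x08'
print(raw == escaped)        # True
print(re.findall(plain, s))  # []
print(re.findall(raw, s))    # ['405489796130']
```

The raw string and the double-escaped string are the same pattern, so choosing between them is a matter of readability.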
mypetlion
  • Use r-strings. This is why you'll see almost all regex examples in Python using them (except in cases where no backslashes are used): `regex = r"\b[0-9]{15}|[0-9]{12}\b"` – Iguananaut Oct 26 '17 at 22:27
  • I always use \\. I think that's just a preference thing where I'm in the minority. – mypetlion Oct 26 '17 at 22:30
  • Seconding the use of `r`. This is what it's made for, and it's specifically suggested as the solution to this problem in Python's documentation. Why in the world would you prefer double backslashes over a simple switch at the beginning? – CAustin Oct 26 '17 at 22:32

The regex pattern needs to be a raw string. In Python, you declare a raw string with an r prefix, like so:

import re
s = "   405489796130    "
regex = r"\b[0-9]{15}|[0-9]{12}\b"
for match in re.findall(regex, s):
    print(match)

Output: 405489796130

EDIT

[Deleted additional guidance]

Cole
  • It is not a good idea to copy others' solutions to your own answer. If you see other's answer is correct and yours not, you should remove it and upvote the correct answer. – Wiktor Stribiżew Oct 27 '17 at 18:01
  • @WiktorStribiżew I tried to comment that it works without the corrected syntax on your answer but SO requires 50+ reputation to comment. I did it correctly given the restraints – Cole Oct 27 '17 at 20:47
  • Correct way is to post your own good answers to gain 50 points (and more). Please remove my solution from your answer. – Wiktor Stribiżew Oct 27 '17 at 20:49
  • @WiktorStribiżew The corrected syntax doesn’t answer the question since it works without it. It is not a solution to the given question, it’s simply additional guidance. – Cole Oct 27 '17 at 20:52
  • [Here is your regex demo](https://regex101.com/r/tWBy0t/2), it matches any string containing 15 digits at the start of a word or 12 digits at the end of a word – Wiktor Stribiżew Oct 27 '17 at 20:57
  • @WiktorStribiżew Was that the given string in the question? I understand that it works better for use cases that he didn’t give but it wasn’t the question, it’s simply additional guidance. I’ll remove it from my answer to appease you – Cole Oct 27 '17 at 21:04
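
The disagreement in these comments comes down to alternation precedence: `|` binds more loosely than `\b`, so the raw-string pattern reads as "`\b` plus 15 digits" OR "12 digits plus `\b`". A rough sketch of the difference follows. The grouped pattern is an assumption about the intended behavior (a standalone run of exactly 12 or 15 digits) and is not taken from the deleted guidance; the sample strings are made up for illustration.

```python
import re

# Pattern from the answers: each \b belongs to only one alternative.
ungrouped = r"\b[0-9]{15}|[0-9]{12}\b"

# Assumed intent: both boundaries apply to both alternatives.
grouped = r"\b(?:[0-9]{15}|[0-9]{12})\b"

samples = [
    "405489796130",         # exactly 12 digits
    "x405489796130",        # 12 digits glued to a letter
    "1234567890123456789",  # 19 digits
]

for text in samples:
    print(text, re.findall(ungrouped, text), re.findall(grouped, text))

# 405489796130 ['405489796130'] ['405489796130']
# x405489796130 ['405489796130'] []
# 1234567890123456789 ['123456789012345'] []
```

For the exact string in the question both patterns give the same result, which is why the ungrouped version appears to work there.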