How can we print only the two-digit numbers from the given lines in Python using regex?

Question

a = "Hi I am sony. My age 23. This is my email id sony.1510@gmail.com . Hi I am Jessey. My age 20. This is my email id jessey.1996@gmail.com . Hi I am ronald.My age 17 My mail id is ronald.1999@gmail.com"

>>> re.findall(r'[0-9]{2}', a)
['23', '15', '10', '20', '19', '96', '17', '19', '99']

But I need only the following output:

['23', '20', '17']

Mustofa Rizwan · Accepted Answer · 2017-03-06T13:13:19.657


You can try this:

(?<=\b)\d+(?=[\s.,;])

Explanation:

(?<=\b) Looks behind to check that the current position is a word boundary. Because a digit follows, this means the digits are not preceded by a letter, digit, or underscore.
\d+ If that condition holds, matches one or more digits.
(?=[\s.,;]) Then looks ahead to check that the next character is whitespace, a dot, a comma, or a semicolon.

demo

If I put a word boundary after the digits instead, then the 1999 in front of the @ would also be selected, because @ is not a word character. This is shown in the demo.
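To make the point about the trailing word boundary concrete, here is a small sketch that runs both variants on the string from the question. The variable names are only illustrative.

```python
import re

text = ("Hi I am sony. My age 23. This is my email id sony.1510@gmail.com . "
        "Hi I am Jessey. My age 20. This is my email id jessey.1996@gmail.com . "
        "Hi I am ronald.My age 17 My mail id is ronald.1999@gmail.com")

# Trailing \b: '@' is a non-word character, so there is a word boundary
# between the last digit and '@', and the email digits match too.
with_trailing_boundary = re.findall(r"\b\d+\b", text)

# Lookahead: the character right after the digits must be
# whitespace, '.', ',' or ';', which rules out '@'.
with_lookahead = re.findall(r"\b\d+(?=[\s.,;])", text)

print(with_trailing_boundary)  # ['23', '1510', '20', '1996', '17', '1999']
print(with_lookahead)          # ['23', '20', '17']
```

One thing to keep in mind: the lookahead needs a character after the digits, so a number at the very end of the string would not be matched by this pattern.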

Sample Code:

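import re

# Digits that start at a word boundary and are followed by whitespace, '.', ',' or ';'
regex = r"(?<=\b)\d+(?=[\s.,;])"
test_str = "Hi I am sony. My age 23. This is my email id sony.1510@gmail.com . Hi I am Jessey. My age 20. This is my email id jessey.1996@gmail.com . Hi I am ronald.My age 17 My mail id is ronald.1999@gmail.com"

# re.MULTILINE is not needed: the pattern has no ^ or $ anchors
matches = re.finditer(regex, test_str)

# Print each matched number on its own line
for match in matches:
    print(match.group(0))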


UPDATE

\b asserts a word boundary. I put (?<=\b) at the beginning, which is equivalent to a plain \b, so the regex above is the same as:

\b\d+(?=[\s.,;])

Here, the leading \b makes sure that a match can only begin at the start of a word, that is, at the start of the string or just after a non-word character such as a dot, comma, semicolon, space, or tab.
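As a quick check of that equivalence, the sketch below runs both forms on a made-up sample string (not from the question) that also contains digits glued to a letter or an underscore, where no word boundary exists.

```python
import re

# Hypothetical sample: 'a17' and 'x_20' have a word character right
# before the digits, so \b cannot match in front of them.
text = "age 23, code a17. tag x_20; mail sony.1510@gmail.com"

lookbehind_form = re.findall(r"(?<=\b)\d+(?=[\s.,;])", text)
plain_form = re.findall(r"\b\d+(?=[\s.,;])", text)

print(lookbehind_form)  # ['23']
print(plain_form)       # ['23']
```

Both patterns return the same list, and only 23 qualifies: 17 and 20 do not start at a word boundary, and 1510 is followed by @.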

edited Mar 06 '17 at 13:13

answered Mar 06 '17 at 12:34

Mustofa Rizwan


Why do you put the word boundary in a lookbehind? And why not use a word boundary after the digits? – Toto Mar 06 '17 at 12:37
@Toto I have now mentioned the reason in the answer – Mustofa Rizwan Mar 06 '17 at 12:39
`\b(\d{2})\b` is enough, as the OP wants only 2-digit numbers. And, in fact, a word boundary IS a lookaround. – Toto Mar 06 '17 at 12:47
I think the OP asked for 2 digits only as a way to get the numbers in the sample. My instinct is that the OP won't mind if somebody aged 101 is selected as well – Mustofa Rizwan Mar 06 '17 at 12:49
@RizwanM.Tuman: Can you please explain each line, so that I can understand the code easily? – Surya G Mar 06 '17 at 12:49
@SuryaKumari see my updated answer – Mustofa Rizwan Mar 06 '17 at 12:52
@Toto if I use "\b(\d+)\b" instead of \d{2}, then .1999@ also gets selected, so the \b at the end is not doing what you said (I would like to know why, though). It is true that I could have used just a \b at the beginning; the lookbehind wasn't needed, as \b is itself a lookaround – Mustofa Rizwan Mar 06 '17 at 12:55
@RizwanM.Tuman the explanation is clear. But I want to know about the \b which you have used for boundaries. Can you please explain how it works? I didn't get the concept of \b – Surya G Mar 06 '17 at 13:02
@SuryaKumari I have explained it further, please have a look. You can also learn more about regex at www.regex101.com; in the top right it usually shows an explanation of the regex you have written – Mustofa Rizwan Mar 06 '17 at 13:08
@RizwanM.Tuman thank you so much for your inputs. It helped me a lot. – Surya G Mar 06 '17 at 13:13
