
I am trying to detect all integers and whole numbers (among a lot of other things) in a string. Here are the regular expressions I am currently using:

Whole numbers: r"[0-9]+"

Integers: r"[+,-]?[0-9]+"

Here are the issues:

  1. The whole-numbers regex also detects negative numbers, which I cannot have. How do I solve this? If I put a space at the start of the regex I get only positive numbers, but then I get a space at the start of my output!
  2. For whole numbers, I would like to detect positive numbers with the format +[0-9] but store them without the sign.
  3. For integers, I would like to store any positive integer detected with the sign, irrespective of whether it is present in the original string.

Almost done now: One last thing, I have a string that says "Add 10 and -15". I want to store the integers in a list. I do so using findall(). While storing the numbers, is it possible to store '10' as '+10'?

Sahil Thapar

2 Answers


For positive integers, use

r"(?<![-.])\b[0-9]+\b(?!\.[0-9])"

Explanation:

(?<![-.])   # Assert that the previous character isn't a minus sign or a dot.
\b          # Anchor the match to the start of a number.
[0-9]+      # Match a number.
\b          # Anchor the match to the end of the number.
(?!\.[0-9]) # Assert that no decimal part follows.
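
As a quick check of the pattern above, here is a minimal sketch that runs it through re.findall(). The sample string and the names WHOLE, sample and whole_numbers are made up for illustration and are not from the question.

```python
import re

# Pattern from the answer: unsigned integers that are not part of a
# negative number or a decimal.
WHOLE = re.compile(r"(?<![-.])\b[0-9]+\b(?!\.[0-9])")

# Hypothetical test string, not from the question.
sample = "Add 10 and -15. Pi is 3.14, then 12. Also +7."

whole_numbers = WHOLE.findall(sample)
print(whole_numbers)  # ['10', '12', '7']
```

Note that "+7" comes back as '7': the sign is not part of the pattern, so a positive number written with an explicit plus is stored without it, which is what point 2 of the question asks for.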

For signed/unsigned integers, use

r"[+-]?(?<!\.)\b[0-9]+\b(?!\.[0-9])"

The word boundaries \b are crucial to make sure that the entire number is matched.
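
To see what the word boundaries buy you, here is a small sketch comparing the signed pattern with a boundary-free one. NO_BOUNDARY is a simplified comparison pattern written for this example (the question's idea with the comma dropped from the character class, since a comma inside [...] matches a literal comma); the sample string is also made up.

```python
import re

# Signed/unsigned integer pattern from the answer.
INTEGER = re.compile(r"[+-]?(?<!\.)\b[0-9]+\b(?!\.[0-9])")

# Comparison pattern without boundaries or lookarounds (illustrative only).
NO_BOUNDARY = re.compile(r"[+-]?[0-9]+")

# Hypothetical test string, not from the question.
sample = "Add 10 and -15, then +7, but not 3.14 or abc123"

with_boundaries = INTEGER.findall(sample)
without_boundaries = NO_BOUNDARY.findall(sample)

print(with_boundaries)     # ['10', '-15', '+7']
print(without_boundaries)  # ['10', '-15', '+7', '3', '14', '123']
```

Without the boundaries and lookarounds, pieces of the decimal and the digits glued to "abc" are picked up as integers.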

Tim Pietzcker
  • Thanks a lot. The boundaries were extremely helpful. – Sahil Thapar May 27 '13 at 13:49
  • Almost done now: One last thing, I have a string that says "Add 10 and -15". I want to store the integers in a list. I do so using findall(). While storing the numbers, is it possible to store '10' as '+10'? – Sahil Thapar May 27 '13 at 13:52
  • As I mentioned in my comment to your question, that is impossible. A regex can only match text that's already there; it can't add anything to the match. You'll need to do this programmatically. – Tim Pietzcker May 27 '13 at 14:02
  • This fails on something like `10.23`. I think you need to add the `.` check to the beginning too. – Jeff Tratner May 27 '13 at 14:04
  • Checking for a single dot after the digits is problematic, isn't it? "Add 10 and 12. Then multiply the result with 17." – Daniel Fischer May 27 '13 at 14:49
  • @DanielFischer: You're right - one could change the lookahead assertion to `(?!\.[0-9])` to allow these cases. – Tim Pietzcker May 27 '13 at 15:13
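
The "do this programmatically" step from the comments could look roughly like the sketch below: match with the signed pattern, then prepend '+' to any match that has no sign. The helper name with_explicit_sign is hypothetical, and this is only one possible way to do the post-processing.

```python
import re

# Signed/unsigned integer pattern from the answer above.
INTEGER = re.compile(r"[+-]?(?<!\.)\b[0-9]+\b(?!\.[0-9])")

def with_explicit_sign(text):
    """Return every integer found in text as a string that always carries a sign."""
    # A regex cannot add characters to a match, so the '+' is added afterwards.
    return [m if m[0] in "+-" else "+" + m for m in INTEGER.findall(text)]

print(with_explicit_sign("Add 10 and -15"))  # ['+10', '-15']
```

Because the lookahead only rejects a dot that is followed by a digit, a number at the end of a sentence ("Add 10 and 12. Then ...") is still matched.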

You almost had it.

import re

regex = re.compile(r'(\d+)|([\+-]?\d+)')

s = "1 2 3 4 5 6 +1 +2 +3 -1 -2 -3 +654 -789 321"
for r in regex.findall(s):
    # Each r is a (group 1, group 2) tuple; exactly one of the two is non-empty.
    if r[0]:
        # whole (unsigned)
        print('whole', r[0])
    elif r[1]:
        # a signed integer
        print('signed', r[1])

Results:

>>> 
whole 1
whole 2
whole 3
whole 4
whole 5
whole 6
signed +1
signed +2
signed +3
signed -1
signed -2
signed -3
signed +654
signed -789
whole 321

Or, you could use "or" to get the actual result in a "nicer" way:

print([r[0] or r[1] for r in regex.findall(s)])
>>> 
['1', '2', '3', '4', '5', '6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '321']

Edit: As per your question, "is it possible to store '10' as '+10'":

import re

def _sign(num):
    # num is a (whole, signed) tuple from findall(); add '+' to unsigned matches
    if num[0]:
        return '+%s' % num[0]
    else:
        return num[1]

regex = re.compile(r'(\d+)|([\+-]?\d+)')
s = "1 2 3 4 5 6 +1 +2 +3 -1 -2 -3 +654 -789 321"
print([_sign(r) for r in regex.findall(s)])
>>>
['+1', '+2', '+3', '+4', '+5', '+6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '+321']

Or in one line:

print(['+%s' % r[0] if r[0] else r[1] for r in regex.findall(s)])
>>> 
['+1', '+2', '+3', '+4', '+5', '+6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '+321']
Inbar Rose
  • Almost done now: One last thing, I have a string that says "Add 10 and -15". I want to store the integers in a list. I do so using findall(). While storing the numbers, is it possible to store '10' as '+10'? – Sahil Thapar May 27 '13 at 13:52
  • Thanks I just needed to know if it could be done while extracting the regex and without using an explicit function. Thanks anyways! – Sahil Thapar May 28 '13 at 08:26
  • You can see the last line of my answer is all in one line. – Inbar Rose May 28 '13 at 09:02