
I am new to regex. I am trying to write a regex that can distinguish a plain number from a number that is followed by a certain string.

Example:

input: '1002 900 600 700 800 234 Andrew'
output: single_numbers: 1002, 900, 600, 700, 800
        special_number: 234

input: '55 89Andrew 78 622 11 22 33 44 55' 
output: single_numbers: 55 78 622 11 22 33 44 55
        special_number: 89

I found this regex, which extracts numbers: "[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", but it returns every number in the string, including the one followed by the name. Example:

s = '89Andrew 78 622'
re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)

output: ['89', '78', '622']

How can I make it work?

Thanks!

Andrew Tulip

1 Answer

You can use two regexps to get these different values:

rf'{num_rx}(?=\s*Andrew\b)'
rf'{num_rx}\b(?!\s*Andrew\b)'

Here, (?=\s*Andrew\b) is a positive lookahead: it requires zero or more whitespace characters followed by the whole word Andrew (\b is a word boundary) immediately to the right of the number. In the second pattern, \b(?!\s*Andrew\b) first requires a word boundary right after the number, and then the negative lookahead fails the match if zero or more whitespace characters and the whole word Andrew appear immediately to the right of that location.
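To illustrate why the word boundary matters in the second pattern, here is a minimal sketch that compares it with a variant that omits \b. The names no_boundary and with_boundary are introduced only for this comparison. Without the boundary, the engine can backtrack inside 89 and match just its first digit, because the text to the right of 8 is 9Andrew, which the lookahead does not reject.

```python
import re

s = '89Andrew 78 622'
num_rx = r'[-+]?\.?\d+(?:,\d{3})*\.?\d*(?:[eE][-+]?\d+)?'

# Hypothetical variant without \b, shown for comparison only
no_boundary = re.findall(rf'{num_rx}(?!\s*Andrew\b)', s)
# The pattern from this answer, with \b after the number
with_boundary = re.findall(rf'{num_rx}\b(?!\s*Andrew\b)', s)

print(no_boundary)    # ['8', '78', '622']
print(with_boundary)  # ['78', '622']
```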

See the Python demo:

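import re
s = '89Andrew 78 622'
# Number pattern taken from the question, with redundant character classes simplified
num_rx = r'[-+]?\.?\d+(?:,\d{3})*\.?\d*(?:[eE][-+]?\d+)?'
# The number that is followed by the word Andrew
special_match = re.search(rf'{num_rx}(?=\s*Andrew\b)', s)
if special_match:
    print(special_match.group())
# All numbers that are not followed by the word Andrew
print(re.findall(rf'{num_rx}\b(?!\s*Andrew\b)', s))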

Output:

89
['78', '622']
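As a rough sketch, the two patterns can be wrapped in one helper and run against both inputs from the question. The function split_numbers and its name parameter are hypothetical and not part of the original demo; re.escape is used on the assumption that the trailing word should be matched literally.

```python
import re

NUM_RX = r'[-+]?\.?\d+(?:,\d{3})*\.?\d*(?:[eE][-+]?\d+)?'

def split_numbers(text, name='Andrew'):
    """Return (special_number, single_numbers) for text.

    Hypothetical helper: special_number is the first number followed by
    the given word (or None), single_numbers are all the other numbers.
    """
    name_rx = re.escape(name)  # treat the word as literal text
    special = re.search(rf'{NUM_RX}(?=\s*{name_rx}\b)', text)
    singles = re.findall(rf'{NUM_RX}\b(?!\s*{name_rx}\b)', text)
    return (special.group() if special else None), singles

first = split_numbers('1002 900 600 700 800 234 Andrew')
second = split_numbers('55 89Andrew 78 622 11 22 33 44 55')

print(first)   # ('234', ['1002', '900', '600', '700', '800'])
print(second)  # ('89', ['55', '78', '622', '11', '22', '33', '44', '55'])
```

The helper returns strings rather than converted numbers, since the number pattern also accepts forms such as 1,000 or 1e5 that would need their own conversion step.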

NOTE: I did not change what your number-matching regex matches, but you might want to look at other existing options in the question "Parsing scientific notation sensibly?".

Wiktor Stribiżew