
In Python, I'm trying to grab multiple inputs from a string using a regular expression, but I'm having trouble. For the string:

inputs       =    12 1  345 543 2

I tried using:

match = re.match(r'\s*inputs\s*=(\s*\d+)+',string)

However, this only captures the last value: `match.group(1)` is `' 2'`. I'm trying to capture all the values '12', '1', '345', '543', '2', but I'm not sure how to do this.

Any help is greatly appreciated!

EDIT: Thank you all for explaining why this does not work and for providing alternative suggestions. Sorry if this is a repeat question.

user8675309
  • possible duplicate of [Regex question about parsing method signature](http://stackoverflow.com/questions/4493844/regex-question-about-parsing-method-signature) – Martijn Pieters May 28 '13 at 14:32
  • You are facing the same problem as the linked question; your `(...)` group can only match *once*. Combine matching with splitting. – Martijn Pieters May 28 '13 at 14:33

4 Answers

You could try something like `re.findall(r"\d+", your_string)`.
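A minimal runnable version of this one-liner, using the string from the question (the variable names `s` and `values` are just for illustration). Note that `findall` returns every run of digits anywhere in the string, so this only works if the text before the numbers contains no digits of its own.

```python
import re

s = 'inputs       =    12 1  345 543 2'

# r"\d+" matches each maximal run of digits; findall returns them as a list of strings
values = re.findall(r"\d+", s)
print(values)  # ['12', '1', '345', '543', '2']

# Caveat: digits outside the list are picked up too (hypothetical input)
print(re.findall(r"\d+", 'inputs2 = 7 8'))  # ['2', '7', '8']
```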

mohit6up

You cannot do this with a single regex (unless you were using .NET), because each capturing group will only ever return one result even if it is repeated (the last one in the case of Python).
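Both limitations this answer relies on can be checked directly with Python's `re` module. The sketch below runs the question's pattern and shows that group 1 holds only the last repetition; it also shows that `re` rejects the variable-width lookbehind mentioned in the next paragraph (the exact error message may differ between Python versions).

```python
import re

s = 'inputs       =    12 1  345 543 2'

# The group (\s*\d+) is repeated by the outer +, but re keeps only its last capture
m = re.match(r'\s*inputs\s*=(\s*\d+)+', s)
print(repr(m.group(1)))  # ' 2'

# A lookbehind containing .* has no fixed width, so re.compile rejects it
lookbehind_error = None
try:
    re.compile(r'(?<=inputs.*=.*)\d+')
except re.error as exc:
    lookbehind_error = exc
    print('re.error:', exc)
```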

Since variable-length lookbehinds are also not supported (otherwise you could use `(?<=inputs.*=.*)\d+`), you will have to split this into two steps:

import re

match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', string)
integers = re.split(r'\s+', match.group(1))

So now you capture the entire list of integers (and the spaces between them), and then you split that capture at the spaces.

The second step could also be done using findall:

integers = re.findall(r'\d+',match.group(1))

The results are identical.
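Here are both steps as a self-contained script using the question's input, confirming that the `split` and `findall` variants agree. The names `s`, `by_split` and `by_findall` are illustrative; raising `ValueError` when nothing matches is an assumption, since the answer leaves error handling open. The repetition `(?:\s+\d+)*` is allowed to match zero times, so a line with a single integer is accepted too.

```python
import re

s = 'inputs       =    12 1  345 543 2'

# Step 1: capture the whole run of integers (and the whitespace between them) as one group.
# (?:\s+\d+)* may repeat zero times, so a single integer also matches.
match = re.match(r'\s*inputs\s*=\s*(\d+(?:\s+\d+)*)', s)
if match is None:
    raise ValueError('no inputs found')

# Step 2a: split the captured text on whitespace
by_split = re.split(r'\s+', match.group(1))

# Step 2b: or pull the digit runs out of the captured text with findall
by_findall = re.findall(r'\d+', match.group(1))

print(by_split)    # ['12', '1', '345', '543', '2']
print(by_findall)  # ['12', '1', '345', '543', '2']
```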

Martin Ender

You can nest the calls, running `findall` on the group captured by `match`:

import re

s = 'inputs       =    12 1  345 543 2'
print(re.findall(r'(\d+)', re.match(r'inputs\s*=\s*([\s\d]+)', s).group(1)))
# ['12', '1', '345', '543', '2']

Or do it in layers:

import re

def get_inputs(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    match = re.match(regex, s)
    if not match:
        return False  # or raise an exception - whatever you want
    return re.findall(r'(\d+)', match.group(1))

s = 'inputs       =    12 1  345 543 2'
print(get_inputs(s))
# ['12', '1', '345', '543', '2']
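The comment in `get_inputs` mentions raising an exception instead of returning `False`. A minimal sketch of that variant follows; the name `get_inputs_strict` and the choice of `ValueError` are illustrative assumptions, not part of the original answer. Raising keeps the return type consistent (always a list), so callers don't have to tell `False` apart from an empty list.

```python
import re

def get_inputs_strict(s, regex=r'inputs\s*=\s*([\s\d]+)'):
    """Return the digit strings after 'inputs =', or raise ValueError if s does not match."""
    match = re.match(regex, s)
    if match is None:
        # Raise instead of returning False so the return type is always a list
        raise ValueError(f'no inputs found in {s!r}')
    return re.findall(r'\d+', match.group(1))

print(get_inputs_strict('inputs       =    12 1  345 543 2'))  # ['12', '1', '345', '543', '2']

try:
    get_inputs_strict('outputs = 3 4')  # hypothetical non-matching line
except ValueError as exc:
    print(exc)
```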
Inbar Rose

You should look at this answer: https://stackoverflow.com/a/4651893/1129561

In short:

In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).

Lllama