
I have str1 and str2 below, and I want a single regexp that matches both. For str1, I also want to capture the number of QSFP ports.

>>> str1='''4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC''' 
>>> str2='''4 48 48-port 10GigE Linecard 7548S-LC''' 
>>> 

I want to be able to capture the numbers "4", "48", "6" (if present), and "7548". But I am unable to capture "6" using the "?" metacharacter.

When I do not put a metacharacter on the QSFP group, the capture works for str1, but then I can't use this regex because it won't work for str2:

>>> re.search(r'^(\d+)\s+(\d+)\s+.*(?:(\d+)\s+QSFP).*\s+(\d+)S-LC', str1, re.I|re.M).group(3) 
'6' 
>>>

It also works when I use "+" (one or more occurrences), but again, this won't work for str2:

>>> re.search(r'^(\d+)\s+(\d+)\s+.*(?:(\d+)\s+QSFP)+.*\s+(\d+)S-LC', str1, re.I|re.M).group(3) 
'6' 
>>>

When I use "?" to match 0 or 1 occurrences, the capture fails even for str1 (group(3) comes back as None):

>>> re.search(r'^(\d+)\s+(\d+)\s+.*(?:(\d+)\s+QSFP)?.*\s+(\d+)S-LC', str1, re.I|re.M).group(3) 
>>>
Sanjay BM
  • Combining quantifiers like that is meaningless ("0 or more copies of 1 or more copies of `\d`"). What are you really trying to do? – geekosaur Apr 15 '12 at 04:38
  • I think `re.search(r'(\d+)', str1).group(0)` is sufficient. – RanRag Apr 15 '12 at 04:39
  • I want to capture the decimal value IF it appears in 'str1', but I do not want the regex to fail if there is no decimal value. So what I really want is for this below to capture the decimal value: re.search(r'(\d+)?', str1).group(1) – Sanjay BM Apr 15 '12 at 04:42
  • Leftmost matching means that \d* (and similar quantifiers that permit empty-string matches) will never find the digits unless they're at the beginning of the string. Your best bet is to use conditional logic outside of the regular expression. – Mark Reed Apr 15 '12 at 04:59
  • I'm confused. You state you 'also want to capture the number of QSFP ports', but according to your example regexes, are you catching anything **other than** the QSFP value? – hexparrot Apr 15 '12 at 05:35
  • I had simplified my regex in my question by hard-coding the numbers '4' and '48' at the beginning of the string. Those are other values that I am actually capturing. – Sanjay BM Apr 15 '12 at 05:46
  • Update your post to state something along those lines then, such as "I wish to capture the '4', '48', '48', '6' (if present), '10', '7548' using one regex." Or omit some as per your needs, such as "I only need 4, 48, and 6 (if present)." – hexparrot Apr 15 '12 at 05:52
  • Using triple quotes here is unnecessary, see http://stackoverflow.com/questions/1520548/how-does-pythons-triple-quote-string-work and the [python documentation](http://docs.python.org/tutorial/introduction.html#strings) – Shep Apr 15 '12 at 06:17
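
Mark Reed's point about leftmost matching can be seen directly, together with the "conditional logic outside the regex" he suggests. This is a minimal sketch; the helper name `qsfp_ports` is hypothetical, and returning an `int` (rather than the matched string) is an assumption for convenience.

```python
import re

str1 = '4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC'
str2 = '4 48 48-port 10GigE Linecard 7548S-LC'

# An optional group can match the empty string, so re.search succeeds
# at position 0 without ever reaching the digits later in the string.
empty = re.search(r'(\d+)?', 'ports: 6')
print(repr(empty.group(0)), empty.group(1))  # '' None

# Hypothetical helper: require the digits inside the regex, and handle
# "no match" (i.e. no QSFP ports) in ordinary Python code instead.
def qsfp_ports(line):
    m = re.search(r'(\d+)\s+QSFP', line, re.I)
    return int(m.group(1)) if m else None

print(qsfp_ports(str1), qsfp_ports(str2))  # 6 None
```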

4 Answers


My interpretation of the problem was that the OP wanted a regex that matches both strings and returns the number in .group(1) if it exists (as it does in str1). I believe the issue was that he/she was not able to both capture the '6' in str1 and also match str2.

I got this from some quick trial and error:

>>> str1='''4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC''' 
>>> str2='''4 48 48-port 10GigE Linecard 7548S-LC''' 
>>> re.search(r'^4\s+48\s+.*(?:(\d+)\s+QSFP)|.*-LC', str1, re.I|re.M).group(1)
'6'
>>> re.search(r'^4\s+48\s+.*(?:(\d+)\s+QSFP)|.*-LC', str2, re.I|re.M).group(1)
>>> # no error returned, implying a match was found.

The difference is the "|": it "or"s everything before it (up to and including the QSFP group) with .*-LC.

Unfortunately, this makes the regex even more difficult to understand, but maybe it will work for you.

(edited for completeness)

ericcccc
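
Because `|` has the lowest precedence of any regex operator, the pattern in the answer above is really two independent alternatives, `^4\s+48\s+.*(\d+)\s+QSFP` and `.*-LC`, so the second branch accepts any line containing `-LC`. A minimal sketch, assuming the same two input strings, that scopes the alternation inside a group so the `^4\s+48` prefix and the `-LC` suffix stay mandatory for both branches. The string `'anything-LC'` is a hypothetical input, not from the question.

```python
import re

str1 = '4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC'
str2 = '4 48 48-port 10GigE Linecard 7548S-LC'
FLAGS = re.I | re.M

# Top-level alternation: either the whole QSFP pattern OR just .*-LC.
loose = r'^4\s+48\s+.*(?:(\d+)\s+QSFP)|.*-LC'
# Alternation scoped inside a group: prefix and -LC suffix are shared.
scoped = r'^4\s+48\s+(?:.*(\d+)\s+QSFP.*|.*)-LC'

for s in (str1, str2):
    print(re.search(scoped, s, FLAGS).group(1))  # prints 6, then None

# A hypothetical line without the "4 48" prefix:
print(re.search(loose, 'anything-LC', FLAGS) is not None)   # True: second branch matches
print(re.search(scoped, 'anything-LC', FLAGS) is not None)  # False
```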

I am not sure exactly what your requirement is.

Is it something like this:

>>> import re
>>> str1 = "hello 12 world"
>>> str2 = "hello world"
>>> obj = re.search(r'(\d+)', str1)
>>> obj.group(0)
'12'

Now check str2, which does not contain any decimal value:

>>> obj = re.search(r'(\d+)', str2)
>>> if obj is not None:
...     print(obj.group(0))
... else:
...     print("not found")
...
not found
>>>
RanRag
  • I was trying to simplify my question and now I see the confusion it could create. This is what I need: I have str1 and str2 below, and I want to use just one regexp which will match both. For str1, I want to capture the number of QSFP ports: >>> str1='''4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC''' >>> str2='''4 48 48-port 10GigE Linecard 7548S-LC''' >>> >>> re.search(r'^4\s+48\s+.*(?:(\d+)\s+QSFP).*-LC', str1, re.I|re.M).group(1) '6' >>> >>> re.search(r'^4\s+48\s+.*(?:(\d+)\s+QSFP)?.*-LC', str1, re.I|re.M).group(1) >>> – Sanjay BM Apr 15 '12 at 05:06
  • @user1334085: please include this comment in your original question. – RanRag Apr 15 '12 at 05:08
  • @user1334085: and what is the desired output for str2 – RanRag Apr 15 '12 at 05:19

I want to be able to capture the numbers "4", "48", "6" (if present), and "7548". But I am unable to capture "6" using the "?" metacharacter.

You can simplify your life if you avoid regex, since your query is very simple.

str1='''4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC'''
str2='''4 48 48-port 10GigE Linecard 7548S-LC'''
lines = [str1, str2]
nums = []
for l in lines:
    bits = l.split()
    last_num = bits.pop()[:-4]            # "7548S-LC" -> "7548" (drops the 4-char "S-LC" suffix)
    r = [i for i in bits if i.isdigit()]  # keep only purely numeric tokens, e.g. "4", "48", "6"
    r.append(last_num)
    nums.append(r)

>>> nums
[['4', '48', '6', '7548'], ['4', '48', '7548']]
Burhan Khalid

I think the problem is that the .* is eating up the QSFP bit, and because of the ? there's no incentive for it to ever backtrack. Changing the .* to a non-greedy .*? (surprisingly, to me at least) didn't help. Moving the .* inside the non-capturing group does help, however:

>>> re.match(r'^4\s+48\s+(?:.*(\d+)\s+QSFP)?.*-LC', str1, re.I|re.M).group(1)
'6'
>>> re.match(r'^4\s+48\s+(?:.*(\d+)\s+QSFP)?.*-LC', str2, re.I|re.M).group(1)
>>> 
Laurence Gonsalves
  • Cool!!! That solved my problem. I had initially tried .*? to make it non-greedy and it had not worked to my surprise too. But your solution works perfectly. Thanks. – Sanjay BM Apr 15 '12 at 06:17
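
The accepted answer's explanation can be checked with a short script. With `.*?` outside the group, the engine tries the optional group right after `48 `, fails there, and skips it (since `?` allows zero occurrences); because the rest of the pattern (`.*-LC`) still matches, it never backtracks to try the group further along. Moving `.*` inside the group makes the group itself responsible for finding "QSFP". One caveat: a greedy `.*` in front of `(\d+)` leaves only the last digit for the capture, so a lazy `.*?` inside the group may be safer if a count can have more than one digit. `str3` is a hypothetical input with a two-digit count, not from the question; this is a sketch, not part of the original answer.

```python
import re

str1 = '4 48 48-port and 6 QSFP 10GigE Linecard 7548S-LC'
str2 = '4 48 48-port 10GigE Linecard 7548S-LC'
str3 = '4 48 48-port and 16 QSFP 10GigE Linecard 7548S-LC'  # hypothetical two-digit count

patterns = {
    # .* (or .*?) outside the optional group, as in the question
    'greedy, outside group': r'^4\s+48\s+.*(?:(\d+)\s+QSFP)?.*-LC',
    'lazy, outside group': r'^4\s+48\s+.*?(?:(\d+)\s+QSFP)?.*-LC',
    # .* (or .*?) moved inside the optional group, as in this answer
    'greedy, inside group': r'^4\s+48\s+(?:.*(\d+)\s+QSFP)?.*-LC',
    'lazy, inside group': r'^4\s+48\s+(?:.*?(\d+)\s+QSFP)?.*-LC',
}

results = {}
for name, pattern in patterns.items():
    # group(1) for str1, str2, str3; every pattern matches all three lines
    results[name] = tuple(
        re.match(pattern, s, re.I | re.M).group(1) for s in (str1, str2, str3)
    )
    print(name, results[name])
# greedy, outside group (None, None, None)
# lazy, outside group (None, None, None)
# greedy, inside group ('6', None, '6')
# lazy, inside group ('6', None, '16')
```

Either inside-group variant answers the question as asked; the lazy one only makes a difference when the QSFP count has more than one digit.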