3

I need to match a pattern only when a previous regex pattern has matched. The two patterns are different, and their matches are on different lines. For example,

Text:

blah blah blah MyHost="xxxx"
again blah blah blah MyIp= "x.x.x.x"

I am only interested in what comes after MyHost and MyIp. I also have a requirement that MyIp should match only when there is a match (MyHost="xxxx") in the line above.

I am able to match the MyHost value and the MyIp value separately, but I am having a hard time finding the logic to match both as per the requirement. Please note that I am fairly new to Python; I searched a lot and ended up here.

user2864740
Sarvin

5 Answers

1

MyIp should match only when there is a match(MyHost="xxxx") in the above line.

Use a lazy match and read the result from capture group 1. You already know what follows MyHost:

\bMyHost="xxxx"\r?\n.*?MyIp=\s*\"([^"]*)

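Sample code:

import re
# Python 3: use a plain raw string (the ur'' prefix is Python 2 only)
p = re.compile(r'\bMyHost="xxxx"\r?\n.*?MyIp=\s*"([^"]*)', re.IGNORECASE)
test_str = 'blah blah blah MyHost="xxxx"\nagain blah blah blah MyIp= "x.x.x.x"'

# findall returns the contents of capture group 1 for each match
print(p.findall(test_str))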
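
If the host name is not fixed, the same idea can be wrapped in a small helper. This is only a sketch: the function name ip_after_host and its parameters are hypothetical, and it assumes the MyIp line directly follows the MyHost line, as in the question.

```python
import re

def ip_after_host(text, host):
    """Return the MyIp value on the line right after MyHost="<host>", or None."""
    pattern = re.compile(
        r'\bMyHost="' + re.escape(host) + r'"[^\n]*\r?\n'  # host line, up to its end
        r'[^\n]*?MyIp=\s*"([^"]*)"'                        # MyIp on the next line only
    )
    match = pattern.search(text)
    return match.group(1) if match else None

text = 'blah blah blah MyHost="xxxx"\nagain blah blah blah MyIp= "x.x.x.x"'
print(ip_after_host(text, "xxxx"))  # prints x.x.x.x
```

re.escape keeps characters such as dots in a host name from being read as regex syntax, and [^\n] stops the match from drifting past the following line.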
Braj
1

You could do this with the third-party regex module, which supports variable-length lookbehind (the built-in re module does not).

>>> import regex
>>> s = '''blah blah blah MyHost="xxxx"
... foo bar
... again blah blah blah MyIp= "x.x.x.x"
... 
... blah blah blah MyHost="xxxx"
... again blah blah blah MyIp= "x.x.x.x"'''
>>> m = regex.search(r'(?<=MyHost="xxxx"[^\n]*\n.*?MyIp=\s*")[^"]*', s)
>>> m.group()
'x.x.x.x'

This matches the value of MyIp only if the string MyHost="xxxx" is present on the previous line.

If you want to list both, try the code below.

>>> m = regex.findall(r'(?<=(MyHost="[^"]*")[^\n]*\n.*?)(MyIp=\s*"[^"]*")', s)
>>> m
[('MyHost="xxxx"', 'MyIp= "x.x.x.x"')]
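
If installing regex is not an option, a similar pairing can probably be done with the standard re module by using two capture groups instead of a lookbehind. A minimal sketch, with hypothetical sample data (the second host and IP values are made up for illustration):

```python
import re

s = '''blah blah blah MyHost="xxxx"
foo bar
again blah blah blah MyIp= "x.x.x.x"

blah blah blah MyHost="yyyy"
again blah blah blah MyIp= "1.2.3.4"'''

pair_pattern = re.compile(
    r'MyHost="([^"]*)"[^\n]*\n'   # host value, then the rest of that line
    r'[^\n]*?MyIp=\s*"([^"]*)"'   # MyIp must sit on the very next line
)

pairs = pair_pattern.findall(s)
print(pairs)  # [('yyyy', '1.2.3.4')]
```

The first MyHost is skipped because "foo bar" separates it from the MyIp line. Unlike the lookbehind version, each match here consumes the host line as well, which is fine when only the captured groups are needed.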
Avinash Raj
0
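       (?=.*? MyHost=\"xxx\" .*) .*? MyIp=\s*\"(\S+)\" .*

The xxx can be changed as required; the MyIp value is captured in group 1. The spaces in the pattern are only for readability, so compile it with re.VERBOSE, and add re.DOTALL so that . can cross the line break between the two values.

You can use a lookahead in Python: the regex goes on to fetch the IP only when xxx matches.

         (?=regex)regex1

This matches regex1 only when regex has also matched at the same position.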
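
A runnable version of the lookahead idea might look like the sketch below. It assumes the host value is "xxxx" as in the question and uses re.VERBOSE and re.DOTALL; the variable names are illustrative.

```python
import re

text = 'blah blah blah MyHost="xxxx"\nagain blah blah blah MyIp= "x.x.x.x"'

lookahead_pattern = re.compile(
    r'''
    (?=.*?MyHost="xxxx")      # succeed only if the host string occurs ahead
    .*?MyIp=\s*"([^"]+)"      # then capture the first MyIp value
    ''',
    re.DOTALL | re.VERBOSE,
)

match = lookahead_pattern.search(text)
ip_value = match.group(1) if match else None
print(ip_value)  # prints x.x.x.x
```

Note that the lookahead only confirms that MyHost="xxxx" appears somewhere after the starting position. It does not check that MyHost is on the line directly above MyIp, so it is looser than the requirement in the question.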

vks
0

You should take advantage of short-circuiting, which Python supports through the `and` operator. With short-circuiting, the second condition is evaluated only if the first one is true (for AND operations). So your code will look like the following:

 patternMatch1(MyHost) and patternMatch2(MyIp)

Here, both pattern-match functions would return a truthy value when their pattern matches.
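
As a rough sketch of how this could look with the re module (HOST_RE, IP_RE and find_ip are hypothetical names, not part of the original answer), the second search can start where the first match ended, so the text is not scanned from the beginning twice:

```python
import re

HOST_RE = re.compile(r'MyHost="xxxx"')
IP_RE = re.compile(r'MyIp=\s*"([^"]*)"')

def find_ip(text):
    host_match = HOST_RE.search(text)
    # `and` short-circuits: the IP search runs only if the host was found,
    # and it starts at the end of the host match instead of at position 0.
    ip_match = host_match and IP_RE.search(text, host_match.end())
    return ip_match.group(1) if ip_match else None

text = 'blah blah blah MyHost="xxxx"\nagain blah blah blah MyIp= "x.x.x.x"'
print(find_ip(text))  # prints x.x.x.x
```

This version only guarantees that MyIp appears somewhere after MyHost, not that it is on the very next line.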

Please let me know if you have any questions!

Devarsh Desai
  • In Python `&&` is a bitwise operator and short-circuiting is not applied (though it is with the `and` operator). But this requires two separate scans of the text, which is wasteful. – holdenweb Aug 04 '14 at 06:02
  • hey @holdenweb; good catch and thank you for the clarification on && and "and"; I sincerely appreciate it! I was wondering if you could explain why short circuiting requires 2 separate scans? I've implemented this feature in a C-style language and didn't come across that issue, and I've never heard the reason for not using it because it is wasteful? – Devarsh Desai Aug 04 '14 at 16:27
  • The point is that whether you use `and` or `&` (I don't believe there IS an `&&` operation in Python) you are using two different pattern matches on the same string. You are correct that in the case of `and`, if the first match fails the second one won't be attempted. But if it succeeds then you start at the beginning of the string. The OP suggests that the two matches should be on successive lines, so you'd either have to extract the required starting point from the first match or somehow correlate the two. Much easier to use a single pattern that matches all required components. – holdenweb Aug 05 '14 at 17:01
  • hey @holdenweb; ahh i see! Thank you for the clarification. I was under the assumption that both the strings would have been parsed before hand, but yes-what you're saying is completely correct! Thank you for the advice & clarification and I was happy to see the better solutions provided on this thread! :0) – Devarsh Desai Aug 05 '14 at 17:26
0

Generally, if you want to use regex, you'll need to match "MyHost" and everything that follows it, and "MyIp" and everything that follows it to the end of the line.

So basically, what you want to do is write a regex similar to this one:

MyHost="\w+"

This matches MyHost="..." where the quoted value consists of word characters (\w+). To retrieve that value afterwards and do the computation you need, wrap \w+ in a capture group.

To make sure the host matches first, a simple if condition is enough: check the host name on one line before looking for the IP on the next.
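
One possible way to write that check is to walk the text line by line and remember whether the previous line contained a host. This is a sketch only; host_ip_pairs and the two pattern names are made up for illustration.

```python
import re

HOST_RE = re.compile(r'MyHost="(\w+)"')
IP_RE = re.compile(r'MyIp=\s*"([^"]*)"')

def host_ip_pairs(text):
    """Collect (host, ip) pairs where MyIp is on the line right after MyHost."""
    pairs = []
    previous_host = None  # host found on the previous line, if any
    for line in text.splitlines():
        ip_match = IP_RE.search(line)
        if previous_host is not None and ip_match:
            pairs.append((previous_host, ip_match.group(1)))
        host_match = HOST_RE.search(line)
        previous_host = host_match.group(1) if host_match else None
    return pairs

text = 'blah blah blah MyHost="xxxx"\nagain blah blah blah MyIp= "x.x.x.x"'
print(host_ip_pairs(text))  # [('xxxx', 'x.x.x.x')]
```

Resetting previous_host on every line is what enforces the "line above" rule: a MyIp that is two or more lines below a MyHost is ignored.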

Hesham Amer
  • You may need to change the Regex to MyHost="(\w+)" in order to retrieve it, you'll have to look up the exact correct syntax – Hesham Amer Aug 04 '14 at 06:05