
I need help with a regex that matches across multiple lines, skipping everything until a given pattern, and I couldn't find an existing question that covers it.

Name of person 
 Jack
 Nichol 
 Age 42
 .....
 .....
 ....
Name of person
 Andrew
 Jason
 Age 54
...

... ...

How do I match this? I tried something like (Name.*(?:(\n)+).*(?:Age))

Consider the configuration below:

interface TenGigE0/0/0/7



shutdown

!

interface TenGigE0/0/0/8



 bundle id 221 mode active

 lacp period short

 lacp period short receive 100

 lacp period short transmit 100

 carrier-delay up 100 down 100

 load-interval 30

 frequency synchronization

 !

 transceiver permit pid all

!

interface TenGigE0/0/0/9



 mtu 9216

 frequency synchronization

 !

 transceiver permit pid all

!

interface TenGigE0/0/0/10



 bundle id 237 mode active

 lacp period short

 lacp period short receive 100

 lacp period short transmit 100

 carrier-delay up 120000 down 150

 load-interval 30

 frequency synchronization

How do I match every TenGigEx/x/x/x interface line together with its corresponding carrier-delay line?

Like this:

[ interface TenGigE0/0/0/8, carrier-delay up 100 down 100] [ interface TenGigE0/0/0/10, carrier-delay up 120000 down 150] ...and so on.

2 Answers

To match the contents between the closest lines containing TenGigE and carrier-delay, you need a tempered greedy token (or an unrolled version):

(?sim)^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)

See the regex demo

See the Python demo:

import re
p = re.compile(r'^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)', re.DOTALL | re.M | re.I)
test_str = "interface TenGigE0/0/0/8\n bundle id 221 mode active\n lacp period short\n lacp period short receive 100\n lacp period short transmit 100\n carrier-delay up 100 down 100\n\ninterface TenGigE0/0/0/7\n\n\n\nshutdown\n\n!\n\ninterface TenGigE0/0/0/8\n\n\n\n bundle id 221 mode active\n\n lacp period short\n\n lacp period short receive 100\n\n lacp period short transmit 100\n\n carrier-delay up 100 down 100\n\n load-interval 30\n\n frequency synchronization\n\n !\n\n transceiver permit pid all\n\n!\n\ninterface TenGigE0/0/0/9\n\n\n\n mtu 9216\n\n frequency synchronization\n\n !\n\n transceiver permit pid all\n\n!\n\ninterface TenGigE0/0/0/10\n\n\n\n bundle id 237 mode active\n\n lacp period short\n\n lacp period short receive 100\n\n lacp period short transmit 100\n\n carrier-delay up 120000 down 150\n\n load-interval 30\n\n frequency synchronization"
print(p.findall(test_str))
# => [('interface TenGigE0/0/0/8', 'carrier-delay up 100 down 100'), ('interface TenGigE0/0/0/8', 'carrier-delay up 100 down 100'), ('interface TenGigE0/0/0/10', 'carrier-delay up 120000 down 150')]
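To see why the tempered token matters, here is a minimal sketch comparing it with a plain lazy `.*?`. The patterns are simplified for illustration (they anchor on `interface TenGigE` and drop the `re.I` flag), and the sample text is a shortened, hypothetical version of the configuration from the question. The lazy version pairs an interface that has no carrier-delay line with the carrier-delay of the next interface; the tempered version refuses to cross another `TenGigE`.

```python
import re

# Shortened sample: interface 7 has no carrier-delay, interface 8 does.
sample = (
    "interface TenGigE0/0/0/7\n"
    "shutdown\n"
    "!\n"
    "interface TenGigE0/0/0/8\n"
    " carrier-delay up 100 down 100\n"
    "!\n"
)

# Lazy dot: happily skips over the next "interface" line.
lazy = re.compile(
    r'^(interface TenGigE\S*).*?(carrier-delay[^\n]*)',
    re.S | re.M,
)

# Tempered greedy token: each consumed character must not start
# "TenGigE" or "carrier-delay", so a match cannot span two interfaces.
tempered = re.compile(
    r'^(interface TenGigE\S*)(?:(?!TenGigE|carrier-delay).)*(carrier-delay[^\n]*)',
    re.S | re.M,
)

lazy_pairs = lazy.findall(sample)
tempered_pairs = tempered.findall(sample)
print(lazy_pairs)      # wrong pairing: interface 7 with interface 8's line
print(tempered_pairs)  # only interface 8
```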

UPDATE

A more efficient regex for extracting the same text, based on the unroll-the-loop technique (an unrolled tempered greedy token):

(?sim)^([^\n]*TenGigE[^\n]*\n)[^T\n]*(?:T(?!enGigE)[^T\n]*|\n(?! carrier-delay)[^T\n]*)*(\n carrier-delay[^\n]*)

See the regex demo
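A rough sketch of the unrolled pattern in Python, on a shortened, hypothetical sample. Both groups capture a newline (group 1 ends with one, group 2 starts with one), so the results are stripped here; the stripping step is an addition, not part of the regex above.

```python
import re

# The unrolled pattern from above, split across lines for readability.
unrolled = re.compile(
    r'(?sim)^([^\n]*TenGigE[^\n]*\n)[^T\n]*'
    r'(?:T(?!enGigE)[^T\n]*|\n(?! carrier-delay)[^T\n]*)*'
    r'(\n carrier-delay[^\n]*)'
)

sample = (
    "interface TenGigE0/0/0/7\n"
    "shutdown\n"
    "!\n"
    "interface TenGigE0/0/0/8\n"
    " bundle id 221 mode active\n"
    " carrier-delay up 100 down 100\n"
    " load-interval 30\n"
    "!\n"
    "interface TenGigE0/0/0/10\n"
    " bundle id 237 mode active\n"
    " carrier-delay up 120000 down 150\n"
)

# Strip the newlines and leading space that the groups capture.
pairs = [(iface.strip(), delay.strip()) for iface, delay in unrolled.findall(sample)]
print(pairs)
```

Because group 1 consumes the newline that ends the interface line and group 2 begins with a newline, this pattern expects at least one line (possibly blank) between the interface line and its carrier-delay line, which holds for the data in the question.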

Wiktor Stribiżew
  • Please check this and let me know if that is all you need or if you need to capture anything else inside the block. – Wiktor Stribiżew May 04 '16 at 09:53
  • Thanks It's close. But need _only_ - [interface TenGigE0/0/0/8, carrier-delay up 100 down 100] say in a list and not the lacp or description or any other stuff in between them. Accepting the answer. But refinement will be appreciated. – user6259926 May 04 '16 at 10:03
  • I have updated the answer with the regex and code. Glad it worked for you. Please also consider upvoting if my answer proved helpful to you (see [How to upvote on Stack Overflow?](http://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow)). – Wiktor Stribiżew May 04 '16 at 10:10
  • Thanks. Accepted the other answer and upvoted this one. – user6259926 May 04 '16 at 10:31
  • Your demo needs more than 200.000 steps to succeed! – Jan May 04 '16 at 11:17
  • Then it can be reduced 10 times with [`(?sim)^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)`](https://regex101.com/r/yG6pI8/4). – Wiktor Stribiżew May 04 '16 at 11:23

You could come up with:

(?:^(interface\ TenGigE
(?:\d+/?){4}))
(?:(?!(?:carrier-delay|interface))[\s\S])+
(?P<carrier>carrier-delay\ .+)

In Python this would be:

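import re

# Raw string, so the regex escapes (\d, \s, "\ ") reach the regex engine unchanged.
rx = re.compile(r"""
(?:^(interface\ TenGigE
(?:\d+/?){4}))
(?:(?!(?:carrier-delay|interface))[\s\S])+
(?P<carrier>carrier-delay\ .+)""", re.VERBOSE|re.MULTILINE)
# `string` is assumed to hold the configuration text to search.
matches = rx.findall(string)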

Compared to @Wiktor's answer (which needs > 200k steps), this one only needs ~3k, see a demo on regex101.com (thanks to him for spotting an inaccuracy before).
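The pattern defines a named group, `carrier`, which `findall` does not expose by name. As a sketch, `finditer` returns match objects, so the carrier-delay line can be read via `m.group("carrier")`. The sample text is a shortened, hypothetical stand-in for the real configuration.

```python
import re

# Same pattern as above, as a raw string; indentation is ignored in VERBOSE mode.
rx = re.compile(r"""
    (?:^(interface\ TenGigE
    (?:\d+/?){4}))
    (?:(?!(?:carrier-delay|interface))[\s\S])+
    (?P<carrier>carrier-delay\ .+)
""", re.VERBOSE | re.MULTILINE)

sample = (
    "interface TenGigE0/0/0/7\n"
    "shutdown\n"
    "!\n"
    "interface TenGigE0/0/0/8\n"
    " bundle id 221 mode active\n"
    " carrier-delay up 100 down 100\n"
    "!\n"
)

# Group 1 is the interface line; "carrier" is the named carrier-delay group.
pairs = [(m.group(1), m.group("carrier")) for m in rx.finditer(sample)]
print(pairs)
```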

Jan
  • `\G` is not the point here. You yourself are using a tempered greedy token - that is the point. No need using the regex module here. – Wiktor Stribiżew May 04 '16 at 11:26
  • @WiktorStribiżew: Right you are, updated the answer. I have started with `\G` (had another solution on my mind). Still, why do you need more than 200.000 steps to succeed? – Jan May 04 '16 at 11:34
  • I have 20K only because OP did not confirm that the lines should start with *interface* and *carrier-delay*. The task was to find lines *containing* them. I can further enhance the performance by adding an anchor or/and unrolling the pattern, sure. – Wiktor Stribiżew May 04 '16 at 11:37
  • @WiktorStribiżew: Damn it, you got me :) – Jan May 04 '16 at 11:59
  • :-) regexes really get everyone bugged. 2 confessions. the input data i provided was trimmed a bit - had to edit out the private stuff about interface descriptions. if matching only tengig followed by carrier-delay the scope is huge. eg. it finds a thus giving inaccurate results. i have constrained it further. paste the regex for the record. (tengigE.*(?:\r|\n)\s.des.*?)\n\n(?:.|\n)+?(Car.*?)\n gmi – user6259926 May 05 '16 at 07:07
  • There's one more thing I want to add: although it worked perfectly in regex101, the regex didn't work on Windows when I read the file into a buffer and ran re.findall on it. So if you are working on Windows, read the file into a buffer, print it to the screen, paste that output into regex101, and build your regex against it. – user6259926 May 05 '16 at 10:10
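The thread does not say why the regex behaved differently on Windows. One plausible cause, stated here purely as an assumption, is CRLF (`\r\n`) line endings surviving in the buffer: patterns that expect `\n\n` then fail, and `[^\n]*` captures pick up a trailing `\r`. A minimal sketch that normalizes line endings before matching (`normalize_newlines` is a hypothetical helper, not from the answers):

```python
import re

def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Tempered greedy token pattern from the first answer.
pattern = re.compile(
    r'^([^\n]*TenGigE[^\n]*)(?:(?!TenGigE|carrier-delay).)*([^\n]*carrier-delay[^\n]*)',
    re.DOTALL | re.M | re.I,
)

# Hypothetical buffer read with Windows line endings intact.
raw = (
    "interface TenGigE0/0/0/8\r\n"
    " bundle id 221 mode active\r\n"
    " carrier-delay up 100 down 100\r\n"
)

raw_pairs = pattern.findall(raw)                        # captures keep a trailing '\r'
clean_pairs = pattern.findall(normalize_newlines(raw))  # clean captures
print(raw_pairs)
print(clean_pairs)
```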