Python Regexp Dynamic Search

Question

I have written a regexp to find all 4 values under host-vpp1out (up):, but it only returns the first one. I want the regexp to be dynamic so that it captures any number of IPv4/IPv6 addresses under "host-vpp1out (up):".

m = re.findall(r'host-vpp1out.*\n\s+L3\s+(\d[\d.]*)', out)

current output

['1.1.1.1']

expected

['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']

out

VirtualFuncEthernet0/7/0.2001 (up):
  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2
  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1
VirtualFuncEthernet0/9/0 (dn):
host-vpp1out (up):
  L3 1.1.1.1/24
  L3 1.1.2.1/24
  L3 2001:db8:0:1:1:1:1:1/112
  L3 2001:db8:0:1:1:1:2:1/112
local0 (dn):
loop0 (up):
  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1

How can I get the expected output?

If all after L3 should start with a digit and end with `/` and 1+ digits, which is a broad match, you could use `\G` and the regex PyPi module `(?:host-vpp1out.*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]*)\/\d+` https://regex101.com/r/GSFUSl/1 — The fourth bird, Jul 27 '20 at 14:23
out is like new line separated with \n eg : VirtualFuncEthernet0/7/0.2001 (up):\n L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\nVirtualFuncEthernet0/9/0 (dn):\nhost-vpp1out (up):\n L3 1.1.1.1/24\n L3 1.1.2.1/24\n L3 2001:db8:0:1:1:1:1:1/112\n L3 2001:db8:0:1:1:1:2:1/112\nlocal0 (dn):\nloop0 (up): — Raj Naik, Jul 27 '20 at 14:36

Wiktor Stribiżew · Answer 1 · 2020-07-28T08:20:12.367

You may read the text line by line up to the line that starts with host-vpp1out (up):, then read the lines directly below it that start with "  L3 ", and save the substring between the fifth character and the / of each line using

text = """VirtualFuncEthernet0/7/0.2001 (up):
  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2
  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1
VirtualFuncEthernet0/9/0 (dn):
host-vpp1out (up):
  L3 1.1.1.1/24
  L3 1.1.2.1/24
  L3 2001:db8:0:1:1:1:1:1/112
  L3 2001:db8:0:1:1:1:2:1/112
local0 (dn):
loop0 (up):
  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1"""

results = []
f = iter(text.splitlines())
for line in f:
    if line.startswith("host-vpp1out (up):"):
        # A default of "" avoids StopIteration if the block ends the text
        line = next(f, "")
        while line.startswith("  L3 "):
            # Drop the "  L3 " prefix and the "/prefix-length" suffix
            results.append(line[5:].split("/")[0])
            line = next(f, "")
        break

print(results)
# => ['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']

See the Python demo
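As a rough sketch of how the same line-by-line idea could be made reusable for any interface name: the function name `l3_addresses` and its parameters are made up for illustration, and it assumes every header line has the form `<name> (<state>):` as in the sample output.

```python
from itertools import takewhile


def l3_addresses(text, interface):
    """Return the L3 addresses (prefix length stripped) listed under `interface`."""
    lines = iter(text.splitlines())
    # Skip everything up to and including the interface header line.
    for line in lines:
        if line.startswith(interface + " ("):
            break
    else:
        return []  # interface header not found
    # Collect the indented "L3" lines that directly follow the header.
    block = takewhile(lambda l: l.lstrip().startswith("L3 "), lines)
    # Second whitespace-separated field is "address/prefix"; keep the address.
    return [l.split()[1].split("/")[0] for l in block]


sample = "\n".join([
    "VirtualFuncEthernet0/7/0.2001 (up):",
    "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2",
    "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1",
    "VirtualFuncEthernet0/9/0 (dn):",
    "host-vpp1out (up):",
    "  L3 1.1.1.1/24",
    "  L3 1.1.2.1/24",
    "  L3 2001:db8:0:1:1:1:1:1/112",
    "  L3 2001:db8:0:1:1:1:2:1/112",
    "local0 (dn):",
    "loop0 (up):",
    "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1",
])

print(l3_addresses(sample, "host-vpp1out"))
```

An interface without L3 lines (such as local0) or one that does not appear at all simply gives an empty list.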

The fourth bird · Accepted Answer · 2020-07-28T08:08:34.253

You could use the PyPI regex module with the \G anchor and a capturing group, whose values are returned by regex.findall.

(?:host-vpp1out .*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]+)\/\d+

(?: Non-capturing group
- host-vpp1out .* Match host-vpp1out, a space and the rest of the line
- | Or
- \G(?!^) Assert the position at the end of the previous match, not at the start of the string
) Close the non-capturing group
\n\s+ Match a newline and 1+ whitespace chars
L3\s+ Match L3 and 1+ whitespace chars
( Capture group 1
- \d[\d.:\/a-z]+ Match a digit followed by 1+ of the listed characters
) Close group 1
\/\d+ Match / and 1+ digits

Regex demo | Python demo

Note that this part (\d[\d.:\/a-z]+)\/\d+ is a broad match to match an ipv4 or ipv6 pattern. The links contain pages with a more specific pattern.
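If a stricter regex is not wanted, one possible alternative (not part of the answer above) is to keep the broad match and validate each candidate afterwards with the standard-library `ipaddress` module. A minimal sketch, where `valid_addresses` is a hypothetical helper name and the candidates are assumed to be in `address/prefix` form:

```python
import ipaddress


def valid_addresses(candidates):
    """Keep only the strings that parse as an IPv4/IPv6 address with prefix."""
    result = []
    for candidate in candidates:
        try:
            ipaddress.ip_interface(candidate)
        except ValueError:
            continue  # the broad regex would accept this, but it is not an address
        result.append(candidate)
    return result


candidates = [
    "1.1.1.1/24",
    "2001:db8:0:1:1:1:1:1/112",
    "1.2.3/24",       # too few octets
    "999.1.1.1/24",   # octet out of range
]
print(valid_addresses(candidates))
```

Here only the first two candidates survive; the last two look plausible to the broad pattern but are rejected by `ipaddress.ip_interface`.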

Example code

import regex

pattern = r"(?:host-vpp1out .*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]+)\/\d+"
test_str = ("VirtualFuncEthernet0/7/0.2001 (up):\n"
    "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
    "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
    "VirtualFuncEthernet0/9/0 (dn):\n"
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 1.1.2.1/24\n"
    "  L3 2001:db8:0:1:1:1:1:1/112\n"
    "  L3 2001:db8:0:1:1:1:2:1/112\n"
    "local0 (dn):\n"
    "loop0 (up):\n"
    "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1")

print(regex.findall(pattern, test_str))

Output

['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']

Using re instead of regex, you can also do it in 2 steps: first match the host-vpp1out line together with the L3 lines directly below it, then get the group 1 values from that match using re.findall.

import re
 
regex=r"^host-vpp1out .*(?:\r?\n[^\S\r\n]*L3 .*)*"
test_str = ("VirtualFuncEthernet0/7/0.2001 (up):\n"
            "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
            "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
            "VirtualFuncEthernet0/9/0 (dn):\n"
            "host-vpp1out (up):\n"
            "  L3 1.1.1.1/24\n"
            "  L3 1.1.2.1/24\n"
            "  L3 2001:db8:0:1:1:1:1:1/112\n"
            "  L3 2001:db8:0:1:1:1:2:1/112\n"
            "local0 (dn):\n"
            "loop0 (up):\n"
            "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1")
 
match = re.search(regex, test_str, re.MULTILINE)
 
if match:
    print(re.findall(r" L3 (\d[\d.:\/a-z]+)\/\d+", match.group()))

Python demo
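To make the two-step `re` approach work for any interface, the name could be passed in and escaped with `re.escape`. This is only a sketch: the function name `l3_addresses_re` is made up for illustration, and it reuses the same two patterns as above.

```python
import re


def l3_addresses_re(text, interface):
    # Step 1: match the header line of `interface` plus the L3 lines directly below it.
    block_pattern = r"^" + re.escape(interface) + r" .*(?:\r?\n[^\S\r\n]*L3 .*)*"
    block = re.search(block_pattern, text, re.MULTILINE)
    if block is None:
        return []  # interface header not found
    # Step 2: capture the part in front of the "/prefix-length" on each L3 line.
    return re.findall(r" L3 (\d[\d.:\/a-z]+)\/\d+", block.group())


sample = (
    "VirtualFuncEthernet0/7/0.2001 (up):\n"
    "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
    "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
    "VirtualFuncEthernet0/9/0 (dn):\n"
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 1.1.2.1/24\n"
    "  L3 2001:db8:0:1:1:1:1:1/112\n"
    "  L3 2001:db8:0:1:1:1:2:1/112\n"
    "local0 (dn):\n"
    "loop0 (up):\n"
    "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1"
)

print(l3_addresses_re(sample, "host-vpp1out"))
```

Escaping matters for names such as VirtualFuncEthernet0/7/0.2001, where the dot would otherwise match any character.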

I can't use regex, its server restriction... any workaround... — Raj Naik, Jul 27 '20 at 15:16
@Rajendra You could also use `re` instead of `regex` and do it in 2 steps https://ideone.com/i0oGYw Did you try Wiktor's solution? — The fourth bird, Jul 27 '20 at 15:23
thanks it worked... will check Wiktor's solution as well.... — Raj Naik, Jul 28 '20 at 07:59
