
I have written a regexp to find all 4 values under host-vpp1out (up):. I want the regexp to be dynamic, so that it captures any number of IPv4/IPv6 addresses listed under "host-vpp1out (up):".

m = re.findall(r'host-vpp1out.*\n\s+L3\s+(\d[\d.]*)', out)

Current output:

['1.1.1.1']

Expected output:

['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']

Contents of out:

VirtualFuncEthernet0/7/0.2001 (up):
  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2
  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1
VirtualFuncEthernet0/9/0 (dn):
host-vpp1out (up):
  L3 1.1.1.1/24
  L3 1.1.2.1/24
  L3 2001:db8:0:1:1:1:1:1/112
  L3 2001:db8:0:1:1:1:2:1/112
local0 (dn):
loop0 (up):
  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1

How can I get the expected output?

Raj Naik
  • If all after L3 should start with a digit and end with `/` and 1+ digits, which is a broad match, you could use `\G` and the regex PyPi module `(?:host-vpp1out.*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]*)\/\d+` https://regex101.com/r/GSFUSl/1 – The fourth bird Jul 27 '20 at 14:23
  • Are you reading this from a file? Is `out` file contents? – Wiktor Stribiżew Jul 27 '20 at 14:32
  • out is like new line separated with \n eg : VirtualFuncEthernet0/7/0.2001 (up):\n L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\nVirtualFuncEthernet0/9/0 (dn):\nhost-vpp1out (up):\n L3 1.1.1.1/24\n L3 1.1.2.1/24\n L3 2001:db8:0:1:1:1:1:1/112\n L3 2001:db8:0:1:1:1:2:1/112\nlocal0 (dn):\nloop0 (up): – Raj Naik Jul 27 '20 at 14:36

2 Answers


You can read the text line by line until you reach the line that starts with host-vpp1out (up):, then consume the following lines for as long as they start with "  L3 ", and from each one keep the substring after the first five characters and before the first /:

text = """VirtualFuncEthernet0/7/0.2001 (up):
  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2
  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1
VirtualFuncEthernet0/9/0 (dn):
host-vpp1out (up):
  L3 1.1.1.1/24
  L3 1.1.2.1/24
  L3 2001:db8:0:1:1:1:1:1/112
  L3 2001:db8:0:1:1:1:2:1/112
local0 (dn):
loop0 (up):
  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1"""

results = []
f = iter(text.splitlines())
for line in f:
    if line.startswith("host-vpp1out (up):"):
        # The "" default avoids StopIteration when the block ends the input
        line = next(f, "")
        while line.startswith("  L3 "):
            results.append(line[5:].split("/")[0])
            line = next(f, "")
        break
    
print(results)
# => ['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']
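
Since the question asks for something dynamic, the same line-by-line idea can be wrapped in a function that takes the interface name. This is only a sketch: the name `l3_addresses` is made up for illustration, and it assumes the address lines are indented with exactly two spaces, as in the sample above.

```python
def l3_addresses(text, interface):
    """Return the L3 addresses listed under `interface` (hypothetical helper)."""
    results = []
    lines = iter(text.splitlines())
    for line in lines:
        # Header lines look like "<name> (up):" or "<name> (dn):"
        if line.startswith(interface + " ("):
            # Keep consuming from the same iterator while lines are L3 entries
            for line in lines:
                if not line.startswith("  L3 "):
                    break
                # "  L3 1.1.1.1/24 ip4 ..." -> "1.1.1.1"
                results.append(line[5:].split("/")[0])
            break
    return results


sample = (
    "VirtualFuncEthernet0/9/0 (dn):\n"
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 2001:db8:0:1:1:1:1:1/112\n"
    "local0 (dn):\n"
    "loop0 (up):\n"
    "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1"
)
print(l3_addresses(sample, "host-vpp1out"))
print(l3_addresses(sample, "loop0"))
```

Iterating with a nested `for` over the same iterator means an interface that is the last block in the input simply ends the loop, and an unknown interface name returns an empty list.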


Wiktor Stribiżew

You could use the PyPI regex module with the \G anchor and a capturing group; regex.findall then returns the values of that group.

(?:host-vpp1out .*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]+)\/\d+
  • (?: Non capture group
    • host-vpp1out .* Match host-vpp1out, a space and the rest of the line
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start of the string
  • ) Close non capture group
  • \n\s+ Match a newline and 1+ whitespace chars
  • L3\s+ Match L3 and 1+ whitespace chars
  • ( Capture group 1
    • \d[\d.:\/a-z]+ Match a digit followed by 1+ of the listed characters
  • ) Close group 1
  • \/\d+ Match / and 1+ digits


Note that the part (\d[\d.:\/a-z]+)\/\d+ is a deliberately broad match for an IPv4 or IPv6 address. The links contain pages with more specific patterns.
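
If the broad match is too permissive, one option (an assumption on my side, not part of the pattern above) is to keep the regex loose and let the standard-library `ipaddress` module reject anything that is not a real address. The helper name `valid_addresses` and the `999.1.1.1` line are made up for illustration.

```python
import ipaddress
import re


def valid_addresses(block):
    """Keep only candidates that parse as IPv4/IPv6 (hypothetical helper)."""
    found = []
    # Broad capture: everything between "L3 " and the "/<prefix length>"
    for candidate in re.findall(r"L3\s+([^\s/]+)/\d+", block):
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not a valid address, skip it
        found.append(candidate)
    return found


block = (
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 999.1.1.1/24\n"  # invented invalid entry, only to show the filter
    "  L3 2001:db8:0:1:1:1:2:1/112\n"
)
print(valid_addresses(block))
# => ['1.1.1.1', '2001:db8:0:1:1:1:2:1']
```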

Example code

import regex

pattern=r"(?:host-vpp1out.*|\G(?!^))\n\s+L3\s+(\d[\d.:\/a-z]*)\/\d+"
test_str = ("VirtualFuncEthernet0/7/0.2001 (up):\n"
    "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
    "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
    "VirtualFuncEthernet0/9/0 (dn):\n"
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 1.1.2.1/24\n"
    "  L3 2001:db8:0:1:1:1:1:1/112\n"
    "  L3 2001:db8:0:1:1:1:2:1/112\n"
    "local0 (dn):\n"
    "loop0 (up):\n"
    "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1")

print(regex.findall(pattern, test_str))

Output

['1.1.1.1', '1.1.2.1', '2001:db8:0:1:1:1:1:1', '2001:db8:0:1:1:1:2:1']

Using re instead of regex, you could also do it in 2 steps, first matching host-vpp1out and the L3 lines. Then from that match, you can get the values in group 1 using re.findall.

import re
 
regex=r"^host-vpp1out .*(?:\r?\n[^\S\r\n]*L3 .*)*"
test_str = ("VirtualFuncEthernet0/7/0.2001 (up):\n"
            "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
            "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
            "VirtualFuncEthernet0/9/0 (dn):\n"
            "host-vpp1out (up):\n"
            "  L3 1.1.1.1/24\n"
            "  L3 1.1.2.1/24\n"
            "  L3 2001:db8:0:1:1:1:1:1/112\n"
            "  L3 2001:db8:0:1:1:1:2:1/112\n"
            "local0 (dn):\n"
            "loop0 (up):\n"
            "  L3 1.1.1.1/32 ip4 table-id 7 fib-idx 1")
 
match = re.search(regex, test_str, re.MULTILINE)
 
if match:
    print(re.findall(r" L3 (\d[\d.:\/a-z]+)\/\d+", match.group()))
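
To make the two-step `re` approach work for any interface, the name can be inserted into the pattern with `re.escape`, so that characters such as the `.` in VirtualFuncEthernet0/7/0.2001 are matched literally. A minimal sketch, where the function name `l3_block_addresses` is hypothetical:

```python
import re


def l3_block_addresses(text, interface):
    """Two-step `re` lookup for any interface name (hypothetical helper)."""
    # Step 1: match the header line plus all directly following L3 lines.
    # re.escape keeps special characters in the name literal.
    header = r"^" + re.escape(interface) + r" .*(?:\r?\n[^\S\r\n]*L3 .*)*"
    match = re.search(header, text, re.MULTILINE)
    if not match:
        return []
    # Step 2: pull the address in front of each "/<prefix length>"
    return re.findall(r" L3 (\d[\d.:a-z]*)/\d+", match.group())


sample = (
    "VirtualFuncEthernet0/7/0.2001 (up):\n"
    "  L3 1.1.2.2/24 ip4 table-id 8 fib-idx 2\n"
    "  L3 2001:db8:0:1:1:1:2:2/112 ip6 table-id 8 fib-idx 1\n"
    "VirtualFuncEthernet0/9/0 (dn):\n"
    "host-vpp1out (up):\n"
    "  L3 1.1.1.1/24\n"
    "  L3 2001:db8:0:1:1:1:1:1/112\n"
    "local0 (dn):\n"
)
print(l3_block_addresses(sample, "VirtualFuncEthernet0/7/0.2001"))
print(l3_block_addresses(sample, "host-vpp1out"))
```

Requiring a space right after the escaped name keeps a shorter name such as host-vpp1 from matching the host-vpp1out header.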


The fourth bird