Someone at my company wrote a regex that works as intended, but I want to write similar patterns in the future and I don't understand its mechanics:
import re
config = """
interface Ethernet1/2
 description Management
 ip address 192.168.1.190 255.255.255.0
 ip pim sparse-dense-mode
!
interface Ethernet1/3
 description to-IntDMVPN
 ip address 10.0.0.93 255.255.255.252
 ip pim sparse-dense-mode
!
router eigrp 1
 network 172.16.0.0 0.0.0.255
 redistribute ospf 1 metric 1 1 1 1 1
!
"""
# re.S (DOTALL) and re.M (MULTILINE) are both enabled on the compiled pattern
intf_obj_list = re.compile(r'^interface\s*(\S+)(.+?)(?:^\S+|\Z)', re.S | re.M).findall(config)
print(intf_obj_list)
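For reference, here is a trimmed-down sketch of what the call returns. It assumes the sub-commands under each interface are indented by one space, as in a typical IOS-style config; the shortened `sample` string and the variable names are only for illustration.

```python
import re

# Shortened sample (illustrative); sub-commands are indented by one space.
sample = (
    "interface Ethernet1/2\n"
    " description Management\n"
    "!\n"
    "interface Ethernet1/3\n"
    " description to-IntDMVPN\n"
    "!\n"
    "router eigrp 1\n"
    " network 172.16.0.0 0.0.0.255\n"
    "!\n"
)

pattern = re.compile(r'^interface\s*(\S+)(.+?)(?:^\S+|\Z)', re.S | re.M)

# findall returns one tuple per match, with one item per capturing group:
# (interface name, everything up to the next line that starts at column 0).
matches = pattern.findall(sample)
for name, body in matches:
    print(name, repr(body))
```

With this sample the loop prints two rows, one per interface, and the `router eigrp 1` block is skipped because it does not start with `interface`.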
I understand the pattern up to `^interface\s*(\S+)`, which matches the router interface names. Adding `(.+?)` also matches the `\n` at the end of the first line of each interface config, even though the dot is supposed to match any character except a newline. Adding `(?:^\S+|\Z)` then makes the match include the body of the configuration under the interface, and I don't understand that part at all. From what I could find, `?:` is another way to set up backreferencing, but after a fair amount of googling I'm still not clear on it, and I'm not sure what `^\S+|\Z` inside it accomplishes. Finally, I think `re.S` lets the match keep going through the body when it encounters newlines, and `re.M` lets it run across multiple lines. If anyone can help me break down what's happening here, I would greatly appreciate it. Thanks.
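Each piece of the pattern can be isolated with a small experiment. The sketch below uses made-up strings (`text` and all variable names are illustrative, not taken from the real config) and relies only on the standard `re` module; it is meant as a way to poke at the behaviour, not as a full explanation.

```python
import re

# Made-up sample; the sub-command is indented by one space.
text = "interface Eth1\n desc A\n!\n"

# 1. The dot: without re.S it cannot cross a newline, with re.S it can.
dot_default = re.search(r'Eth1(.+)', text)          # None: next char is "\n"
dot_dotall = re.search(r'Eth1(.+?)!', text, re.S)   # group(1) spans two lines

# 2. The caret: without re.M it anchors only at the start of the string,
#    with re.M it also anchors at the start of every line.
caret_default = re.findall(r'^\S+', text)           # ['interface']
caret_multiline = re.findall(r'^\S+', text, re.M)   # ['interface', '!']

# 3. (?:...) groups without capturing, so findall does not report it.
capturing = re.findall(r'(\w+)(-end)', 'x-end y-end')
non_capturing = re.findall(r'(\w+)(?:-end)', 'x-end y-end')

# 4. \Z matches only at the very end of the string, so a final block
#    with no following unindented line is still terminated.
last_block = re.findall(r'^interface\s*(\S+)(.+?)(?:^\S+|\Z)',
                        "interface Eth9\n desc Z\n", re.S | re.M)

print(dot_default, repr(dot_dotall.group(1)))
print(caret_default, caret_multiline)
print(capturing, non_capturing)
print(last_block)
```

Read this way, the lazy `(.+?)` keeps growing one character at a time until the position is either the start of a line that begins with a non-space character (`^\S+`) or the end of the string (`\Z`), which is why indented sub-commands end up in the second group.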