I need to extract some data from a string like this (VHDL code):
entBody = """entity pci_bfm is
generic(
G_INST_NAME : string := "PCI_BFM";
G_HANDLE_NO : rpciBfmHandleNo := 0;
G_IDSEL_POS_EXT_TARGET : idsel_pos := 30;
G_IDSEL_POS_INT_TARGET : idsel_pos := 29
);
port(
i_tb_stop : in boolean; -- Testbench global sto
o_clk : out std_logic; -- PCI clock.
o_rstn : out std_logic; -- PCI reset.
o_idsel : out std_logic; -- Initialization devic
i_reqn : in std_logic; -- Request. The reqn in
o_gntn : out std_logic; -- Grant. The gntn onpu
io_ad : inout std_logic_vector(31 downto 0); -- Address/data bus. Th
io_cben : inout std_logic_vector(3 downto 0); -- Command/byte enable.
io_par : inout std_logic; -- Parity. The par sign
io_framen : inout std_logic; -- Frame. The framen si
io_irdyn : inout std_logic; -- Initiator ready. The
io_devseln : inout std_logic; -- Device select. Targe
io_trdyn : inout std_logic; -- Target ready. The tr
io_stopn : inout std_logic; -- Stop. The stopn sign
io_perrn : inout std_logic; -- Parity error. The pe
i_serrn : in std_logic; -- System error. The se
i_intan : in std_logic; -- Interrupt A. The int
o_lockn : out std_logic -- Locked operations. R
);
end entity pci_bfm;"""
The VHDL comments are not all the same length; I truncated them to make the example easier to read.
I want to get everything between 'port(' and the final ');' (the one that closes the port declarations). Of course, in general the VHDL declarations may not be as neatly indented and formatted as they are here.
I have a Python 2.7.x regex for this:
import re
pattern = re.compile(r"port\s*\((.*?)\s+\)\s*;", re.DOTALL)  # DOTALL lets .*? span several lines
match3 = pattern.search(entBody)
ports = match3.group(1)
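A self-contained check of the regex above on a trimmed copy of entBody. The sample is shortened here, and re.DOTALL is assumed: without it, . does not match a newline, so the search on a multi-line string finds nothing.

```python
import re

# Trimmed copy of entBody (comments shortened): the closing ");" sits on
# its own line, so whitespace comes right before the ")".
ent_body = """entity pci_bfm is
port(
i_tb_stop : in boolean; -- Testbench global sto
io_ad : inout std_logic_vector(31 downto 0); -- Address/data bus.
o_lockn : out std_logic -- Locked operations.
);
end entity pci_bfm;"""

# re.DOTALL lets the lazy .*? run across line breaks.
pattern = re.compile(r"port\s*\((.*?)\s+\)\s*;", re.DOTALL)
match = pattern.search(ent_body)
ports = match.group(1)
print(ports.splitlines()[-1])  # last captured declaration line
```

The "0);" of std_logic_vector(31 downto 0) is skipped because no whitespace precedes its ")", so the capture runs on to the "\n);" that closes the port list.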
It works well as long as there is whitespace before the closing );. It fails when the ); immediately follows the last declaration, as in this case:
entBody2 = """entity QSPI_FLASH_SPANSION_S25FL_BFM is
generic
(
G_INST_NAME : string := "QSPI_FLASH_SPANSION_S25FL_BFM";
G_HANDLE_NO : integer := 2
);
port (
tb_stop : in boolean; -- Testbench global stop.
sclk : in std_logic;
csn : in std_logic;
sdat : inout std_logic_vector(3 downto 0));
end;"""
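A minimal reproduction on a trimmed copy of entBody2 (sample shortened, re.DOTALL assumed). \s+ demands whitespace immediately before the closing ), but here the ) that closes the port list directly follows the ) of std_logic_vector(3 downto 0), and no later ) qualifies either, so the search returns None.

```python
import re

# Trimmed copy of entBody2: the last declaration ends with "0));", so no
# whitespace precedes the ")" that closes the port list.
ent_body2 = """entity QSPI_FLASH_SPANSION_S25FL_BFM is
generic
(
G_HANDLE_NO : integer := 2
);
port (
tb_stop : in boolean; -- Testbench global stop.
sdat : inout std_logic_vector(3 downto 0));
end;"""

pattern = re.compile(r"port\s*\((.*?)\s+\)\s*;", re.DOTALL)
match = pattern.search(ent_body2)
print(match)  # \s+ never finds whitespace before a ")" after "port ("
```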
If I change my regex a little bit like this:
pattern = re.compile(r"port\s*\((.*?)\s*\)\s*;", re.DOTALL)  # \s* instead of \s+
then, on entBody, the match stops at 'io_ad : inout std_logic_vector(31 downto 0', which is not what I want at all.
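A minimal reproduction of the truncation (sample shortened, re.DOTALL assumed). With \s* the whitespace becomes optional, so the lazy .*? stops at the first ) that is followed by ;, which is the one closing std_logic_vector(31 downto 0):

```python
import re

ent_body = """port(
io_ad : inout std_logic_vector(31 downto 0); -- Address/data bus.
o_lockn : out std_logic -- Locked operations.
);"""

# \s*\)\s*; is already satisfied by "0);", so the capture ends there and
# the ")" of the vector range is consumed by \) instead of being captured.
pattern = re.compile(r"port\s*\((.*?)\s*\)\s*;", re.DOTALL)
ports = pattern.search(ent_body).group(1)
print(repr(ports))
```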
I was wondering whether a regex can do a search like this, i.e. count the opening parentheses and only stop once all of them are closed.
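Python's standard re module cannot track nesting depth: it has no recursive patterns (the third-party regex module does support recursion, e.g. (?R), but it is not part of the standard library). One sketch under that constraint: use re only to locate port( and then walk the string with a depth counter until the matching ) appears. The helper name extract_port_body is hypothetical, and parentheses inside -- comments or string literals would still confuse it.

```python
import re


def extract_port_body(ent_body):
    """Return the text between 'port(' and the ')' that closes it, or None.

    Hypothetical helper: re only finds the opening 'port (' and a depth
    counter finds the matching ')'. Parentheses inside '--' comments or
    string literals are NOT handled.
    """
    start = re.search(r"\bport\s*\(", ent_body, re.IGNORECASE)
    if start is None:
        return None
    depth = 1  # we are just past the '(' that opens the port list
    for i in range(start.end(), len(ent_body)):
        ch = ent_body[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:  # this ')' closes the port list
                return ent_body[start.end():i].strip()
    return None  # the port list is never closed


ent_body2 = """entity QSPI_FLASH_SPANSION_S25FL_BFM is
generic
(
G_HANDLE_NO : integer := 2
);
port (
tb_stop : in boolean; -- Testbench global stop.
sdat : inout std_logic_vector(3 downto 0));
end;"""

print(extract_port_body(ent_body2))
```

Because the counter does not care about whitespace, the same helper handles both the "\n);" layout of entBody and the "0));" layout of entBody2.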
If there is no simple way, I will solve it with a plain string search using string functions.
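A string-functions-only variant, sketched under the assumption that the input is ASCII VHDL: str.find locates the port keyword, and the scan skips -- comments so that a parenthesis inside a comment (for example a description truncated after an opening "(") does not throw off the count. The helper extract_port_body_str is hypothetical and still ignores string and character literals.

```python
def extract_port_body_str(ent_body):
    """Return the text between 'port(' and its matching ')', or None.

    Hypothetical helper using only string methods. Assumes ASCII input
    (so lower() keeps indices aligned) and skips '--' comments while
    counting; string and character literals are not handled.
    """
    text = ent_body.lower()  # VHDL keywords are case-insensitive
    start = -1
    pos = text.find("port")
    while pos != -1:
        # Reject matches inside identifiers such as 'report' or 'g_port'.
        before_ok = pos == 0 or not (text[pos - 1].isalnum() or text[pos - 1] == "_")
        j = pos + len("port")
        while j < len(text) and text[j].isspace():
            j += 1
        if before_ok and j < len(text) and text[j] == "(":
            start = j  # index of the '(' that opens the port list
            break
        pos = text.find("port", pos + 1)
    if start == -1:
        return None

    depth = 0
    i = start
    while i < len(ent_body):
        if ent_body.startswith("--", i):
            # Jump to the end of the comment line; its parentheses are ignored.
            nl = ent_body.find("\n", i)
            i = len(ent_body) if nl == -1 else nl
            continue
        if ent_body[i] == "(":
            depth += 1
        elif ent_body[i] == ")":
            depth -= 1
            if depth == 0:
                return ent_body[start + 1:i].strip()
        i += 1
    return None


sample = """port (
io_ad : inout std_logic_vector(31 downto 0); -- Address/data bus (AD
o_lockn : out std_logic -- Locked operations.
);
end;"""

print(extract_port_body_str(sample))
```

Without the comment skipping, the unmatched "(" in the truncated comment would keep the depth above zero and the closing ); of the port list would never be recognised.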
Thank you.