Given the solution in How do i extract a list of elements encased in quotation marks bounded by <> and delimited by commas - python, regex?, I was able to capture the prefix
and the values
of the desired pattern denoted by a CAPITALIZED.PREFIX
and values within angle brackets < "value1" , "value2", ... >
"""calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "what" >,\n THIS + ]."""
However I get into problems with i have strings like the one above. The desired output would be:
('ORTH.FOO', ['cali.ber,kl','calf','done'])
('ORHT2BAR', ['what so ever >', 'this that mess < up'])
('JOKE', ['whathe ', 'what'])
I have tried the following but it only give me the 1st tuple, how do i get all possible tuples as in the desired output?:
import re
intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up">\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30" ]."""
pattern = re.compile(r'.*?([A-Z0-9\.]*) < ([^>]*) >.*', flags=re.DOTALL)
f, v = pattern.match(intext).groups()
names = re.findall('[\'"](.*?)["\']', v)
print f, names