I want to match HTML headers <h1> through <h6> with a Python regular expression.
Some of the headers have an 'id' attribute, and I want to capture it in a group.
With the following expression I get only the header that has an id attribute:
>>> import re
>>> re.findall(r'<h[1-6].*?(id=\".*?\").*?</h[1-6].*?>','<h1>Header1</h1><h2 id="header2">header2</h2>')
['id="header2"']
The question mark causes the RE to match 0 or 1 repetitions of the preceding RE. But if I put a ? after the right parenthesis, it returns two empty strings:
>>> re.findall(r'<h[1-6].*?(id=\".*?\")?.*?</h[1-6].*?>','<h1>Header1</h1><h2 id="header2">header2</h2>')
['', '']
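A likely explanation, sketched below: the lazy `.*?` in front of the group first tries to match zero characters, the optional group then fails to find `id=` at that position and is skipped, and the second `.*?` consumes the rest of the tag. Inspecting the full matches with `re.finditer` shows that both headers are matched but the group never participates. The variable names here are illustrative only.

```python
import re

html = '<h1>Header1</h1><h2 id="header2">header2</h2>'
# Same pattern as above, with the optional group.
pattern = r'<h[1-6].*?(id=\".*?\")?.*?</h[1-6].*?>'

# Collect the whole match and the captured group for each header.
matches = [(m.group(0), m.group(1)) for m in re.finditer(pattern, html)]
for whole, id_attr in matches:
    # The group is None for both headers; re.findall reports None as ''.
    print(repr(whole), repr(id_attr))
```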
How can I use a single regular expression to get the following result?
['', 'id="header2"']
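One possible approach, as a minimal sketch rather than a definitive answer: make the text leading up to the attribute part of the optional group, so the group is attempted before the rest of the opening tag is consumed. The pattern below is hypothetical (it is not one of the expressions tried above) and assumes the id value is double-quoted and each header sits on a single line.

```python
import re

html = '<h1>Header1</h1><h2 id="header2">header2</h2>'

# Hypothetical pattern: the optional non-capturing group tries to find
# whitespace followed by id="..." inside the opening tag; if it fails,
# [^>]*> simply consumes the rest of the opening tag.
pattern = r'<h[1-6]\b(?:[^>]*?\s(id="[^"]*"))?[^>]*>.*?</h[1-6]>'

result = re.findall(pattern, html)
print(result)  # ['', 'id="header2"']
```

Regular expressions tend to be fragile on real-world HTML, so an HTML parser may be more robust if the input is not this regular.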