I have an HTML file (I can't use HTML Agility Pack) from which I want to extract the id of a div (if it has one):
<div id="div1">Street ___________________ </div>
<div id="div2">CAP |__|__|__|__|__| number ______ </div>
<div id="div3">City _____________________ State |__|__|</div>
<div id="div4">City2 ____________________ State2 _____</div>
I have a pattern for extracting the underscores: [\ _]{3,}
Now, if there is a div in front of the underscores I want to extract it as well; if not, I want to get only the underscores.
So far I have built this pattern: (<div id(.+?)>(\w)([\ _]{3,}/*))([\ _]{3,})
The first part is built out of 3 groups:
1 - a div tag: <div id(.+?)>
2 - a label: (\w)
3 - underscores: [\ _]{3,}/*
For the div with the id div2 the id will not be extracted, because its label contains non-alphanumeric chars.
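For reference, here is a quick check of how the current pattern behaves on the sample. It is only a sketch in Python's `re` module, which is an assumption on my part since the actual regex engine may differ; the variable names `html`, `current` and `current_matches` are made up for illustration.

```python
import re

# Sample markup copied from the question.
html = """<div id="div1">Street ___________________ </div>
<div id="div2">CAP |__|__|__|__|__| number ______ </div>
<div id="div3">City _____________________ State |__|__|</div>
<div id="div4">City2 ____________________ State2 _____</div>"""

# The pattern from the question, unchanged.
current = re.compile(r'(<div id(.+?)>(\w)([\ _]{3,}/*))([\ _]{3,})')

# (\w) consumes exactly one character, so a multi-letter label such as
# "Street" can never be followed directly by the underscore run, and the
# <div ...> part is mandatory rather than optional.
current_matches = [m.group(0) for m in current.finditer(html)]
print(current_matches)  # prints []
```

With this engine the pattern matches nothing at all on the four sample lines, not even the bare underscore runs.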
Q: What is wrong with my pattern?
Desired matches for the 4 divs:
<div id="div1">Street ___________________
______
<div id="div3">City _____________________
<div id="div4">City2 ____________________
_____
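
One possible direction that seems to produce the matches listed above, given as a hedged sketch rather than a definitive fix. It assumes Python's `re` module, assumes the label is a single word followed by exactly one space, and swaps `[\ _]{3,}` for `_{3,}` so that surrounding spaces are not part of the match. The names `candidate` and `matches` are hypothetical.

```python
import re

# Sample markup copied from the question.
html = """<div id="div1">Street ___________________ </div>
<div id="div2">CAP |__|__|__|__|__| number ______ </div>
<div id="div3">City _____________________ State |__|__|</div>
<div id="div4">City2 ____________________ State2 _____</div>"""

# Optional prefix: <div id="...">, a one-word label and a single space.
# Mandatory part: a run of three or more underscores.
# Group 1 holds the id, or None when the prefix did not match.
candidate = re.compile(r'(?:<div id="([^"]*)">\w+ )?_{3,}')

matches = [(m.group(1), m.group(0)) for m in candidate.finditer(html)]
for div_id, text in matches:
    print(div_id, repr(text))
```

The whole prefix is wrapped in an optional non-capturing group, so a run of underscores still matches on its own when no div and label sit directly in front of it. That is what happens for div2 and for the second run in div4, where the id comes back as `None`.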