I have a few html files with two different patterns of a piece of code, where only name="horizon"
is constant. I need to get the value of an attribute named as "value". Below are the sample files:-
File1:
<tag1> data
</tag1>
<select size="1" name="horizon">
<option value="Admin">Admin Users</option>
<option value="Remote Admin">Remote Admin</option>
</select>
File2:
<othertag some_att="asfa"> data
</othertag>
<select id="realm_17" size="1" name="horizon">
<option id="option_LoginPage_1" value="Admin Users">Admin Users</option>
<option id="option_LoginPage_1" value="Global-User">Global-User</option>
</select>
Since the files will have other tags and attributes, I tried writing regular expressions by referring this to filter the required content from the files with these regular expressions.
regex='^(?:.*?)(<(?P<TAG>\w+).+name\=\"horizon\"(?:.*[\n|\r\n?]*)+?<\/(?P=TAG>)'
I have tried this with re.MULTILINE
and re.DOTALL
but could not get desired text.
I suppose, I would be able to find the required names as list by using re.findall('value\=\"(.*)\",text)
once I get the required text.
Please suggest if there is any elegant way to handle the situation.