I want to parse a web page and find specific patterns using regex in Python.
My example page has:
<input type="checkbox" name="some name....">
<input type="text", name="somemore name...">
<input type="radio" name="other name...">
And I want to find the name values of all radio and checkbox inputs. I have these two patterns:
<input type="checkbox" name="(.*?)".*?>
<input type="radio" name="(.*?)".*?>
But I cannot figure out how to combine these two regexes into a single one.
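For reference, here is a minimal sketch of one way the two patterns could be merged, using a non-capturing alternation group for the type value. It assumes Python 3, and the sample markup and the names `html`, `pattern` and `names` are made up for illustration. Like the original patterns, it only matches when `type` comes directly before `name`.

```python
import re

# Sample markup modelled on the example page (assumed, simplified names).
html = '''<input type="checkbox" name="some name">
<input type="text", name="somemore name">
<input type="radio" name="other name">'''

# (?:checkbox|radio) is a non-capturing alternation, so the only
# capturing group left is the name value.
pattern = re.compile(r'<input type="(?:checkbox|radio)" name="(.*?)".*?>')

# With a single capturing group, findall returns a list of that group's matches.
names = pattern.findall(html)
print(names)  # ['some name', 'other name']
```

The text input is skipped because its type is neither `checkbox` nor `radio`.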
EDIT: This question might drift in other directions, so it is better for me to explain what I want to do and to ask whether regex is really a suitable choice for it.
I must query a subscriber and get some basic info about the subscriber, plus a list of the subscriber's available loans and charges. The related module has many scripts that do that kind of job with regex. I also use SGMLParser for some parts of my code. But I sometimes see SGMLParser fail to parse HTML (I did not dig into why it fails, but the basic reason is unexpected character type errors). So I must be sure that I can either handle all types of HTML code, or keep doing this with regex.
CONCLUSION: The best choice is to use HTMLParser, and using regex is simply a very bad idea. That is what I take from this question. But since the question itself is more about regex matching than about regex usage on HTML, I decided to accept the answer about regex.
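To illustrate the parser-based approach, here is a minimal sketch assuming Python 3, where HTMLParser lives in the standard `html.parser` module. The class name `InputNameCollector` and the sample markup are hypothetical, chosen only for this example. Unlike the regex, it does not depend on the order of the attributes.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collects the name attribute of checkbox and radio inputs."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs; tag names arrive lowercased.
        if tag != "input":
            return
        attr_map = dict(attrs)
        if attr_map.get("type") in ("checkbox", "radio") and "name" in attr_map:
            self.names.append(attr_map["name"])


# Sample markup (assumed); the last input has its attributes in reverse order.
sample = '''<input type="checkbox" name="some name">
<input type="text" name="somemore name">
<input name="other name" type="radio">'''

collector = InputNameCollector()
collector.feed(sample)
print(collector.names)
```

This is only a starting point; real pages may need extra handling, for example for inputs whose type is written in upper case.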