I'm trying to match a pattern against a string that can contain multiple instances of similar groups, where each group can in turn contain multiple instances of similar elements.

Example:

import re

html_string = '''
..............................
<a>Receive By</a><span>I_AM_INTERESTED_IN_THIS</span>
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
<a>Receive By</a><span>I_AM_INTERESTED_IN_THIS</span>
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
<a>Receive By</a><span>I_AM_INTERESTED_IN_THIS</span>
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
<input name="SOMENAME" value="I_AM_INTERESTED_IN_THIS_TOO">
..............................
'''

match = re.findall(r'(?:>Receive By</a><span>(.*?)<.*?)?name=\"SOMENAME\" value=\"(.*?)\"', html_string)

However, I am not getting the desired results with a single regex. The results must be grouped by the "Receive By" value, with each group containing the input values that follow it, up to the next "Receive By".
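To make the desired grouping concrete, here is a minimal two-pass sketch rather than a single-regex answer. Two points are relevant: the pattern as posted has no `re.S` (`re.DOTALL`) flag, so `.` cannot cross the newlines between the span and the inputs, and Python's `re` keeps only the last capture of a repeated group, so one `findall` cannot return a variable-length list per group. The names `sample`, `GROUP_RE`, `INPUT_RE` and `group_inputs` are made up for illustration, and `sample` is a shortened stand-in for `html_string` with dummy values.

```python
import re

# Shortened stand-in for html_string from the question (same shape, dummy values).
sample = '''
<a>Receive By</a><span>DATE_1</span>
...
<input name="SOMENAME" value="A">
...
<input name="SOMENAME" value="B">
...
<a>Receive By</a><span>DATE_2</span>
...
<input name="SOMENAME" value="C">
'''

# Pass 1: one match per group. The body runs lazily up to the next
# "Receive By" marker or the end of the string; re.S lets "." span newlines.
GROUP_RE = re.compile(
    r'>Receive By</a><span>(.*?)</span>(.*?)(?=>Receive By</a>|\Z)', re.S)

# Pass 2: every input value inside one group's body.
INPUT_RE = re.compile(r'name="SOMENAME" value="(.*?)"')


def group_inputs(text):
    """Return [(receive_by_value, [input values...]), ...] in document order."""
    return [(m.group(1), INPUT_RE.findall(m.group(2)))
            for m in GROUP_RE.finditer(text)]


grouped = group_inputs(sample)
print(grouped)  # [('DATE_1', ['A', 'B']), ('DATE_2', ['C'])]
```

Any inputs that appear before the first "Receive By" are ignored by this sketch, which may or may not be what is wanted.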

wol
  • Use an HTML parser, not regex. – Patrick Artner Jun 14 '20 at 10:04
  • [THE PONY HE COMES](https://stackoverflow.com/a/1732454/7505395) – Patrick Artner Jun 14 '20 at 10:06
  • As mentioned by @PatrickArtner, regex is not a recommended way to parse HTML; use an HTML parser such as [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) instead. – sushanth Jun 14 '20 at 10:06
  • Thanks, I will definitely go that way, but I am still interested in how it could be achieved with regex. Let's forget that it's HTML. – wol Jun 14 '20 at 10:10
  • Well, no one will forget this is HTML if your question contains HTML snippets. Even if we imagine it is plain text, how do you expect to find a match in `Receive ByI_AM_INTERESTED_IN_THIS` when it contains none of the `name=` and `value=` parts that your pattern requires? – Wiktor Stribiżew Jun 14 '20 at 11:22
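
The parser route suggested in the comments could look roughly like the sketch below. It deliberately uses the standard library's `html.parser.HTMLParser` instead of Beautiful Soup, so that it runs without extra installs. The class name `ReceiveByGrouper` and the shortened `sample` input are made up for illustration, and the sketch assumes each "Receive By" link is immediately followed by the span holding its value, as in the question.

```python
from html.parser import HTMLParser

# Shortened stand-in for html_string from the question (same shape, dummy values).
sample = '''
<a>Receive By</a><span>DATE_1</span>
...
<input name="SOMENAME" value="A">
...
<input name="SOMENAME" value="B">
...
<a>Receive By</a><span>DATE_2</span>
...
<input name="SOMENAME" value="C">
'''


class ReceiveByGrouper(HTMLParser):
    """Collects input values under the most recent "Receive By" span."""

    def __init__(self):
        super().__init__()
        self.groups = []           # [(receive_by_value, [input values...]), ...]
        self._tag = None           # tag whose text is currently being read
        self._expect_span = False  # True right after an <a>Receive By</a>

    def handle_starttag(self, tag, attrs):
        self._tag = tag
        # Attach matching inputs to the latest group, if one has been opened.
        if tag == 'input' and self.groups:
            attr_map = dict(attrs)
            if attr_map.get('name') == 'SOMENAME':
                self.groups[-1][1].append(attr_map.get('value'))

    def handle_endtag(self, tag):
        self._tag = None

    def handle_data(self, data):
        if self._tag == 'a':
            # Remember whether this link is a "Receive By" marker.
            self._expect_span = data.strip() == 'Receive By'
        elif self._tag == 'span' and self._expect_span:
            # The span after the marker opens a new group.
            self.groups.append((data.strip(), []))
            self._expect_span = False


parser = ReceiveByGrouper()
parser.feed(sample)
print(parser.groups)  # [('DATE_1', ['A', 'B']), ('DATE_2', ['C'])]
```

Unlike a regex, this does not depend on attribute order or on the exact whitespace between tags.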
