I have an HTML-format file containing all sorts of data, and I need to extract certain (id, title) pairs from it. To do this I wrote a regex that seems to work fine in an online regex tester.
The file I need to extract the data from:
<g id="node841" class="cond_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cond</title>
<g id="node842" class="prompt_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="edge841" class="edge"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cond->SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="node848" class="node"><title>SR_AUD_Main_link_51</title>
<g id="node841" class="prompt_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="node841" class="cmd_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cmd</title>
<g id="node856" class="exit_node"><title>EXIT_63</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_3</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_4</title><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_3</title>
With this regex:
(<g\sid="\w+"\s+class="node">+.{1,})(?!.+(_cmd|_cond|_prompt|EXIT))
I extract the entire lines that satisfy the conditions above.
The Python script that applies the regex to the file contents to extract those lines:
result = re.search(r'(id="\w+"\s+class="node">+.{1,})(?!.+(_cmd|_cond|_prompt|EXIT))', svg)
The problem is that result contains only one pair of data (for node id 848 only), separated by a space character, rather than the entire list of lines that the regex should extract.
Do you have any idea how to extract all the data matching that regex from the entire file, not just one line? In this particular case, according to the online regex tester, the extracted data should be:
<g id="node848" class="node"><title>SR_AUD_Main_link_51</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_3</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_4</title>
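For reference, re.search stops at the first match, whereas re.findall and re.finditer scan the whole string and return every match. Below is a minimal sketch of the findall approach, not a definitive solution. It assumes the file contents are already in a string named svg, as in the script above, and it uses a simplified, hypothetical pattern (not the regex from this question) that keeps only elements whose class is exactly "node" and captures the id and the first title.

```python
import re

# Sample input copied from the question; in the real script, svg would hold the file contents.
svg = """<g id="node841" class="cond_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cond</title>
<g id="node842" class="prompt_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="edge841" class="edge"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cond->SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="node848" class="node"><title>SR_AUD_Main_link_51</title>
<g id="node841" class="prompt_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_prompt</title>
<g id="node841" class="cmd_node"><title>SR_AUD_Nbest_List_PlaylistPlayPlaylist_cmd</title>
<g id="node856" class="exit_node"><title>EXIT_63</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_3</title>
<g id="node860" class="node"><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_4</title><title>SR_AUD_ConfirmNAPlayPlaylistName_NotAvailable_3</title>
"""

# Hypothetical simplified pattern (an assumption, not the original regex):
# group 1 is the id, group 2 is the text of the first <title> after class="node".
pattern = re.compile(r'<g\s+id="(\w+)"\s+class="node"><title>([^<]*)</title>')

# With two capture groups, findall returns one (id, title) tuple per match.
pairs = pattern.findall(svg)

for node_id, title in pairs:
    print(node_id, title)
```

On the sample data this yields three pairs: node848 and the two node860 entries. If the whole matched text is wanted instead of the captured groups, re.finditer with m.group(0) on each match is the usual alternative.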