
I have some unstructured text. I need to match each `td` city cell together with all the text that follows it up to the next city, without including that next `td` city; then match from that city to the one after it, and so on. In other words, I need to get all the text starting from `<tr><td class="city"` up to, but not including, the next `<tr><td class="city"`. For example:

<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="Bloomington"><h2>Bloomington</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td>rwerjlkrw</tr>

<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="Abb"><h2>abb</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td><class type></tr>

<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="acc"><h2>acc</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td><tr>fdf</tr></tr>

The actual text looks like this, with all the tags on a single line and no line breaks between them:

<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="Bloomington"><h2>Bloomington</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td></tr><tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="Abb"><h2>abb</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td></tr><tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br /><a name="acc"><h2>acc</h2></a></p></td></tr><tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td></tr>
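Every city block starts with the same literal prefix, `<tr><td class="city"`, so one possible approach is a minimal sketch that splits the string at a zero-width lookahead for that prefix. Each chunk then keeps its own opening city row and ends just before the next one. It assumes the prefix is always written exactly like this (same quoting, no whitespace between `<tr>` and `<td`) and Python 3.7+, where `re.split` accepts patterns that can match an empty string. `split_by_city` and the `row` sample are hypothetical.

```python
import re
from typing import List

# Zero-width lookahead: matches the empty position just before each city row
# without consuming any characters, so every chunk keeps its own opening row.
CITY_START = re.compile(r'(?=<tr><td class="city")')


def split_by_city(html: str) -> List[str]:
    """Split html into chunks that each run from one city row up to the next."""
    # Splitting on a zero-width pattern needs Python 3.7+. Text before the first
    # city row (an empty string when the input starts with one) is dropped.
    return [part for part in CITY_START.split(html)
            if part.startswith('<tr><td class="city"')]


# Hypothetical sample shaped like the single-line text above.
row = ('<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br />'
       '<a name="{0}"><h2>{0}</h2></a></p></td></tr>'
       '<tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td></tr>')
text = "".join(row.format(name) for name in ("Bloomington", "abb", "acc"))

chunks = split_by_city(text)
for chunk in chunks:
    print(chunk)
```

This only works because the delimiter is a fixed string. If the markup can vary (extra attributes before `class`, different quoting, whitespace), a real HTML parser is the safer choice.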
Hat hout
  • [Relevant](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Mac Aug 18 '17 at 00:31
  • @RajanChauhan the rows don't each start with `tr` at the beginning of a line as you assumed; the tags are not separated by line breaks. I edited the post to show the original text – Hat hout Aug 18 '17 at 00:42
  • `re.findall('(?:.*)?',html_string)` – Rajan Chauhan Aug 18 '17 at 00:42
  • @RajanChauhan please check my previous comment – Hat hout Aug 18 '17 at 00:43
  • I would suggest using BeautifulSoup if you are working specifically with HTML text. It is better suited to parsing and manipulating HTML – Rajan Chauhan Aug 18 '17 at 00:47
  • You should not parse HTML with regular expressions. Use BeautifulSoup, as another commenter suggested. – DYZ Aug 18 '17 at 01:21
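The comments recommend a real HTML parser. As a dependency-free sketch of that parser-based idea, here is a version built on Python's standard-library `html.parser.HTMLParser`. This is a swapped-in technique, not BeautifulSoup's API. It groups the text that follows each `<td class="city">` cell under that city's name. It assumes the name is the `<h2>` inside the city cell and that the city cell contains no nested `<td>`, as in the samples in the question. `City`, `CityGrouper` and `group_by_city` are hypothetical names.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List


@dataclass
class City:
    name: str = ""
    text: List[str] = field(default_factory=list)


class CityGrouper(HTMLParser):
    """Collect the text after each <td class="city"> cell until the next one."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: &nbsp; arrives as '\xa0'
        self.cities: List[City] = []
        self._in_city_cell = False
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "city" in classes:
            # A new city starts here; everything up to the next one belongs to it.
            self.cities.append(City())
            self._in_city_cell = True
        elif tag == "h2" and self._in_city_cell:
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
        elif tag == "td" and self._in_city_cell:
            # Assumes the city cell has no nested <td>, so this closes it.
            self._in_city_cell = False

    def handle_data(self, data):
        if not self.cities:
            return  # ignore anything before the first city
        current = self.cities[-1]
        if self._in_h2:
            current.name += data.strip()
        elif not self._in_city_cell:  # skips the "Top" link inside the city cell
            chunk = data.strip()  # also strips '\xa0' from &nbsp;
            if chunk:
                current.text.append(chunk)


def group_by_city(html: str) -> List[City]:
    parser = CityGrouper()
    parser.feed(html)
    parser.close()
    return parser.cities


sample = ('<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br />'
          '<a name="Bloomington"><h2>Bloomington</h2></a></p></td></tr>'
          '<tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td>rwerjlkrw</tr>'
          '<tr><td class="city" colspan="6"><p><a href="#home">Top</a><br /><br />'
          '<a name="Abb"><h2>abb</h2></a></p></td></tr>'
          '<tr><td class="blank">&nbsp;</td><td class="day" colspan="5">Monday</td><class type></tr>')

for city in group_by_city(sample):
    print(city.name, city.text)
```

Unlike the string-splitting approach, this does not depend on how the attributes are quoted or ordered. It does, however, return extracted text rather than the raw markup of each chunk.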
