I have a string that looks kinda like: Approximately <b>Silver I</b><br><br><span class=symbol--micro></span>MMR resembles the <b>top 49%</b> of summoners in Silver I

The problem is that I don't want anything between < and >. In some old code, my solution was:

summary = MMR_info["ranked"]["summary"]
summary = summary.replace('<b>', '')
summary = summary.replace('<br>', '')
summary = summary.replace('<span class="symbol--micro"></span>', ' ')
summary = summary.replace('</b>', '')

but this wasn't very pretty. I would appreciate the how and why of doing this in the most efficient way.

Tomerikoo
ApenJulius

1 Answer

Use an HTML parser such as BeautifulSoup:

from bs4 import BeautifulSoup

html = 'Approximately <b>Silver I</b><br><br><span class=symbol--micro></span>MMR resembles the <b>top 49%</b> of summoners in Silver I'
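# Name the parser explicitly; without it bs4 guesses one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')

# .text returns only the text nodes, with every tag dropped.
print(soup.text)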

Output:

Approximately Silver IMMR resembles the top 49% of summoners in Silver I

Note that regular expressions are often suggested for stripping or modifying HTML, but they usually become difficult to understand and maintain.
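If adding a third-party package is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the approach above: the names TextExtractor and strip_tags are made up for illustration, and joining the text pieces with a space is an assumption that suits this particular string (it keeps "Silver I" and "MMR" apart where the <br> tags were).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves never reach this method.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Assumption: separate the pieces with a space, then collapse repeated whitespace.
    return " ".join(" ".join(parser.chunks).split())


summary = ('Approximately <b>Silver I</b><br><br><span class=symbol--micro></span>'
           'MMR resembles the <b>top 49%</b> of summoners in Silver I')
print(strip_tags(summary))
# Approximately Silver I MMR resembles the top 49% of summoners in Silver I
```

One caveat with this design: because every text piece is joined with a space, markup inside a word (for example foo<b>bar</b>) would come out as "foo bar", so the separator may need adjusting for other inputs.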

mhawke
  • Sorry for making yet another such question, thank you for giving an answer simple enough for my peanut brain! – ApenJulius Mar 07 '21 at 23:29