How to remove everything within <>

Question

I have a string that looks something like this: Approximately <b>Silver I</b><br><br><span class="symbol--micro"></span>MMR resembles the <b>top 49%</b> of summoners in Silver I

The problem is that I don't want anything between < and >. In some old code, my solution was:

summary = MMR_info["ranked"]["summary"]
summary = summary.replace('<b>', '')
summary = summary.replace('<br>', '')
summary = summary.replace('<span class="symbol--micro"></span>', ' ')
summary = summary.replace('</b>', '')

But this wasn't very pretty. I would appreciate an explanation of how and why to do this in the most efficient way.

This is also the topic of one of the most notorious posts on the site: https://stackoverflow.com/q/1732348/1405065 — Blckknght, Mar 07 '21 at 23:17
The reason I asked was that my peanut brain could not make that approach work — ApenJulius, Mar 07 '21 at 23:33

score 1 · Accepted Answer · answered Mar 07 '21 at 23:17

Use an HTML parser such as BeautifulSoup:

from bs4 import BeautifulSoup

html = 'Approximately <b>Silver I</b><br><br><span class=symbol--micro></span>MMR resembles the <b>top 49%</b> of summoners in Silver I'
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning

print(soup.text)

Output:

Approximately Silver IMMR resembles the top 49% of summoners in Silver I

Note that regular expressions are often suggested as a way to deal with HTML modification, but they usually become difficult to understand and maintain.
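If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original answer: the names TagStripper and strip_tags are hypothetical, and the optional separator argument is an assumption added here so the text on either side of a removed tag does not run together (the "Silver IMMR" in the output above).

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps the text nodes and discards every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


def strip_tags(html, separator=""):
    """Return the text of `html` with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Join the text runs, then collapse repeated whitespace into single spaces.
    return " ".join(separator.join(parser.chunks).split())


html = ('Approximately <b>Silver I</b><br><br>'
        '<span class="symbol--micro"></span>MMR resembles the '
        '<b>top 49%</b> of summoners in Silver I')

print(strip_tags(html))
# Approximately Silver IMMR resembles the top 49% of summoners in Silver I
print(strip_tags(html, separator=" "))
# Approximately Silver I MMR resembles the top 49% of summoners in Silver I
```

Passing separator=" " puts a space wherever a tag was removed, which suits this string but would also split a word that has a tag in the middle of it (for example foo<b>bar</b> becomes "foo bar"), so the default keeps the text runs joined as they are.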

Sorry for making yet another such question, thank you for giving an answer simple enough for my peanut brain! — ApenJulius, Mar 07 '21 at 23:29
