How can I str.replace() <br> or '=' in Python?

Question

I'm trying to strip the extra HTML tags from text I scraped from a web page. str.replace() works for tags such as <li> and </li>, but it doesn't seem to work for targets like <br> and =.

Here's my code.

# str.replace() returns a new string, so the result has to be kept
cleaned = (
    str(txt)
    .replace('<li>', '')
    .replace('</li>', '')
    .replace('<ol>', '')
    .replace('</ol>', '')
    .replace('<br>', '')
    .replace('=', '')
)

Any advice will be much appreciated.

Possible duplicate of [Strip HTML from strings in Python](http://stackoverflow.com/questions/753052/strip-html-from-strings-in-python) — Robert Valencia, Apr 14 '17 at 01:28

score 1 · Answer 1 · answered Apr 14 '17 at 01:35


You can use BeautifulSoup to get the text from the page:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_source, 'html.parser')  # html_source: the scraped HTML as a string
text = soup.get_text()

BeautifulSoup parses the HTML and has a built-in method, get_text(), that returns the text with all tags removed.
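
If installing BeautifulSoup isn't an option, roughly the same idea can be sketched with the standard library's html.parser module instead. This is only an illustration, not what the answer above uses: TextExtractor and strip_tags are hypothetical names, and turning <br> into a newline is an assumption about the output you want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing forms such as <br/> and <br />.
        # Assumption: a line break in the HTML should become a newline.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Plain text between tags, including characters like '='.
        self.parts.append(data)


def strip_tags(html_source):
    """Return the text content of html_source with the tags removed."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Because the parser works on tags rather than exact substrings, variants such as <br/> or <BR> are handled without listing each spelling in a separate replace() call.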

zbw


Thank you for the quick response. I'll give it a try! – Yuta Apr 14 '17 at 02:01
