how to print only text beautifulsoup

Question

I am trying to learn how BeautifulSoup works in order to create an application.

I am able to find and print all elements with .find_all(), but the output includes the HTML tags as well. How can I print ONLY the text within these tags?

This is what I have:

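from bs4 import BeautifulSoup

# Contents of index.html, shown here for reference:
"""<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
print(i)  # prints the <p> tags themselves, not just their text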

Possible duplicate of [Using BeautifulSoup Extract Text without Tags](http://stackoverflow.com/questions/23380171/using-beautifulsoup-extract-text-without-tags) — franklinsijo, Feb 02 '17 at 17:41
@franklinsijo Yeah. I also linked another of the same question in my answer. — Steampunkery, Feb 02 '17 at 17:46

score 4 · Answer 1 · answered Feb 02 '17 at 17:40

soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
for p in i:
    print(p.text)

find_all() returns a list of tags. Iterate over it and use tag.text to get the text inside each tag.

A more compact way:

for p in soup.find_all('p'):
    print(p.text)
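
For reference, the same loop can be sketched as a list comprehension. This is only an illustration: the markup from the question is inlined in a variable named `html` (an assumption, so the snippet runs without `index.html`), and `get_text(strip=True)` is used in place of `.text` to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, inlined so the sketch runs without index.html.
html = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns each tag's text with surrounding whitespace removed.
texts = [p.get_text(strip=True) for p in soup.find_all("p")]
print(texts)  # ['1', '2', '3']
```

Collecting the strings into a list first is handy if the text needs further processing rather than just printing.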

score 4 · Accepted Answer · answered Feb 03 '17 at 05:58

This may help:

from bs4 import BeautifulSoup
source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(source_code, "html.parser")
print(soup.text)

Output:

1
2
3
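
Note that `soup.text` returns the text of the whole document, including the newlines between tags. A minimal sketch of one way to get only the non-empty pieces, assuming the same `source_code` string as above and using the `stripped_strings` generator:

```python
from bs4 import BeautifulSoup

source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(source_code, "html.parser")

# stripped_strings yields every text node with whitespace trimmed,
# skipping nodes that are whitespace only.
lines = list(soup.stripped_strings)
print("\n".join(lines))
```

This grabs all text in the document, not just the text inside `<p>` tags, so it fits best when the whole page is wanted.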

Piyush S. Wanare


score 0 · Answer 3 · answered Feb 02 '17 at 17:44

I think you can do what they do in this Stack Overflow question: use findAll(text=True). In your code:

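from bs4 import BeautifulSoup

# Contents of index.html, shown here for reference:
"""<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""

soup = BeautifulSoup(open('index.html'), "html.parser")
# Returns every text node in the document as a list of strings.
i = soup.findAll(text=True)
print(i)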

Steampunkery


this will return all the text in the HTML code including comment, this is definitely not a solution – 宏杰李 Feb 02 '17 at 17:47
including comment? Do you mean including comments? – Steampunkery Feb 02 '17 at 17:48
The `Comment` object is just a special type of `NavigableString` – 宏杰李 Feb 02 '17 at 17:50
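
To illustrate the point made in these comments, here is a small sketch. The markup (including the HTML comment) is made up for the example, and `find_all(string=True)` is used as the newer spelling of `findAll(text=True)`. Because `Comment` is a subclass of `NavigableString`, comment nodes show up in the result and have to be filtered out explicitly:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup containing an HTML comment, for illustration only.
html = "<html><!-- a hidden note --><p>1</p><p>2</p></html>"
soup = BeautifulSoup(html, "html.parser")

# Every text node, comments included.
all_strings = soup.find_all(string=True)

# Keep only the nodes that are not Comment instances.
visible = [str(s) for s in all_strings if not isinstance(s, Comment)]

print(all_strings)
print(visible)  # ['1', '2']
```

Filtering with `isinstance` keeps the approach from the answer usable when the page contains comments.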
