
I am trying to learn how BeautifulSoup works in order to create an application.

I am able to find and print all elements with .find_all(); however, they print with their HTML tags as well. How can I print ONLY the text within these tags?

This is what I have:

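from bs4 import BeautifulSoup

# Contents of index.html:
# <html>
# <p>1</p>
# <p>2</p>
# <p>3</p>
# </html>

with open('index.html') as f:
    soup = BeautifulSoup(f, "html.parser")
i = soup.find_all('p')
print(i)  # prints [<p>1</p>, <p>2</p>, <p>3</p>], tags included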
snovosel
  • Possible duplicate of [Using BeautifulSoup Extract Text without Tags](http://stackoverflow.com/questions/23380171/using-beautifulsoup-extract-text-without-tags) – franklinsijo Feb 02 '17 at 17:41
  • @franklinsijo Yeah. I also linked another of the same question in my answer. – Steampunkery Feb 02 '17 at 17:46

3 Answers

from bs4 import BeautifulSoup

with open('index.html') as f:
    soup = BeautifulSoup(f, "html.parser")
i = soup.find_all('p')
for p in i:
    print(p.text)  # only the text inside each <p>, without the tags

find_all() returns a list of tags (a ResultSet of Tag objects), so iterate over it and use tag.text to get the text inside each tag.

A more concise way:

for p in soup.find_all('p'):
    print(p.text)
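If the paragraphs may contain surrounding whitespace, a minimal sketch is to collect the text with get_text(strip=True) instead of .text. It assumes the HTML is held in a string rather than read from index.html, and the extra spaces around "3" are added only to show the stripping:

```python
from bs4 import BeautifulSoup

# Assumption: the HTML lives in a string here instead of in index.html
html = """<html>
<p>1</p>
<p>2</p>
<p> 3 </p>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns each tag's text with leading/trailing whitespace removed
texts = [p.get_text(strip=True) for p in soup.find_all("p")]
print(texts)  # ['1', '2', '3']
```

Building a list rather than printing inside the loop makes it easy to reuse the extracted text elsewhere in the application.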
宏杰李

This may help:

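from bs4 import BeautifulSoup
source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
</html>"""
# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(source_code, "html.parser")
print(soup.text)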

Output:

1
2
3
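Note that soup.text also keeps the newline strings that sit between the tags, so the real output has blank lines around the numbers. A small sketch, assuming the same markup, that uses get_text() with a separator and strip=True to drop those whitespace-only strings:

```python
from bs4 import BeautifulSoup

source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
</html>"""

soup = BeautifulSoup(source_code, "html.parser")

# soup.text includes the newline strings between the <p> tags
raw = soup.text
# get_text("\n", strip=True) skips whitespace-only strings and joins the rest with "\n"
clean = soup.get_text("\n", strip=True)

print(repr(raw))
print(clean)
```

The separator argument controls how the pieces are joined, so get_text(" ", strip=True) would give "1 2 3" on a single line instead.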
Piyush S. Wanare

I think you can do what they do in this Stack Overflow question: use findAll(text=True) (spelled find_all(string=True) in current versions of Beautiful Soup). In your code:

from bs4 import BeautifulSoup

# Contents of index.html:
# <html>
# <p>1</p>
# <p>2</p>
# <p>3</p>
# </html>

with open('index.html') as f:
    soup = BeautifulSoup(f, "html.parser")
i = soup.find_all(string=True)  # older spelling: soup.findAll(text=True)
print(i)
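One caveat: searching with string=True returns every string in the document, including the whitespace-only newlines between tags. A minimal sketch, using an in-memory string in place of index.html, that shows two ways to keep only the meaningful text: filtering the results, or using the stripped_strings generator:

```python
from bs4 import BeautifulSoup

# Assumption: same markup as index.html, held in a string for a self-contained example
html = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) also returns the "\n" strings between the tags
all_strings = soup.find_all(string=True)
# Keep only strings that contain something other than whitespace
non_blank = [s for s in all_strings if s.strip()]

# stripped_strings skips whitespace-only strings and strips the rest
texts = list(soup.stripped_strings)
print(texts)  # ['1', '2', '3']
```

Both approaches give the same result here; stripped_strings has the advantage of also trimming whitespace inside each remaining string.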
Steampunkery