Get contents of div by id with BeautifulSoup

Question

I am using Python 2.7.6, urllib2, and BeautifulSoup to extract HTML from a website and store it in a variable.

How can I show just the HTML contents of a div with a given id using BeautifulSoup? For example,

<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>

would become

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

alecxe · Answer 1 · 2014-09-02T01:59:23.397

18

Join the elements of the div tag's .contents:

from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

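# Name the parser explicitly; recent bs4 versions warn when it is omitted
soup = BeautifulSoup(data, 'html.parser')
div = soup.find('div', id='theDiv')
# Python 2 print statement; on Python 3 wrap the expression in print(...)
print ''.join(map(str, div.contents))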

Prints:

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
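
For readers on Python 3, the same approach might look like the sketch below. It is an illustration rather than part of the original answer: the explicit `html.parser` argument and the `kinds` and `inner_html` names are assumptions added here. It also shows that `.contents` mixes whitespace text nodes with the `<p>` tags, which is why the joined result keeps the original line breaks and indentation.

```python
from bs4 import BeautifulSoup

html = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="theDiv")

# .contents is a list of the div's direct children: text nodes and tags
kinds = [type(child).__name__ for child in div.contents]

# Convert each child to a string and concatenate them, leaving out the <div> itself
inner_html = "".join(str(child) for child in div.contents)

print(kinds)
print(inner_html)
```

If `soup.find` does not locate the id it returns `None`, so real code would likely check for that before reading `.contents`.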

edited Sep 02 '14 at 01:59

answered Sep 02 '14 at 01:49

alecxe


That appears to work! Can you explain what is going on with `print ''.join(map(str, div.contents))`? – user8028 Sep 02 '14 at 03:37
@user8028 Sure. `contents` is a list of all of the tag's children, each of which is either a string or a `Tag` class instance. Applying `map(str, ...)` converts every child to a string so they can be joined. Hope that helps. – alecxe Sep 02 '14 at 03:38
I have a special character (€) in the content of the div. How can I encode this to ASCII so it is printable to a terminal or writable to a file? I always receive the error `UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 31: ordinal not in range(128)` – Burcardo May 03 '18 at 11:57
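
The last comment went unanswered in the thread. One possible way to handle it is sketched below for Python 3, where strings are already Unicode. The sample markup, the temporary file path, and the variable names are hypothetical. The sketch either writes the text as UTF-8 or, if the output really must be ASCII, replaces non-ASCII characters with numeric character references. On Python 2.7 the error probably comes from `str()` being applied to a Unicode text node, so the analogous approach there would be to join with `unicode` instead of `str` and encode the result explicitly.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical markup containing a euro sign (U+20AC)
html = "<div id='theDiv'><p>price: 5 \u20ac</p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="theDiv")

inner_html = "".join(str(child) for child in div.contents)

# Option 1: keep the character and write the file as UTF-8
path = os.path.join(tempfile.mkdtemp(), "div.html")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(inner_html)

# Option 2: force ASCII, turning the euro sign into the reference &#8364;
ascii_text = inner_html.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_text)
```

The numeric reference still renders as € in a browser, so the second option loses no information when the output is HTML.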

Antony Hatchkins · Answer 2 · 2020-07-09T18:36:39.167

1

Since version 4.0.1, tags have a decode_contents() method:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
""", "html.parser")

>>> print(soup.div.decode_contents())

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

More details in a solution to this question: https://stackoverflow.com/a/18602241/237105
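
Note that `soup.div` returns the first `<div>` in the document, which is not necessarily the one with the wanted id. The sketch below, written for Python 3, looks the div up by id before calling `decode_contents()`. The two-div sample markup and the variable names are assumptions made for illustration. It also calls `encode_contents()`, the bytes-returning counterpart, and checks that the join approach from the first answer gives the same text for this input.

```python
from bs4 import BeautifulSoup

# Hypothetical document in which the wanted div is not the first one
html = (
    "<div id='other'><p>unrelated</p></div>"
    "<div id='theDiv'><p>div content</p><p>div stuff</p></div>"
)
soup = BeautifulSoup(html, "html.parser")

first_div_id = soup.div["id"]      # the first <div>, here 'other'
div = soup.find(id="theDiv")       # select by id instead

as_text = div.decode_contents()    # inner HTML as a str
as_bytes = div.encode_contents()   # inner HTML as bytes (UTF-8 by default)
joined = "".join(str(child) for child in div.contents)

print(first_div_id)
print(as_text)
print(as_text == joined)
```

Here `decode_contents()` saves the manual join, and `encode_contents()` is convenient when the result goes straight to a file opened in binary mode.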

edited Jul 09 '20 at 18:36

answered Jul 09 '20 at 17:57

Antony Hatchkins

