How to parse "data-text" with beautifulsoup?

Question

Here is part of the HTML code:

<div class="some div name" data-text="important text">...</div>

We need to get the text from the "data-text" attribute. I tried to find something about this in the official BeautifulSoup documentation, but found nothing (or I did not look carefully enough).

1. Pass your HTML into BeautifulSoup: Ldata = BeautifulSoup("..."). 2. Get the attribute you are looking for from the div: ldataText = Ldata.div.attrs['data-text'] – Suresh Jul 16 '20 at 19:30

score 3 · Answer 1 · answered Jul 16 '20 at 18:15

You can use ['data-text'] or .get('data-text') on a tag to get the attribute's value.

For example:

from bs4 import BeautifulSoup

txt = '''<div class="some div name" data-text="important text">...</div>'''
soup = BeautifulSoup(txt, 'html.parser')

print(soup.find('div', {'data-text': True})['data-text'])

Prints:

important text
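
The two access styles behave differently when the attribute is missing: indexing raises KeyError, while .get() returns None (or a default you pass in). A minimal sketch of that difference, assuming a second <div> without data-text that is not part of the original question:

```python
from bs4 import BeautifulSoup

# The second <div> is a hypothetical addition, used only to show
# what happens when the attribute is missing.
txt = '''<div class="some div name" data-text="important text">...</div>
<div class="other">no attribute here</div>'''
soup = BeautifulSoup(txt, 'html.parser')

first, second = soup.find_all('div')

with_index = first['data-text']              # indexing: KeyError if absent
with_get = second.get('data-text')           # .get(): None if absent
with_default = second.get('data-text', '')   # .get() with a fallback value

try:
    second['data-text']
    raised = False
except KeyError:
    raised = True

print(with_index, with_get, repr(with_default), raised)
```

Indexing is fine when the tag was selected because it has the attribute (as in the find() call above); .get() is the safer choice when the attribute may be absent.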

answered Jul 16 '20 at 18:15

Andrej Kesely

Thank you! This works exactly the way I want it! – Sergey Dyakov Jul 16 '20 at 18:34

score 1 · Accepted Answer · answered Jul 16 '20 at 18:18

You just have to replace 'href' with 'data-text' in this code:

from urllib.request import urlopen
import re

html = urlopen("http://kite.com")
text = html.read()
plaintext = text.decode('utf8')
links = re.findall("href=[\"\'](.*?)[\"\']", plaintext)
print(links[:5])

https://kite.com/python/answers/how-to-get-href-links-from-urllib-urlopen-in-python
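
As a rough sketch of that substitution, here is the same pattern with 'href' swapped for 'data-text', run against the snippet from the question instead of a downloaded page (the inline string is an assumption made so the example needs no network access):

```python
import re

# Inline sample taken from the question; no network request is made.
plaintext = '<div class="some div name" data-text="important text">...</div>'

# The answer's pattern, with 'href' replaced by 'data-text'.
values = re.findall("data-text=[\"\'](.*?)[\"\']", plaintext)
print(values)
```

A regular expression like this does not understand HTML structure, so it can misfire, for example when a value itself contains a quote character. For anything beyond a quick one-off, a parser such as BeautifulSoup is usually the more robust option.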

score 0 · Answer 3 · answered Jul 16 '20 at 18:20

You can try this one.

from bs4 import BeautifulSoup
import requests
...
bsObj = BeautifulSoup(html, features="html.parser")
div_tag = bsObj.find("div", class_="some div name")
if div_tag:
    # Only read and print the attribute when the div was actually found
    data_text = div_tag['data-text']
    print(data_text)

Hope this helps.

score 0 · Answer 4 · edited Jul 16 '20 at 18:29

With so little information provided, I cannot come up with anything better than this:

from bs4 import BeautifulSoup

html = '<div class="some div name" data-text="important text">...</div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div.some.div.name')
print(div.get('data-text'))

Output:

important text
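
If the class names are not known in advance, a CSS attribute selector can match on the attribute itself. A minimal sketch, assuming a page with several elements (the second <div> and the <p> are hypothetical and not part of the question):

```python
from bs4 import BeautifulSoup

# The second <div> and the <p> are hypothetical, added to show
# multiple matches and one non-match.
html = '''<div class="some div name" data-text="important text">...</div>
<div data-text="more text">...</div>
<p>no data-text here</p>'''
soup = BeautifulSoup(html, 'html.parser')

# 'div[data-text]' matches every <div> that has a data-text attribute.
texts = [div['data-text'] for div in soup.select('div[data-text]')]
print(texts)
```

Because the selector only returns tags that carry the attribute, indexing with ['data-text'] is safe inside the list comprehension.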

edited Jul 16 '20 at 18:29

MrNobody33

answered Jul 16 '20 at 18:22

UWTD TV
