2

Here is part of the HTML code:

<div class="some div name" data-text="important text">...</div> 

I need to get the value of the "data-text" attribute. I looked through the official BeautifulSoup documentation but could not find anything about this (or I did not look carefully enough).

Sergey Dyakov
1. Pass your HTML into BeautifulSoup: Ldata = BeautifulSoup("...")
2. Get the attribute you are looking for from the div: ldataText = Ldata.div.attrs['data-text']
– Suresh Jul 16 '20 at 19:30

4 Answers

3

You can use ['data-text'] or .get('data-text') on a tag to get the attribute's value.

For example:

from bs4 import BeautifulSoup

txt = '''<div class="some div name" data-text="important text">...</div>'''
soup = BeautifulSoup(txt, 'html.parser')

print(soup.find('div', {'data-text': True})['data-text'])

Prints:

important text
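
The two access styles behave differently when the attribute is absent. Here is a minimal sketch, assuming a second, made-up <div> with no data-text attribute (it is not part of the original question): subscripting raises KeyError, while .get() returns None or a default you supply.

```python
from bs4 import BeautifulSoup

# The second <div> is a hypothetical element without a data-text attribute.
txt = '''<div class="some div name" data-text="important text">...</div>
<div class="other">...</div>'''
soup = BeautifulSoup(txt, 'html.parser')

first, second = soup.find_all('div')

value = first['data-text']               # attribute present
missing = second.get('data-text')        # attribute absent: returns None
fallback = second.get('data-text', '')   # attribute absent: returns the default

try:
    second['data-text']                  # attribute absent: raises KeyError
    raised = False
except KeyError:
    raised = True

print(value, missing, repr(fallback), raised)
```

So .get() is probably the safer choice when some of the matched tags may lack the attribute.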
Andrej Kesely
1

You just have to replace 'href' with 'data-text' in this code:

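import re
from urllib.request import urlopen

html = urlopen("http://kite.com")
text = html.read()
plaintext = text.decode('utf8')
# Capture whatever sits between the quotes that follow href=
links = re.findall("href=[\"\'](.*?)[\"\']", plaintext)
print(links[:5])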

https://kite.com/python/answers/how-to-get-href-links-from-urllib-urlopen-in-python
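
As a rough sketch of that substitution, here is the same pattern with 'href' swapped for 'data-text', run against the markup from the question instead of a downloaded page (the inline string is an assumption made to keep the example self-contained). Regular expressions are brittle on real-world HTML, so treat this as an illustration rather than a robust parser.

```python
import re

# Sample markup from the question, used here in place of a fetched page.
plaintext = '<div class="some div name" data-text="important text">...</div>'

# Same pattern as above, with 'href' replaced by 'data-text'.
values = re.findall("data-text=[\"\'](.*?)[\"\']", plaintext)
print(values)  # ['important text']
```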

0

You can try this one.

from bs4 import BeautifulSoup
import requests
...
bsObj = BeautifulSoup(html, features="html.parser")
div_tag = bsObj.find("div", class_="some div name")
if div_tag:
    # Print only when the div was found; otherwise data_text would be undefined.
    data_text = div_tag['data-text']
    print(data_text)

Hope this helps.

Metalgear
0

With so little information provided, I can't come up with anything better than this:

from bs4 import BeautifulSoup

html = '<div class="some div name" data-text="important text">...</div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div.some.div.name')
print(div.get('data-text'))

Output:

important text
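
If the class names are not reliable, a CSS attribute selector may be an alternative. This is only a sketch under that assumption: 'div[data-text]' matches any <div> that carries the attribute, whatever its classes, and the None check covers the case where nothing matches.

```python
from bs4 import BeautifulSoup

html = '<div class="some div name" data-text="important text">...</div>'
soup = BeautifulSoup(html, 'html.parser')

# [data-text] selects on the presence of the attribute, not on the class names.
div = soup.select_one('div[data-text]')
text = div.get('data-text') if div is not None else None
print(text)
```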
MrNobody33
UWTD TV