
Example

<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l"></div>

How can I find this div with BeautifulSoup and get the value between the quotes of its data-content="?????" attribute?

2 Answers


This is easy with a CSS selector:

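from bs4 import BeautifulSoup

html = '<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l"></div>'

# 'lxml' is a separate install; the built-in 'html.parser' works here as well
soup = BeautifulSoup(html, 'lxml')

# select_one() returns the first element matching the CSS selector;
# indexing the tag by attribute name returns that attribute's value
print(soup.select_one('div[data-content]')["data-content"])

Output:

N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l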
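One thing to watch for: if the page has no matching div, select_one() returns None, and the ["data-content"] lookup then fails with a TypeError. A minimal sketch that guards against that case, assuming the built-in html.parser; the helper name get_data_content is hypothetical:

```python
from bs4 import BeautifulSoup


def get_data_content(html):
    """Hypothetical helper: data-content of the first <div> that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns None when no element matches the selector
    div = soup.select_one("div[data-content]")
    # the selector guarantees the attribute exists on any div it returns
    return div["data-content"] if div is not None else None


print(get_data_content('<div data-content="abc"></div>'))  # abc
print(get_data_content('<div class="other"></div>'))  # None
```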
Anatol
  • Welcome to the Stack Overflow community. Actually, there's no difference at all: [check](https://stackoverflow.com/questions/25714417/beautiful-soup-and-table-scraping-lxml-vs-html-parser) – αԋɱҽԃ αмєяιcαη Apr 04 '20 at 10:20
  • @αԋɱҽԃαмєяιcαη Thanks for the correction. I removed that from my answer :) – Anatol Apr 04 '20 at 10:32
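The comments note that the choice between lxml and html.parser makes no difference here. A small sketch to check that for this particular well-formed snippet (parsers can still disagree on malformed markup); it assumes lxml may or may not be installed and simply skips it when BeautifulSoup raises FeatureNotFound:

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = '<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l"></div>'

results = {}
for parser in ("html.parser", "lxml"):
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        # lxml is a separate package; skip it when it is not installed
        continue
    results[parser] = soup.select_one("div[data-content]")["data-content"]

print(results)
# every parser that ran should report the same value
print(len(set(results.values())) == 1)  # True
```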

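This is also easy with soup.find_all("div", attrs={"data-content": True}) (findAll is the older camelCase alias of find_all). Passing True as the attribute value matches every div that has a data-content attribute, whatever its value.

For example: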

from bs4 import BeautifulSoup

# each opening <div ...> tag needs its closing ">" before </div>
html = """
<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l" href="www.test1.com"></div>
<div data-content="2" href="www.test1.com"></div>
<div data-content="3" href="www.test2.com"></div>
<div data-content="4" href="www.test2.com"></div>
<div data-content="5" href="www.test3.com"></div>
<div data-content="6" href="www.test3.com"></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# collect the data-content value of every div that has the attribute
goal = [div.get("data-content")
        for div in soup.find_all("div", attrs={"data-content": True})]

print(goal)

Output:

['N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l', '2', '3', '4', '5', '6']
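For comparison, the CSS-selector approach from the first answer also returns every match when select() is used instead of select_one(). A short sketch on made-up sample markup; the href filter just shows that an attribute selector can narrow the match by another attribute's value:

```python
from bs4 import BeautifulSoup

# made-up sample markup for illustration
html = """
<div data-content="1" href="www.test1.com"></div>
<div data-content="2" href="www.test2.com"></div>
<div class="no-data"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element matching the CSS selector
values = [div["data-content"] for div in soup.select("div[data-content]")]
print(values)  # ['1', '2']

# an attribute selector can also filter on another attribute's exact value
test2 = [div["data-content"] for div in soup.select('div[href="www.test2.com"]')]
print(test2)  # ['2']
```

The div with no data-content attribute is skipped by both approaches, so no extra None filtering is needed.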