Scraping tag with BeautifulSoup

Question

I am trying to scrape a page with BeautifulSoup. The page has <script> tags inside a <span> tag, as shown below:

<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>

But since <script> tags are not parsed as HTML in bs4, the following code returns the <span> tag without the text ("более 20 раз"):

rating = soup.find("p", {"class": "order-quantity"})

How can I get the text within the <span> tag?

Is the page loaded dynamically? Does the text appear when using `print(soup.prettify())`? — MendelG, Mar 07 '21 at 23:12
BS4 will parse script tags, it just doesn't execute them. But if the text is in the HTML it should be returned. — Barmar, Mar 07 '21 at 23:14
@SofiyaChobanyan The page is loaded dynamically. See [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) — MendelG, Mar 07 '21 at 23:41

score 0 · Answer 1 · answered Mar 07 '21 at 23:19

The text is the first text node after the tag <script type="jsv#26^">. You can locate that tag with soup.find("script", type="jsv#26^") and then read the string that follows it:

from bs4 import BeautifulSoup


html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

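# find_next(string=True) returns the first text node after the marker tag
# (`string` is the current name of the older `text` argument)
print(soup.find("script", type="jsv#26^").find_next(string=True).strip())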

Output:

более 20 раз
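
If the exact type value of the marker script is not known in advance, one variant is to read the text from the enclosing <span> instead. This is a minimal sketch, assuming the span can be picked out by the presence of its data-link attribute; that selector is an assumption and may need tightening on the real page.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question
html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the first <span> carrying a data-link attribute is the one we want
span = soup.find("span", attrs={"data-link": True})

# The <script> tags are empty, so only the visible text remains after stripping
order_count = span.get_text(strip=True)
print(order_count)
```

This only works when the text is present in the HTML that was actually downloaded.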

The code throws the following error: AttributeError: 'NoneType' object has no attribute 'find_next' — Sofiya Chobanyan, Mar 07 '21 at 23:38
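
That error means soup.find(...) returned None, i.e. the marker <script> is not in the HTML that was downloaded, which fits the comments above saying the page is loaded dynamically. A minimal sketch of a guard against that case follows; the helper name text_after_marker is hypothetical and the sample strings are made up for illustration.

```python
from bs4 import BeautifulSoup


def text_after_marker(html, marker_type="jsv#26^"):
    """Return the text following the marker <script>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    marker = soup.find("script", type=marker_type)
    if marker is None:
        # The marker is missing, e.g. the content is rendered later by JavaScript
        return None
    text = marker.find_next(string=True)
    return text.strip() if text is not None else None


# Hypothetical inputs: one with the marker present, one without it
static_html = '<span><script type="jsv#26^"></script>\nболее 20 раз\n<script type="jsv/26^"></script></span>'
print(text_after_marker(static_html))
print(text_after_marker("<span></span>"))
```

If the helper returns None for the real page, the HTML needs to be obtained after JavaScript has run, as discussed in the question linked in the comments.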
