I am trying to scrape a page with BeautifulSoup. The page has <script> tags inside a <span> tag, as shown below:

<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>

Because <script> tags do not seem to be parsed as HTML in bs4, the following code returns the <span> tag without its text ("более 20 раз", i.e. "more than 20 times"):

rating = soup.find("p", {"class": "order-quantity"})

How can I get the text within the <span> tag?

Barmar
  • Is the page loaded dynamically? Does the text appear when using `print(soup.prettify())`? – MendelG Mar 07 '21 at 23:12
  • BS4 will parse script tags, it just doesn't execute them. But if the text is in the HTML it should be returned. – Barmar Mar 07 '21 at 23:14
  • @MendelG no it does not include any of – Sofiya Chobanyan Mar 07 '21 at 23:36
  • @SofiyaChobanyan The page is loaded dynamically. See [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – MendelG Mar 07 '21 at 23:41
  • Thank you, will check it out – Sofiya Chobanyan Mar 07 '21 at 23:43

1 Answer

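The text is the string that immediately follows the empty tag <script type="jsv#26^">; it is a sibling of that tag, not its content. You can locate the tag with soup.find("script", type="jsv#26^") and then step to the next string in the document with find_next().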

from bs4 import BeautifulSoup


html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# string=True matches the next text node; it replaces the older text=True argument
print(soup.find("script", type="jsv#26^").find_next(string=True).strip())

Output:

более 20 раз
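
As an alternative sketch, the text can also be read from the enclosing <span> itself, without relying on a specific script marker. This assumes the numeric ids in the type attributes (jsv#26 and so on) may differ between pages, which the question does not confirm, and that the fragment "productCardOrderCount" in data-link is stable enough to match on. The variable name order_count is only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the <span> by a fragment of its data-link attribute
# (assumption: this fragment identifies the order-count element).
span = soup.find(
    "span",
    attrs={"data-link": lambda v: v is not None and "productCardOrderCount" in v},
)

# The <script> tags are empty, so the only non-blank string left is the text.
order_count = span.get_text(strip=True) if span else None
print(order_count)
```

As noted in the comments, this only works if the text is present in the HTML that was actually downloaded; if print(soup.prettify()) does not show it, the page is filled in by JavaScript and neither approach will find it.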
MendelG