I'm trying to scrape the inventory counts from a webpage, but they don't show up in the output of my Python script.

Here's the original tag as it appears in the browser, with the text I want to scrape:

<span class="currentInv">251</span>
" in stock"

And this is the tag after parsing the page with BeautifulSoup, using lxml as the parser. I also tried other parsers, such as html.parser and html5lib:

<span class="currentInv"></span>

Here's my full Python script:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.hancocks.co.uk/buy-wholesale-sweets?warehouse=1983&p=1'
parser = 'lxml'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}

response = requests.get(url, headers=headers)
data = response.text
soup = bs(data, parser)

print(soup.find('span', class_='currentInv').text)

The output is empty.

I've tried many times, but nothing seems to work.

Any help would be much appreciated.

Code-Apprentice

2 Answers

If you view the source of the page (view-source:https://www.hancocks.co.uk/buy-wholesale-sweets?warehouse=1983&p=1), you'll see that the server-side rendered HTML sent to the browser also contains no value in that span tag.

The value 251 is likely getting added client-side after the DOM is loaded via JavaScript.
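As a minimal sketch of the difference, the snippet below parses two hard-coded strings instead of the live site: one mimicking the HTML the server sends and one mimicking the DOM after JavaScript has run. Both strings and the helper name `span_text` are illustrative assumptions, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text: the span as the server sends it (empty).
SERVER_HTML = '<div><span class="currentInv"></span> in stock</div>'
# Stand-in for the DOM after JavaScript has run in the browser.
RENDERED_HTML = '<div><span class="currentInv">251</span> in stock</div>'


def span_text(html, class_name="currentInv"):
    """Return the stripped text of the first matching span, or None if absent."""
    span = BeautifulSoup(html, "html.parser").find("span", class_=class_name)
    return None if span is None else span.get_text(strip=True)


print(repr(span_text(SERVER_HTML)))    # empty string: nothing to scrape
print(repr(span_text(RENDERED_HTML)))  # '251'
```

If the same helper returns an empty string for `response.text` from `requests`, the parser is not at fault: the value simply is not in the HTML that was downloaded.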

I'd go through the answers to "Web-scraping JavaScript page with Python" for more ways to extract that JavaScript-generated value.

Andrew Halpern

Most likely the page you see in your browser contains dynamic content. This means that when you inspect the page, you see the final result after some JavaScript code ran and manipulated the DOM that is rendered in the browser. When you load the same page in Python code using Beautiful Soup, you get the raw HTML that comes from the request. The JavaScript code for the dynamic content isn't executed, so you will not see the same results.

One solution is to use Selenium instead of fetching the page with requests. Selenium loads the page in a real browser, so the JavaScript runs, and provides an API to interact with the rendered page.
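Here is a rough sketch of that approach, assuming Selenium 4 and a local Chrome installation. The function names `parse_stock_count` and `fetch_stock_count` are hypothetical, and the sketch also assumes the site fills in `span.currentInv` for an anonymous visitor, which I have not verified.

```python
import re
from typing import Optional


def parse_stock_count(text: str) -> Optional[int]:
    """Return the first integer found in text (commas allowed), or None."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def fetch_stock_count(url: str, timeout: float = 10.0) -> Optional[int]:
    """Load url in headless Chrome and read span.currentInv once it has text."""
    # Imported here so parse_stock_count works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def span_has_text(d):
            # Falsy until the span exists and JavaScript has filled it in.
            elements = d.find_elements(By.CSS_SELECTOR, "span.currentInv")
            if elements and elements[0].text.strip():
                return elements[0]
            return False

        # Raises selenium's TimeoutException if the span stays empty.
        span = WebDriverWait(driver, timeout).until(span_has_text)
        return parse_stock_count(span.text)
    finally:
        driver.quit()
```

Calling `fetch_stock_count` with the URL from the question should return the count as an integer if the page populates the span. The explicit wait matters because the span exists in the initial HTML but is empty until the script runs.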

Code-Apprentice