I am new to scraping with Python and BeautifulSoup4, and I have no knowledge of HTML. To practice, I am trying to use it on the Carrefour website to extract the price and the price per kilogram of a product that I search for by EAN code. My code:

import requests
from bs4 import BeautifulSoup

barcodes = ['5449000000996']

for barcode in barcodes:
    # Search the site by EAN code
    url = 'https://www.carrefour.es/?q=' + barcode
    html = requests.get(url).content
    bs = BeautifulSoup(html, 'lxml')

    # Product price
    searchingprice = bs.find_all('strong', {'class':'ebx-result-price__value'})
    print(searchingprice)

    # Price per kilogram
    searchingpricerperkg = bs.find_all('span', {'class':'ebx-result__quantity ebx-result-quantity'})
    print(searchingpricerperkg)

But I do not get any results at all.

Here is a screenshot of the HTML code:

Website screenshot

What am I doing wrong? I tried with another website and it seems to work.

Pin_Eipol

1 Answer

The problem here is that you're scraping a page with JavaScript-generated content. The page you get back from requests doesn't actually contain the elements you're looking for; it contains a bunch of JavaScript. When your browser loads the page, it runs that JavaScript, which generates the content. So the rendered page you see in your browser is not the same as the raw HTML the server returns: that HTML contains instructions for your browser to build the page that you see.
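To make that concrete, here is a minimal sketch of what BeautifulSoup sees on such a page. The HTML string is a made-up stand-in, not Carrefour's real markup, and the helper name find_prices is hypothetical; only the class name is taken from the question. It uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML a server might return:
# an empty container plus a script that would fill it in a browser.
RAW_HTML = """
<html><body>
<div id="results"></div>
<script>
  var el = document.createElement("strong");
  el.className = "ebx-result-price__value";
  el.textContent = "1,50";
  document.getElementById("results").appendChild(el);
</script>
</body></html>
"""

def find_prices(html):
    """Return the <strong> price elements present in the static HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("strong", class_="ebx-result-price__value")

# The class name appears in the script text, but the script never runs,
# so no matching <strong> element exists for BeautifulSoup to find.
print("ebx-result-price__value" in RAW_HTML)
print(find_prices(RAW_HTML))
```

Running the same check against the text returned by requests.get(url) should show whether the real page behaves this way.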

If you're just practicing, you might want to simply try a different source to scrape from. To scrape this page, though, you'll need to look into other solutions that can handle JavaScript-generated content:

Web-scraping JavaScript page with Python

Alternatively, the JavaScript generates the content by requesting data from other sources. I don't speak Spanish, so I'm not much help in figuring this part out, but you might be able to.

As an exercise, go ahead and have BS4 prettify and print out the page that it receives. You'll see that within that page there are requests to other locations to get the info you're asking for. You might be able to change your request so that it goes not to the page where you view the info, but to the location that page gets its data from.
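If reading the whole prettified page is too much, a rough way to shortlist those other locations is to pull every absolute URL out of the page source. This is only a sketch: the sample HTML and the example.com addresses are invented for illustration, candidate_urls is a hypothetical helper, and the real page may build its URLs in ways a simple pattern won't catch.

```python
import re

# Hypothetical stand-in for a page source; the URLs are made up.
SAMPLE_HTML = """
<html><head>
<script src="https://cdn.example.com/search-widget.js"></script>
<script>
  var searchConfig = {endpoint: "https://api.example.com/search?lang=es"};
</script>
</head><body><div id="results"></div></body></html>
"""

# Match http(s) URLs up to the next quote, angle bracket, or whitespace.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")

def candidate_urls(html):
    """Return distinct absolute URLs found in the HTML, in order of first appearance."""
    seen = []
    for url in URL_PATTERN.findall(html):
        if url not in seen:
            seen.append(url)
    return seen

for url in candidate_urls(SAMPLE_HTML):
    print(url)
```

With the real page you would pass in requests.get(url).text instead of the sample string, then look through the list for anything that looks like a search or data endpoint.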

Kyle Alm