Scraping using BeautifulSoup prints an empty output

Question

I'm trying to scrape a website. I want to print all the elements with the following class name,

class=product-size-info__main-label

The code is the following:

from bs4 import BeautifulSoup

# Parse the copy of the page that was saved to disk
with open("MadeInItaly.html", "r") as f:
    doc = BeautifulSoup(f, "html.parser")
    tags = doc.find_all(class_="product-size-info__main-label")

print(tags)

Result: [XS, XS, S, M, L, XL]

All good here.

This works when run on the file MadeInItaly.html, which is basically the same page I am trying to scrape, but the version saved on my disk.

Now, the same thing with the live version from the URL:

from bs4 import BeautifulSoup 
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"} 

url = "https://www.zara.com/es/es/vestido-midi-volantes-cinturon-con-lino-p00387075.html?v1=258941747&v2=2184287"
result = requests.get(url,headers=headers)
doc = BeautifulSoup(result.text, "html.parser")

tags = doc.find_all(class_="product-size-info__main-label")
print(tags)

Result: []

I have tried different User-Agent headers. What could be wrong here?

Thanks for your help Rob. You deserve the grammar corrector badge too. — Lorenzo Castagno, Jun 03 '23 at 10:06
You know, obviously, much more than me about Badges. Just remember we are all here to learn. Wish you a great day. — Lorenzo Castagno, Jun 03 '23 at 10:20

score 1 · Answer 1 · answered Jun 02 '23 at 21:01

The local HTML file you have might be a fully loaded version of the webpage, with all JavaScript executed and all the dynamic content already in the markup. The live site might be using JavaScript to insert the elements with the class product-size-info__main-label, and the requests module does not execute JavaScript: it only returns the HTML the server sends. This would explain why you see those elements in the local HTML file but not in the live scraped version. In that case, you would need a tool that drives a real browser and runs the page's JavaScript, such as Selenium.
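To make the difference concrete, here is a minimal sketch that runs the same find_all call on two inline HTML strings. Both strings are made up for illustration and are not Zara's actual markup: RAW_HTML stands in for what requests would receive (the labels are only created by a script), and RENDERED_HTML stands in for a page saved from the browser after the script has run.

```python
from bs4 import BeautifulSoup

CLASS_NAME = "product-size-info__main-label"

# Hypothetical server response: the labels exist only as JavaScript source,
# which BeautifulSoup treats as plain text inside <script>, not as elements.
RAW_HTML = """
<html><body>
  <ul id="sizes"></ul>
  <script>
    const ul = document.getElementById("sizes");
    for (const s of ["XS", "S"]) {
      const span = document.createElement("span");
      span.className = "product-size-info__main-label";
      span.textContent = s;
      ul.appendChild(span);
    }
  </script>
</body></html>
"""

# Hypothetical saved page: the browser already ran the script, so the
# <span> elements are part of the markup.
RENDERED_HTML = """
<html><body>
  <ul id="sizes">
    <span class="product-size-info__main-label">XS</span>
    <span class="product-size-info__main-label">S</span>
  </ul>
</body></html>
"""


def size_labels(html):
    """Return the text of every element carrying CLASS_NAME."""
    doc = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in doc.find_all(class_=CLASS_NAME)]


raw_labels = size_labels(RAW_HTML)
rendered_labels = size_labels(RENDERED_HTML)

print(raw_labels)       # []
print(rendered_labels)  # ['XS', 'S']
```

Note that the class name does appear in RAW_HTML, but only inside the script text, so searching result.text for the string alone would not tell you whether the elements exist.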

score 1 · Accepted Answer · answered Jun 03 '23 at 11:07

As already answered, the problem is that the elements with the class you are looking for are loaded by JavaScript. Here is a post that shows how to solve this with Selenium. It works with that approach:

https://stackoverflow.com/a/11238391/21607327
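For this question, the Selenium approach could look roughly like the sketch below. It is not taken from the linked post and has not been verified against the Zara page: it assumes Selenium 4 with a local Chrome install, and the helper names fetch_rendered_html and extract_labels, the 15-second timeout, and the headless flag are all choices made for illustration. The site may also block headless browsers or need a cookie banner dismissed first, in which case the wait would time out.

```python
from bs4 import BeautifulSoup

CLASS_NAME = "product-size-info__main-label"


def fetch_rendered_html(url, timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run.

    Raises selenium's TimeoutException if no element with CLASS_NAME
    appears within `timeout` seconds.
    """
    # Imported here so extract_labels below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, CLASS_NAME))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


def extract_labels(html):
    """Same parsing as in the question, applied to the rendered HTML."""
    doc = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in doc.find_all(class_=CLASS_NAME)]
```

With the URL from the question stored in url, the call would be extract_labels(fetch_rendered_html(url)). The parsing code stays exactly as in the question; only the way the HTML is obtained changes.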

Georgis

Solved. Thanks for adding the hint for it. – Lorenzo Castagno Jun 03 '23 at 12:52
