Attempting to web scrape. Downloaded html code is slightly different from code on live site

Question

I'm new to web scraping and I'm trying to build a very basic stock tracker for the site pokemoncenter.com. When visiting the product pages of items on the live site, the add to cart button displays as:

<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>

When the item is out of stock the button is:

<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>

But whenever I try to scrape the site, regardless of whether the item is in stock or not, the button is:

<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>

So essentially it always displays as out of stock when I download the html code with requests.get().

import requests
from bs4 import BeautifulSoup

page_url = "https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

req = requests.get(page_url, headers=headers)

page_soup = BeautifulSoup(req.text, "html.parser")

# Find the add to cart button inside the second product column
divs = page_soup.find_all("div", {"class": "jsx-829839431 product-col"})
button = divs[1].find("button", {"class": "jsx-2748458255"})

# Check whether the button carries the "disabled" attribute
if button.has_attr("disabled"):
    print("Out of Stock")
else:
    print("In Stock")

In stock example: https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in
Out of stock example: https://www.pokemoncenter.com/product/701-06558/gigantamax-pikachu-poke-plush-17-in

Some sites change their HTML with JavaScript on load, and requests won't run that JavaScript. You can check this by disabling JavaScript in Chrome's developer tools and reloading the page to see the result. — goalie1998, Jan 16 '21 at 04:48
@goalie1998 is right. Check this out: https://stackoverflow.com/a/50612469/6010889 — WebSpence, Jan 16 '21 at 04:58

score 0 · Accepted Answer · answered Jan 16 '21 at 05:51

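As goalie1998 mentioned, the site is probably using JavaScript to fill in parts of the page, such as the button's label and state, after the initial HTML arrives (sites often do this to reduce initial load time). requests.get() does not run JavaScript, so it only ever sees the initial markup: a disabled button with no text. Selenium drives a real browser and does run the page's JavaScript, so you could probably use it to scrape the rendered page instead.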
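
A minimal sketch of the Selenium approach, not tested against the live site. It assumes Selenium 4 and a local Chrome install, and that the button still matches the `button.product-add` selector taken from the markup in the question. The function names `stock_status` and `check_stock` are made up for this example. The site may also block automated browsers, in which case this will not work as written.

```python
def stock_status(button_text: str, enabled: bool) -> str:
    """Classify the button state read from the rendered page."""
    if enabled and "add to cart" in button_text.lower():
        return "In Stock"
    return "Out of Stock"


def check_stock(url: str, timeout: float = 15) -> str:
    """Load the page in headless Chrome and read the rendered button."""
    # Imported here so stock_status stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def labelled_button(d):
            # Assumption: the label is empty until the page's JavaScript
            # has run, as in the HTML that requests.get() returned.
            buttons = d.find_elements(By.CSS_SELECTOR, "button.product-add")
            return next((b for b in buttons if b.text.strip()), None)

        # Poll until a button with a non-empty label appears, or time out.
        button = WebDriverWait(driver, timeout).until(labelled_button)
        return stock_status(button.text, button.is_enabled())
    finally:
        driver.quit()
```

Calling `check_stock(page_url)` with the URL from the question should then return "In Stock" or "Out of Stock" based on the rendered button rather than the placeholder markup. Waiting for a non-empty label matters here: reading the button immediately after `driver.get()` could hit the same empty, disabled placeholder.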

Noob Life