Why does BeautifulSoup give me the wrong text?

Question

I'm trying to scrape the availability status of a product from IKEA's website. The page itself says, in Dutch: 'not available for delivery', 'only available in the shop', 'not in stock' and 'you've got 365 days of warranty'.

But my code gives me: 'not available for delivery', 'only available for order and pickup', 'checking inventory' and 'you've got 365 days of warranty'.

What am I doing wrong that causes the text to differ?

This is my code:

import requests
from bs4 import BeautifulSoup

# Get the url of the IKEA page and set up the bs4 stuff
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'
thepage = requests.get(url)
soup = BeautifulSoup(thepage.text, 'lxml')

# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {'class' : 'range-revamp-product-availability'})

# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)

When I inspect the page, I think the values you are getting come from the "Sources" tab and not from the "Elements" tab, which is the one you are looking for. I suspect that is your problem. — Rajesh, Jun 11 '20 at 22:42
I think the page loads this information later from: https://www.ikea.com/nl/nl/products/javascripts/range-stockcheck.6bef7c24195468f7305b.js There is another thread on that topic: https://stackoverflow.com/questions/43668384/how-can-i-scrape-data-from-websites-dont-return-simple-html — Rajesh, Jun 12 '20 at 16:00

score 1 · Answer 1 · answered Jun 13 '20 at 19:07

With Rajesh's help, I wrote a script that does exactly what I want. It selects a specific shop (the one located in Heerlen), checks whether an out-of-stock item is available again, and sends you an email as soon as it is back in stock.

The script used for this is:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
import smtplib, ssl

# Fill in the url of the product
url = 'https://www.ikea.com/nl/nl/p/vittsjo-stellingkast-zwartbruin-glas-20213312/'

op = webdriver.ChromeOptions()
op.add_argument('headless')
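# executable_path is the Selenium 3 style; Selenium 4 takes a Service object
# (from selenium.webdriver.chrome.service) via the service= argument instead
driver = webdriver.Chrome(options=op, executable_path='/Users/Jem/Downloads/chromedriver')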

# Stuff for sending the email
port = 465
password = 'password'
sender_email = 'email'
receiver_email = 'email'
# The header must start at the beginning of the line, so the string is not indented
message = """\
Subject: Product is back in stock!

Sent with Python."""

# Keep looping until back in stock
while True:
    driver.get(url)

    # Go to the location of the shop
    btn = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="onetrust-accept-btn-handler"]')))
    btn.click()

    location = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="content"]/div/div/div/div[2]/div[3]/div/div[5]/div[3]/div/span[1]/div/span/a')))
    location.click()

    differentlocation = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="range-modal-mount-node"]/div/div[3]/div/div[2]/div/div[1]/div[2]/a')))
    differentlocation.click()

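    # find_element(By.XPATH, ...) works in Selenium 3 and 4; find_element_by_xpath was removed in Selenium 4
    searchbar = driver.find_element(By.XPATH, '//*[@id="change-store-input"]')
    # In this part you can choose the location you want to check
    searchbar.send_keys('heerlen')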

    heerlen = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="range-modal-mount-node"]/div/div[3]/div/div[2]/div/div[3]/div')))
    heerlen.click()

    selecteer = driver.find_element(By.XPATH, '//*[@id="range-modal-mount-node"]/div/div[3]/div/div[3]/button')
    selecteer.click()

    close = driver.find_element(By.XPATH, '//*[@id="range-modal-mount-node"]/div/div[3]/div/div[1]/button')
    close.click()

    # After you went to the right page, beautifulsoup it
    source = driver.page_source

    soup = BeautifulSoup(source, 'lxml')

    # Locate the part where the availability stuff is
    availabilitypanel = soup.find('div', {"class" : "range-revamp-product-availability"})

    # Get the text of the things inside of that panel
    availabilitysectiontext = [part.getText() for part in availabilitypanel]

    # Check whether it is still out of stock, if so wait half an hour and continue
    if 'Niet op voorraad in Heerlen' in availabilitysectiontext:
        time.sleep(1800)
        continue

    # If not, send me an email that it is back in stock
    else:
        print('Email is being sent...')
        context = ssl.create_default_context()
        with smtplib.SMTP_SSL('smtp.gmail.com', port, context=context) as server:
            server.login(sender_email, password)
            server.sendmail(sender_email, receiver_email, message)
        break
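
One thing to watch in the script above: `'Niet op voorraad in Heerlen' in availabilitysectiontext` only matches a list entry that is exactly that string. Scraped text often carries stray whitespace, so a more forgiving check may be safer. Below is a minimal sketch; the helper name `is_out_of_stock` and the extra whitespace in the sample list are my own assumptions for illustration, not something taken from the IKEA page.

```python
def is_out_of_stock(status_lines, marker="Niet op voorraad in Heerlen"):
    """Return True if any scraped line contains the marker.

    Whitespace runs are collapsed and case is ignored, so entries such as
    '\n  Niet op voorraad in Heerlen ' still count as a match.
    """
    needle = " ".join(marker.split()).casefold()
    return any(
        needle in " ".join(line.split()).casefold()
        for line in status_lines
    )


# Hypothetical scrape result: the third entry has extra whitespace around it.
rendered = [
    "Niet beschikbaar voor levering",
    "Alleen beschikbaar in de winkel",
    "\n  Niet op voorraad in Heerlen ",
    "Je hebt 365 dagen om van gedachten te veranderen. ",
]

print("Niet op voorraad in Heerlen" in rendered)  # False: exact match misses it
print(is_out_of_stock(rendered))                  # True
```

In the loop you would then write `if is_out_of_stock(availabilitysectiontext):` instead of the plain `in` test.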

score 0 · Accepted Answer · answered Jun 12 '20 at 17:18

The availability markup is added by JavaScript after the initial server response. BeautifulSoup only sees that initial response; it does not execute JavaScript, so it never gets the completed page. If you want the JavaScript to run, you'll need to use a headless browser. Otherwise, you'll have to reverse-engineer the JavaScript and see what it does.
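
To see the effect in isolation, here is a small sketch. The HTML snippet, the class name `availability` and the helper `static_text` are all made up for the demo, and I use Python's built-in `html.parser` as a stand-in so it runs without extra packages; BeautifulSoup is also purely a parser, so the same reasoning applies to it.

```python
from html.parser import HTMLParser

# Made-up page: the server sends a placeholder, and a script is supposed
# to replace it once a real browser executes it.
RAW_HTML = """
<div class="availability">checking inventory</div>
<script>
  document.querySelector('.availability').textContent = 'not in stock';
</script>
"""


class DivTextCollector(HTMLParser):
    """Collects the text inside <div class="availability">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "availability") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        # Script source also arrives here, but only as inert text
        # outside the div; nothing ever runs it.
        if self.inside and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    parser = DivTextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


print(static_text(RAW_HTML))  # checking inventory
```

The parser returns the placeholder because the script is just text to it. That matches the 'checking inventory' value from the question.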

You could get this to work with Selenium. I modified your code a bit and got it to work.

Get Selenium:

pip3 install selenium

Download Firefox + geckodriver or Chrome + chromedriver, then run:

from bs4 import BeautifulSoup
import time
from selenium import webdriver

# Get the url of the IKEA page and set up the bs4 stuff
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'

# Uncomment the following line if using Firefox + geckodriver
# (downloaded from https://github.com/mozilla/geckodriver/releases)
#driver = webdriver.Firefox(executable_path='/Users/ralwar/Downloads/geckodriver')

# Using Chrome + chromedriver (downloaded from https://chromedriver.chromium.org/downloads)
op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(options=op, executable_path='/Users/ralwar/Downloads/chromedriver')

driver.get(url)
time.sleep(5)  # Give the page and its JavaScript time to finish loading; adjust as needed
source = driver.page_source

soup = BeautifulSoup(source, 'lxml')

# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {"class" : "range-revamp-product-availability"})

# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)

The above code prints:

['Niet beschikbaar voor levering', 'Alleen beschikbaar in de winkel', 'Niet op voorraad in Amersfoort', 'Je hebt 365 dagen om van gedachten te veranderen. ']
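
A note on the `time.sleep(5)`: a fixed delay is a guess. It can be too short on a slow connection and wastes time on a fast one. Polling until a condition holds is usually more robust, and that is what Selenium's `WebDriverWait` does for you. The sketch below shows the idea in plain Python so it runs without a browser; `wait_until` and `panel_ready` are hypothetical names, and `panel_ready` only simulates a panel that appears on the third check.

```python
import time


def wait_until(predicate, timeout=20.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for "has the availability panel been rendered yet?"
attempts = {"count": 0}


def panel_ready():
    attempts["count"] += 1
    return "ready" if attempts["count"] >= 3 else None


print(wait_until(panel_ready, timeout=5, interval=0.01))  # ready
```

With a real driver, the predicate would check the page itself, for example whether the availability div is present yet.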

I thought that was still wrong, because it shouldn't have said 'alleen beschikbaar in de winkel' (only available in shop) if it was still out of stock. But the item happened to be back in stock, so I think it was correct hahaha, although 'niet beschikbaar voor levering' (not available for delivery) shouldn't have been there then. I'll try some things with Selenium. Thanks for the answer! — Jem, Jun 13 '20 at 09:41
It outputs the same result as what I see on their [website](https://imgur.com/a/ra6i0ak). It's now showing: `['Beschikbaar voor levering', 'Alleen beschikbaar in de winkel', 'Niet op voorraad in Amersfoort', 'Je hebt 365 dagen om van gedachten te veranderen. ']` — Rajesh, Jun 13 '20 at 16:54
I started all over with Selenium because I needed to change the location from Amersfoort to Heerlen. The script I have works as well, but in a different way. I do have some questions about your script though: 1. How come it doesn't open a Chrome tab when I run it? 2. What does the 'op' stuff at the beginning do? 3. So if you first use Selenium and then use BeautifulSoup on the Selenium output, the JavaScript will be taken into account? — Jem, Jun 13 '20 at 18:40
If you want to open a Chrome tab, you can remove the `op.add_argument('headless')` line. The 'headless' option makes Chrome run without the full browser UI, which is often desirable when you are only interested in the information and not the UI. Chrome still executes the JavaScript just as a normal browser would, so you get the same information you would see in a browser window. BeautifulSoup doesn't handle JavaScript, so you use either Chrome or Firefox to give you a webpage with the JavaScript fully rendered. — Rajesh, Jun 13 '20 at 19:44
