
I'm trying to scrape a Target page for product details such as price, name, and the product JPEG, but the HTML that Python pulls through BeautifulSoup doesn't seem to match the HTML I see on the target website (using F12).

I've tried using both html.parser and lxml in the BeautifulSoup call, but neither makes a difference. I've tried googling similar problems but haven't found anything. I'm using Atom to run the Python code on Ubuntu 18.04.2. I'm pretty new to Python, but have coded a bit before.

from requests import get
from bs4 import BeautifulSoup

url = 'https://www.target.com/s?searchTerm=dove'
# Gets html from the given url
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
items = html_soup.find_all('li', class_='bkaxin')
print(len(items))

It's supposed to output 28, but I consistently get 0.

hello1094

1 Answer

It looks like the elements you're trying to find aren't there because they are created dynamically by JavaScript after the page loads. You can verify this yourself by viewing the page source (which shows what the server sent, unlike the F12 inspector, which shows the DOM after scripts have run). You can also print html_soup.prettify() and you'll see that the elements you're trying to find aren't there.
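To make that concrete, here is a minimal sketch. The HTML shell below is a hypothetical stand-in for what the server actually sends (not Target's real markup): an almost empty page plus a script tag, with the product list filled in later by JavaScript. Parsing that raw response finds nothing:

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the product <li> elements are injected
# later by JavaScript, so they show up in DevTools (F12) but are absent
# from response.text.
server_html = """
<html><body>
  <div id="root"></div>
  <script src="app.js"></script>
</body></html>
"""

soup = BeautifulSoup(server_html, 'html.parser')
items = soup.find_all('li', class_='bkaXIn')
print(len(items))  # 0 -- the elements simply aren't in the static HTML
```

This is exactly why the request/BeautifulSoup combination alone prints 0: there is nothing wrong with the selector, the content just isn't in the document yet.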

Inspired by this question, I present a solution based on using selenium:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.target.com/s?searchTerm=dove"
driver = webdriver.Firefox()

driver.get(url)
html = driver.page_source
html_soup = BeautifulSoup(html, 'html.parser')
items = html_soup.find_all('li', class_='bkaXIn')
driver.close()

print(len(items))

The previous code outputs 28 when I run it.

Note that you need to install selenium and the appropriate browser driver for this to work (in my solution I used the Firefox driver, geckodriver).

Also note that I use class_='bkaXIn' (case sensitive!) in html_soup.find_all — the lowercase 'bkaxin' in your original code matches nothing.

Ismael Padilla