-2

I'm trying to learn how to scrape a website and I keep bumping into urllib.request, which doesn't work for me.

import urllib.request
import bs4 as bs
sauce = urllib.request.urlopen('https://www.goat.com/collections/just-dropped').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
0xInfection

3 Answers

1

Try requests instead:

import requests
import bs4 as bs
sauce = requests.get('https://www.goat.com/collections/just-dropped').text
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
chitown88
0

You have to set the User-Agent header, but unfortunately the page's content is dynamic, so you will also have to use Selenium.

from urllib.request import Request, urlopen
import bs4 as bs

req = Request('https://www.goat.com/collections/just-dropped')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0')
sauce = urlopen(req).read()

soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
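Before reaching for Selenium, it can help to check whether the product elements are present in the raw HTML at all. The following is only a minimal sketch using the standard library: the class name is the one used in the Selenium selector in this answer, while the two sample HTML strings and the helper names are hypothetical and only there for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: a server-rendered page and a JavaScript-only shell.
static_page = '<a class="goat-clean-product-template" href="/x" title="X">X</a>'
js_shell = '<div id="root"></div><script src="/app.js"></script>'

print(count_elements_with_class(static_page, "goat-clean-product-template"))
print(count_elements_with_class(js_shell, "goat-clean-product-template"))
```

In practice you would pass in the decoded response, for example `sauce.decode('utf-8', errors='replace')`. A count of zero suggests the products are rendered by JavaScript after the page loads, which is the case where a browser driver such as Selenium is needed.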

To use Selenium, you need to install Selenium, Chrome and chromedriver:

pip install selenium
pip install chromedriver-binary

The code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import chromedriver_binary  # Adds chromedriver binary to path

driver = webdriver.Chrome()
driver.get('https://www.goat.com/collections/just-dropped')

# Wait until the products are rendered. find_elements returns a list, and an
# empty list is falsy, so the wait keeps polling until at least one matches.
products = WebDriverWait(driver, 15).until(
    lambda d: d.find_elements(By.CSS_SELECTOR, '.goat-clean-product-template')
)

for p in products:
    name = p.get_attribute('title')
    url = p.get_attribute('href')
    print('%s: %s' % (name, url))

driver.quit()  # Close the browser when done
cieunteung
0

As said before, you can use the requests library to fetch a page's content really easily.

First of all, you have to install requests and bs4 via pip. This will resolve the ModuleNotFoundError you are getting. The 'lxml' parser used below also needs the lxml package.

pip install bs4
pip install requests
pip install lxml
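If you are not sure which package is still missing, a quick check like the one below can tell you before you run the scraper. This is only a sketch: `missing_modules` is a hypothetical helper name, and lxml is included in the list on the assumption that you keep passing 'lxml' as the parser to BeautifulSoup.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used in this answer ("bs4" is the import name of Beautiful Soup;
# "lxml" is only needed because the code asks for the 'lxml' parser).
required = ["requests", "bs4", "lxml"]
missing = missing_modules(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Anything it lists as not installed is what would raise ModuleNotFoundError on import.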

Then here is your code for getting the data:

import requests 
from bs4 import BeautifulSoup
sauce = requests.get('https://www.goat.com/collections/just-dropped')
soup = BeautifulSoup(sauce.text, 'lxml')
print(soup)
0xInfection