urllib.request doesn't work on Python 3. How can I use BeautifulSoup?

Question

I'm trying to learn how to scrape a website and I keep bumping into urllib.request, which doesn't work for me.

import urllib.request
import bs4 as bs
sauce = urllib.request.urlopen('https://www.goat.com/collections/just-dropped').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)

score 1 · Answer 1 · answered Jan 22 '19 at 20:53

Try requests

import requests
import bs4 as bs
sauce = requests.get('https://www.goat.com/collections/just-dropped').text
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)

chitown88


ModuleNotFoundError: No module named 'requests' – Tudor Popica Jan 22 '19 at 22:05
click on the link I left in the solution. You need to install it – chitown88 Jan 23 '19 at 08:22

score 0 · Answer 2 · answered Jan 23 '19 at 00:46

You have to set a User-Agent header. Unfortunately, the page's content is generated dynamically, so you will also have to use Selenium.

from urllib.request import Request, urlopen
import bs4 as bs

req = Request('https://www.goat.com/collections/just-dropped')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0')
sauce = urlopen(req).read()

soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
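
If urllib.request still fails, it can help to look at the HTTP status instead of only the traceback. Here is a minimal sketch, assuming the failure is an HTTP error such as 403 (the question doesn't say which error occurs). build_request and fetch are hypothetical helper names, and the opener parameter is only there so the function can be tried without network access.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Same browser-like User-Agent string as in the snippet above.
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) '
              'Gecko/20100101 Firefox/56.0')


def build_request(url, user_agent=DEFAULT_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch(url, opener=urlopen):
    """Return (status_code, body_bytes); on an HTTP error return (code, b'')."""
    req = build_request(url)
    try:
        with opener(req, timeout=15) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        # For example 403 if the server rejects the client.
        return err.code, b''
```

Calling fetch('https://www.goat.com/collections/just-dropped') returns the status code together with the raw HTML, which you can then pass to bs.BeautifulSoup as above.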

To use Selenium, you need to install Selenium, Chrome, and chromedriver:

pip install selenium
pip install chromedriver-binary

The code:

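from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import chromedriver_binary  # Adds chromedriver binary to path

driver = webdriver.Chrome()
driver.get('https://www.goat.com/collections/just-dropped')

# Wait up to 15 seconds until at least one product has rendered.
# find_elements returns a list; an empty list is falsy, so until() keeps polling.
products = WebDriverWait(driver, 15).until(
    lambda d: d.find_elements(By.CSS_SELECTOR, '.goat-clean-product-template')
)

# Print the title and link of every product element found.
for p in products:
    name = p.get_attribute('title')
    url = p.get_attribute('href')
    print('%s: %s' % (name, url))

driver.quit()  # Close the browser when done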

score 0 · Answer 3 · answered Jan 23 '19 at 01:45

As said before, you can use the requests library to fetch a page's content really easily.

First of all, you have to install requests and bs4 via pip. This will resolve the ModuleNotFoundError you are getting. The 'lxml' parser used below also needs the lxml package.

pip install bs4
pip install requests
pip install lxml
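
If you still get ModuleNotFoundError after installing, you can check which modules your interpreter can actually see. This is a rough sketch using only the standard library; missing_modules is a hypothetical helper name, and the list of modules is just the ones used in this thread (lxml because 'lxml' is passed to BeautifulSoup as the parser).

```python
import importlib.util

# Import names used in this thread (these happen to match pip package names).
REQUIRED = ('requests', 'bs4', 'lxml')


def missing_modules(names=REQUIRED):
    """Return the names from `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules():
    print('Missing module %r: try "pip install %s"' % (name, name))
```

If a module shows up as missing right after pip install, pip may have installed it for a different Python than the one running your script; running python -m pip install with the same python usually avoids that.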

Then here is your code for getting the data:

import requests 
from bs4 import BeautifulSoup
sauce = requests.get('https://www.goat.com/collections/just-dropped')
soup = BeautifulSoup(sauce.text, 'lxml')
print(soup)

0xInfection


you are a lovely human being thank you – Tudor Popica Jan 23 '19 at 20:37
@TudorPopica, glad it helped you. – 0xInfection Jan 25 '19 at 13:26
