I started learning web scraping in Python using urllib and bs4.
I was looking for some code to analyze and found this answer: https://stackoverflow.com/a/38620894/14252018. Here is the code:
from urllib.parse import urlencode, urlparse, parse_qs
from lxml.html import fromstring
from requests import get
raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)
for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)['q']
        print(url[0])
When I run this, it does not print anything.
So I then tried bs4, this time against https://www.duckduckgo.com,
and changed the code to this:
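To narrow this down, the redirect-unwrapping step can be checked on its own, without touching the network. This is only a minimal sketch: `unwrap_redirect` is a hypothetical helper name and the sample href is made up, not taken from a real Google response.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(href):
    """Return the target of a '/url?q=...' style link, or href unchanged."""
    if href.startswith("/url?"):
        # parse_qs maps each query parameter to a list of values
        targets = parse_qs(urlparse(href).query).get("q")
        if targets:
            return targets[0]
    return href


# Hypothetical href, shaped like the links the loop expects
sample = "/url?q=https://stackoverflow.com/&sa=U"
print(unwrap_redirect(sample))
```

If this part behaves as expected, the empty output presumably means the loop body never ran or never saw a `/url?` link, for example because `.r a` matched no elements in the HTML that was actually returned.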
import bs4 as bs
import urllib.request
sauce = urllib.request.urlopen('https://duckduckgo.com/?q=dinosaur&t=h_&ia=web').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup.get_text())
I got an error:
- Why didn't the first block of code print anything?
- Why did the second block of code give me an error, and what does that error mean?