This code used to work for me.

from bs4 import BeautifulSoup
from urllib.request import urlopen

search = 'some_website'
BeautifulSoup(urlopen(search), "lxml")

But now I get the following error.

HTTPError: HTTP Error 403: Forbidden

A plain request is not enough for me, because the content I need to scrape is generated by JavaScript. When I send a browser-like User-Agent header instead:

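from urllib.request import Request, urlopen

hdr = {'User-Agent': 'Mozilla/5.0'}  # identify as a browser
req = Request(search, headers=hdr)   # same URL as in the first snippet
page = urlopen(req)
soup = BeautifulSoup(page, "lxml")
print(soup)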

I get the following in the soup.

<noscript>Please enable JavaScript to view the page content.</noscript>

How do I get the JavaScript-generated content of a web page when a plain request fails with "HTTP Error 403: Forbidden"? Thanks in advance for the help.

I'm using Python 3. Please let me know if you need more information.

Andersson
user3264602
  • Please see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/ – Dan-Dev Sep 07 '17 at 18:53
  • `from PyQt5.QtWebKitWidgets import QWebPage` This code does not work in the latest version of PyQt5. Does anyone know if there is a similar alternative to "QWebPage"? – user3264602 Sep 08 '17 at 22:20

1 Answer

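QtWebKit was deprecated upstream in Qt 5.5 and removed in Qt 5.6, which is why the QWebPage import no longer works.

You may want to switch to PyQt5.QtWebEngineWidgets, where QWebEnginePage is the replacement for QWebPage.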
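
For illustration, here is a rough sketch of how a page could be rendered with QWebEnginePage and its HTML read back after the JavaScript has run. It assumes PyQt5 with the QtWebEngine module is installed (on recent releases this is, as far as I know, the separate PyQtWebEngine package). The function name `render` is my own and not part of PyQt5, and I have not tested this against your site.

```python
import sys


def render(url):
    """Load `url` in QtWebEngine and return the HTML after JavaScript ran.

    `render` is a hypothetical helper name, not a PyQt5 API.
    Returns None if the page failed to load.
    """
    # Imported inside the function so this file can still be imported
    # on a machine where PyQt5 / QtWebEngine is not installed.
    from PyQt5.QtCore import QEventLoop, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    # Qt needs exactly one QApplication; reuse it if one already exists.
    app = QApplication.instance() or QApplication(sys.argv)

    page = QWebEnginePage()
    loop = QEventLoop()
    result = {"html": None}

    def on_html(html):
        # toHtml() is asynchronous: it hands the markup to this callback.
        result["html"] = html
        loop.quit()

    def on_load_finished(ok):
        if ok:
            page.toHtml(on_html)
        else:
            loop.quit()

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    loop.exec_()  # block here until one of the callbacks calls quit()
    return result["html"]


if __name__ == "__main__":
    # Pass the URL on the command line, e.g. python render.py <url>
    print(render(sys.argv[1]))
```

The returned string can then be handed to BeautifulSoup in the same way as the `urlopen` response in the question. Note that `loadFinished` fires when the initial load completes, so content that a script fetches later may still be missing at that point.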

Rajat Soni