Python: web-scrape a website that uses JavaScript

Question

This code used to work for me.

from bs4 import BeautifulSoup
from urllib.request import urlopen

search = 'some_website'
BeautifulSoup(urlopen(search), "lxml")

But now I get the following error.

HTTPError: HTTP Error 403: Forbidden

Setting a User-Agent header gets a response, but a simple request like this isn't enough, because the information I need is generated by JavaScript:

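from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

site = 'some_website'
hdr = {'User-Agent': 'Mozilla/5.0'}  # present a browser-like User-Agent
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "lxml")
print(soup)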

I get the following in the soup.

<noscript>Please enable JavaScript to view the page content.</noscript>

How do I get the JavaScript-generated content of a web page when I am getting the "HTTP Error 403: Forbidden" error? Thanks in advance for the help.

I'm using Python 3. Please let me know if you need more information.
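The <noscript> output above is the tell: urllib only downloads the HTML the server sends and never runs the page's scripts, so the real content is never built. A minimal sketch for checking this, assuming that a page whose only visible text sits inside <noscript> (ignoring <script> and <style>) is JavaScript-only. fetch_html and is_js_placeholder are hypothetical helper names, and decoding the response as UTF-8 is an assumption:

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# urllib's default User-Agent is "Python-urllib/3.x", which some servers
# reject with 403; a browser-like value often gets a normal response.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url):
    """Download the raw HTML of url; no JavaScript is executed."""
    req = Request(url, headers=HEADERS)
    with urlopen(req) as page:
        # Assumes a UTF-8 page; undecodable bytes are replaced.
        return page.read().decode("utf-8", errors="replace")


def is_js_placeholder(html):
    """Heuristic: True if the page has a <noscript> notice and no other
    visible text, i.e. its real content is built by JavaScript."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.find("noscript") is None:
        return False
    # Remove the notice plus script/style blocks, then look for leftover text.
    for tag in soup(["noscript", "script", "style"]):
        tag.decompose()
    return soup.get_text(strip=True) == ""
```

For example, is_js_placeholder(fetch_html(search)) should return True if the response really contains nothing but that notice, which means an HTML parser alone cannot reach the data and a browser engine has to run the scripts first.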

Please see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/ — Dan-Dev, Sep 07 '17 at 18:53
`from PyQt5.QtWebKitWidgets import QWebPage` This code does not work in the latest version of PyQt5. Does anyone know of a similar alternative to QWebPage? — user3264602, Sep 08 '17 at 22:20

score 0 · Answer 1 · answered Sep 09 '17 at 21:22


QtWebKit was deprecated upstream in Qt 5.5 and removed in Qt 5.6.

You may want to switch to PyQt5.QtWebEngineWidgets, whose QWebEnginePage class takes the place of QWebPage.


Rajat Soni
