1

I am trying to create a script to download an ebook into a pdf. When I try to use beautifulsoup in it I to print the contents of a single page, I get a message in the console stating "Oh no! It looks like JavaScript is disabled in your browser. Please re-enable to access the reader." I have already enabled Javascript in Chrome and this same piece of code works for a page like a stackO answer page. What could be blocking Javascript in this page and how can I bypass it?

My code for reference:

url = requests.get("https://platform.virdocs.com/r/s/0/doc/350551/sp/14552484/mi/47443495/?cfi=%2F4%2F2%5BP7001013978000000000000000003FF2%5D%2F2%2F2%5BP7001013978000000000000000010019%5D%2F2%2C%2F1%3A0%2C%2F1%3A0")
url.raise_for_status()
soup = bs4.BeautifulSoup(url.text, "html.parser")
elems = soup.select("p")

print(elems[0].getText())

1 Answers1

1

The problem is that the page actually contains no content. To load the content it needs to run some JS code. The requests.get method does not run JS, it just loads the basic HTML.

What you need to do is to emulate a browser, i.e. 'open' the page, run JS, and then scrape content. One way to do it is to use a browser driver as described here - https://stackoverflow.com/a/57912823/9805867

IvanD
  • 2,728
  • 14
  • 26