
I found out that BeautifulSoup can't execute JavaScript. I have this code:

from bs4 import BeautifulSoup
import requests
import warnings
import time

warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
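urls = ["https://google.com"]

# Send a browser-like User-Agent header with every request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
for url in urls:
    r = requests.get(url, headers=headers)
    print(r.text)   # raw HTML as sent by the server; no JavaScript has run
    time.sleep(2)   # pause between requests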

When I run this, the response tells me to enable JavaScript in my browser. Is there any way in Python to scrape links from pages that rely on JavaScript (with or without BeautifulSoup, though with is preferred)?

PS: My javascript code: <script src="https://linkvertise.net/cdn/linkvertise.js"></script><script>linkvertise(33538, {whitelist: [], blacklist: [""]});</script>

marc_s
AryTuber
  • Possible duplicate of [Running javascript in Selenium using Python](https://stackoverflow.com/questions/7794087/running-javascript-in-selenium-using-python) – Ofer Sadan Oct 05 '19 at 19:06
  • please ensure your url ties in with your description. How does your _javascript code_ tie in with the code shown above it? – QHarr Oct 06 '19 at 05:18

1 Answer


One option is the lxml library. It won't work in every situation, but it will work in some. lxml does not execute JavaScript itself, so it only helps when the JavaScript runs server side and the HTML you download already contains the rendered content. In that case no rendering delay is needed on your end; a pause between requests is only there to be polite to the server. Parsing may still be a pain in your rear, because of how most JavaScript frameworks structure their markup, but I've scraped sites with JavaScript backends just fine with lxml.
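For the server-side case, here is a minimal sketch of pulling links out of already-downloaded HTML with lxml. The function name `extract_links` and the sample markup are made up for illustration; the HTML string is assumed to be something like `r.text` from the question's loop.

```python
from lxml import html as lxml_html


def extract_links(html_text, base_url):
    """Return every <a href> found in html_text as an absolute URL."""
    doc = lxml_html.fromstring(html_text)
    # Resolve relative hrefs such as "/about" against the page's URL
    doc.make_links_absolute(base_url)
    return [str(href) for href in doc.xpath("//a/@href")]


if __name__ == "__main__":
    # Hypothetical sample page standing in for a real response body
    sample = '<html><body><a href="/about">About</a></body></html>'
    print(extract_links(sample, "https://example.com"))
```

If the list comes back empty for a page that clearly shows links in the browser, that is a strong hint the links are being built client side.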

The caveat is that if it's client-side JavaScript (the page is fully rendered in the browser), to the best of my knowledge your best option is Selenium. Selenium is a browser automation tool rather than a scraping library, so it is not optimized for scraping, and it requires a real browser with a matching driver, typically run in headless mode. That browser is usually Chrome, which is a pain to set up in Linux development environments. There really needs to be work to improve this, but sadly the data mining community hasn't quite managed it yet.

You need to determine whether the content you need is rendered server side or client side. If the former, try lxml first and fall back to Selenium. If the latter, I'm afraid you're stuck with Selenium.
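One way to make that call, sketched below under the assumption that you know a piece of text you can see on the page in your browser: download the raw HTML and check whether that text is already in the markup, ignoring anything inside `<script>` and `<style>` tags. The function name `is_in_static_html` and the sample pages are hypothetical.

```python
from lxml import html as lxml_html


def is_in_static_html(raw_html, expected_text):
    """True if expected_text appears in the page's visible markup as served."""
    doc = lxml_html.document_fromstring(raw_html)
    # Collect text nodes in <body>, skipping script and style contents
    texts = doc.xpath(
        "//body//text()[not(ancestor::script) and not(ancestor::style)]"
    )
    visible = " ".join(" ".join(texts).split())  # collapse whitespace
    return expected_text in visible


if __name__ == "__main__":
    server_side = "<html><body><p>Price: 10</p></body></html>"
    client_side = (
        '<html><body><div id="app"></div>'
        '<script>render("Price: 10")</script></body></html>'
    )
    print(is_in_static_html(server_side, "Price: 10"))
    print(is_in_static_html(client_side, "Price: 10"))
```

If the check passes on the raw response, requests plus lxml (or BeautifulSoup) should be enough; if it fails, the content is most likely assembled in the browser.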