I've been wanting to parse information from a particular website, and I've been having problems with its dynamic content. When I request this site in Python and parse it with BeautifulSoup, etc., nothing inside <div id="root"> is there.
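To confirm that the content is really missing from the server's response (rather than being lost during parsing), a quick check like the sketch below can be run on the HTML that comes back from the request. root_div_is_empty is a hypothetical helper name; the sketch assumes BeautifulSoup 4 and that the page is a JavaScript-rendered app whose initial HTML contains only an empty root container.

```python
from bs4 import BeautifulSoup


def root_div_is_empty(html: str) -> bool:
    """Return True only if <div id="root"> exists and has no child tags or text.

    An empty root div in the raw server response usually means the content is
    injected later by JavaScript, which requests/BeautifulSoup never execute.
    """
    soup = BeautifulSoup(html, "html.parser")
    root = soup.find("div", id="root")
    if root is None:
        return False  # no root div at all, so this check does not apply
    # find(True) matches the first descendant tag of any name
    return root.find(True) is None and not root.get_text(strip=True)
```

For example, calling it on requests.get(url).text for the gnomAD page would be expected to return True if, as described here, the root div arrives empty; that would point to client-side rendering rather than a parsing problem.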
Following the answer to a similar question ("Why isn't the html code inside div is being parsed?"), I tried a headless browser. I ended up trying both Selenium and Splinter, with the '--headless' option enabled for Chrome.
I don't know whether the headless browser I chose is simply wrong for this particular website's setup or whether it's my code, so please give me suggestions if you have any.
Notes: I'm running Ubuntu 20.04.1 LTS and Python 3.8.3. If you want to suggest a different headless browser program, go ahead, but it needs to be compatible with Linux, macOS, etc., and with Python.
Below is my most recent code. I've tried various ways of finding the button I want to click; here I used the element's XPath, which I copied from the browser's Inspect tool:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from splinter import Browser  # provides the Browser(...) used below

# Chrome options for running without a visible window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--ignore-certificate-errors')

# Splinter wraps Selenium; the with-block closes the browser when it exits
with Browser('chrome', options=options) as browser:
    browser.visit("http://gnomad.broadinstitute.org/region/16-2087388-2087428?dataset=gnomad_r2_1")
    print(browser.title)
    # XPath copied from the browser's Inspect tool
    browser.find_by_xpath('//*[@id="root"]/div/div/div[2]/div/div[3]/section/div[2]/button').first.click()
The error message this gave me was:
  File "etc/anaconda3/lib/python3.8/site-packages/splinter/element_list.py", line 42, in __getitem__
    return self._container[index]
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "practice3.py", line 20, in <module>
    browser.find_by_xpath('//*[@id="root"]/div/div/div[2]/div/div[3]/section/div[2]/button').first.click()
  File "etc/anaconda3/lib/python3.8/site-packages/splinter/element_list.py", line 57, in first
    return self[0]
  File "etc/anaconda3/lib/python3.8/site-packages/splinter/element_list.py", line 44, in __getitem__
    raise ElementDoesNotExist(
splinter.exceptions.ElementDoesNotExist: no elements could be found with xpath "//*[@id="root"]/div/div/div[2]/div/div[3]/section/div[2]/button"
Thanks!