This python function aims to scrape a specific identifier (called as PMID) from a JavaScript web-page. When a URL is passed to the function, it gets the page using selenium. The code then tries to find the class "pubmedLink" within tag of html. If found, it returns the extracted PMID to another function.
This works fine, but is literally really slow. Is there a way to accelerate the process may be by using another parser or with a completely different method?
from selenium import webdriver
def _getPMIDfromURL_(url):
driver = webdriver.Chrome('/usr/protoLivingSystematicReviews/drivers/chromedriver')
driver.get(url)
try:
if driver.find_element_by_css_selector('a.pubmedLink').is_displayed():
json_text = driver.find_element_by_css_selector('a.pubmedLink').text
return json_text
except:
return "no_pmid"
driver.quit()
Examples of the URL for the JS web-page,