I have a simple Selenium application in Python with which I'm trying to scrape the categories, which are links. The problem I'm having is getting the links in the left pane to come through as a list using XPath. Additionally, I'd like to capture the line "class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context", but I'm not sure where to start with that, since it doesn't appear in the HTML or in Chrome DevTools.
I'm pulling data from the following website:
https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm
My current code is below; the uncommented lines are what I'm actually running:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
#service = Service('C:\Program Files\Chrome Driver\chromedriver.exe')
URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm"
driver = webdriver.Chrome(r'C:\Program Files\Chrome Driver\chromedriver.exe')  # raw string so the backslashes are not read as escapes
driver.get(URL)
category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)
#WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'tr.dbsearch')))
#pd.read_html(driver.page_source)[1].iloc[:,:-1].to_csv('table.csv',index=False)
#time.sleep(8)
#driver.quit()
Additionally, I've been trying to get the content on the page which is shown as:
class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context
How can I access that text? Everything I've tried fails with a "no such element" or "no such class name" error. The main problem is that I don't know how to find the names of these elements or classes in the JavaScript when they don't appear in the HTML or in the Elements panel of Chrome DevTools.
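One thing that might be worth trying for that line, sketched here under assumptions: if the text is inserted by JavaScript after the page loads, it would be missing from the raw page source but present in the live DOM a moment later, so an explicit wait plus an XPath that matches on the visible text (rather than on a class name) may find it. The helper names `xpath_for_text` and `wait_for_text` are made up for illustration, and the snippet "class type:" is just taken from the line quoted above. This has not been checked against the real page, and it would not help if the text sits inside an iframe or shadow DOM.

```python
def xpath_for_text(snippet):
    """Build an XPath matching any element that has a text node containing snippet."""
    # Kept deliberately simple: quoting inside XPath string literals is not handled.
    if "'" in snippet:
        raise ValueError("snippet must not contain a single quote")
    return f"//*[text()[contains(., '{snippet}')]]"


def wait_for_text(driver, snippet, timeout=10):
    """Wait until an element containing snippet is in the DOM, then return its text."""
    # Imported here so xpath_for_text stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath_for_text(snippet))
    # Raises selenium.common.exceptions.TimeoutException if nothing matches in time.
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
    return element.text
```

With the `driver` from the code above, after `driver.get(URL)`, this would be called as `wait_for_text(driver, "class type:")`. If it times out, the text is probably not a plain text node in the main document.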
This is the error I get from the following lines:
category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)
TypeError: find_elements_by_class_name() takes 2 positional arguments but 3 were given
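For reference, the TypeError seems to come from mixing two call styles: `find_elements_by_class_name()` accepts only a class name (that plus `self` makes the 2 positional arguments), whereas a `By.XPATH` locator belongs with `find_elements()`. Below is a minimal sketch of the second style with an explicit wait, on the assumption that the left pane is filled in by JavaScript after load. The XPath is the one from the question and has not been verified against the page; `links_to_rows` and `fetch_category_links` are illustrative names, not part of Selenium.

```python
# XPath copied from the question; whether it matches the live page is an assumption.
LINKS_XPATH = "//div[@class='service drug_class']//a"


def links_to_rows(elements):
    """Turn link elements into (text, href) tuples instead of raw WebElement objects."""
    return [(el.text.strip(), el.get_attribute("href")) for el in elements]


def fetch_category_links(driver, timeout=10):
    """Wait for the category links to appear, then return them as (text, href) tuples."""
    # Imported here so links_to_rows stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Equivalent to driver.find_elements(By.XPATH, LINKS_XPATH), but retried
    # until at least one match is present or the timeout expires.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, LINKS_XPATH))
    )
    return links_to_rows(elements)
```

After `driver.get(URL)`, calling `fetch_category_links(driver)` would return a plain list that can be printed or passed to `pd.DataFrame`. Printing the elements directly, as in `print(category)`, shows only WebElement object representations, which is why the text and `href` are extracted here.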