How to access the specific elements using Python Selenium?

Question

I have a simple Python Selenium application with which I'm trying to scrape the categories, which are links. The problem is getting the links in the left pane to come through as a list using XPath. Additionally, I'd like to capture the line class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context, but I'm not sure where to start, since it doesn't appear in the HTML or in Chrome DevTools.

I'm pulling data from the following website:

https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm

My current code (the uncommented lines) is:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

#service = Service(r'C:\Program Files\Chrome Driver\chromedriver.exe')
URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm"
driver = webdriver.Chrome(r'C:\Program Files\Chrome Driver\chromedriver.exe')
driver.get(URL)


category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)


#WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'tr.dbsearch')))
#pd.read_html(driver.page_source)[1].iloc[:,:-1].to_csv('table.csv',index=False)
#time.sleep(8)
#driver.quit()

Additionally, I've been trying to get the content on the page which is shown as:

class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context

How can I access that text? Everything I've tried fails with a "no such element" or "no such class name" error. The main problem is that I'm not sure how to find the names of these elements or classes in the JavaScript if they don't exist in the HTML or in the Elements tab of Chrome DevTools.

The error message that I'm getting on using the following is:

category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)

TypeError: find_elements_by_class_name() takes 2 positional arguments but 3 were given

What details are you trying to get? Also, I can't see any attempt in your code to get these details or the strong texts on the left with Selenium... – Prophet, Jan 03 '22 at 20:51
The attempts changed many times throughout the edits. The answer below put me in the right area, but I'm still having trouble finding the element or class for this line on the page: class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context – user1470034, Jan 04 '22 at 14:08
If you are still having issues with the answer, then you should NOT have marked it as the answer. If there is an **accepted** answer, most users will not try to improve it, and this question/answer then only serves as a reference for others who have the same problem. – Luuk, Jan 04 '22 at 14:48
The answer is correct for the first part; this should have been posed as two different questions. My apologies. – user1470034, Jan 04 '22 at 14:57
You do not have to apologize; I am just giving a hint that you might not get a response on the additional part of the question. – Luuk, Jan 04 '22 at 14:58

score 1 · Answer 1 · answered Jan 03 '22 at 20:11

It seems you are looking for the strong tag, while all the links on the left are a (anchor) elements, so you are not going to find them with strong.

Basically you are looking for this xpath to get any link:

//div[@class='service drug_class']//a[text()='Any link text here']

Replace the Any link text here with the exact link text.
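
As a rough illustration of the two XPath forms discussed here (every link versus one link with an exact text), the sketch below builds the expression as a string. The helper names link_xpath and xpath_literal are hypothetical and not part of Selenium, and the quote handling is only an assumption about how one might cope with link text that contains quote characters.

```python
BASE_XPATH = "//div[@class='service drug_class']//a"


def xpath_literal(text):
    """Quote text for use inside an XPath 1.0 expression."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # XPath 1.0 has no escape character, so join the pieces with concat().
    pieces = ", \"'\", ".join(f"'{part}'" for part in text.split("'"))
    return f"concat({pieces})"


def link_xpath(link_text=None):
    """Return the XPath for every link, or for one link with exact text."""
    if link_text is None:
        return BASE_XPATH
    return f"{BASE_XPATH}[text()={xpath_literal(link_text)}]"


print(link_xpath())
print(link_xpath("VITAMINS (23)"))
```

The resulting string can be passed to driver.find_elements(By.XPATH, ...). Keep in mind that text()='...' is an exact comparison, so extra whitespace in the page markup would prevent a match.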

answered Jan 03 '22 at 20:11 by Anand

If I want to get the entire list of names on the left rather than one particular link, how would I go about that? And how would I access this element: class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context? Also, how did you find the class name service drug_class? – user1470034 Jan 03 '22 at 20:50
You can get all the links with this: //div[@class='service drug_class']//a - Also how I found the class name service drug_class is by looking up the div containing all the links – Anand Jan 03 '22 at 20:54
Ok, thank you! You put me on the right track. What if the class name or element isn't in a div tag but is hidden in the JavaScript, such as this line on the page: class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context? – user1470034 Jan 03 '22 at 21:17

score 1 · Accepted Answer · answered Jan 03 '22 at 21:27

To print the strong text names of the links on the left side of the page, you have to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategy:

Using CSS_SELECTOR:

driver.get("https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm")
links = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.drug_class img + a")))
print([link.text for link in links])

Console Output:

['Anatomical Therapeutic Chemical (ATC1-4)', 'ALIMENTARY TRACT AND METABOLISM (397)', 'ANABOLIC AGENTS FOR SYSTEMIC USE (9)', 'ANTIDIARRHEALS, INTESTINAL ANTIINFLAMMATORY/ANTIINFECTIVE AGENTS (44)', 'ANTIEMETICS AND ANTINAUSEANTS (13)', 'ANTIOBESITY PREPARATIONS, EXCL. DIET PRODUCTS (12)', 'BILE AND LIVER THERAPY (13)', 'DIGESTIVES, INCL. ENZYMES (7)', 'DRUGS FOR ACID RELATED DISORDERS (35)', 'DRUGS FOR CONSTIPATION (39)', 'DRUGS FOR FUNCTIONAL GASTROINTESTINAL DISORDERS (47)', 'DRUGS USED IN DIABETES (69)', 'MINERAL SUPPLEMENTS (30)', 'OTHER ALIMENTARY TRACT AND METABOLISM PRODUCTS (41)', 'STOMATOLOGICAL PREPARATIONS (31)', 'TONICS (0)', 'VITAMINS (23)', 'BLOOD AND BLOOD FORMING ORGANS (158)', 'CARDIOVASCULAR SYSTEM (326)', 'DERMATOLOGICALS (242)', 'GENITO URINARY SYSTEM AND SEX HORMONES (160)', 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (66)', 'ANTIINFECTIVES FOR SYSTEMIC USE (334)', 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS (324)', 'MUSCULO-SKELETAL SYSTEM (130)', 'NERVOUS SYSTEM (433)', 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS (77)', 'RESPIRATORY SYSTEM (213)', 'SENSORY ORGANS (174)', 'VARIOUS (137)', 'Established Pharmacologic Classes (EPC) [from DailyMed]', 'MeSH Pharmacologic Actions (MESHPA)', 'Diseases, Life Phases, Behavior Mechanisms and Physiologic States', 'Substances and Cells (CHEM) [from DailyMed]', 'Mechanism of Action (MoA) [from DailyMed]', 'Physiologic Effect (PE) [from DailyMed]', 'Pharmacokinetics (PK)', 'VA Classes (VA)', 'Therapeutic Categories (TC)', 'Disposition (DISPOS) [from SNOMEDCT]', 'Structure (STRUCT) [from SNOMEDCT]', 'CSA Schedule (SCHEDULE)']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
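
Putting the pieces together, a minimal sketch for collecting every link in the left pane might look like the following. It assumes Selenium 4, uses the XPath from the other answer rather than the CSS selector above, and waits with presence_of_all_elements_located() instead of the visibility condition (a substitution made here, on the assumption that some links may be present but not visible). The function names are hypothetical, and the sketch has not been checked against the live page.

```python
LINKS_XPATH = "//div[@class='service drug_class']//a"


def link_texts(elements):
    """Return the stripped, non-empty text of each element."""
    return [el.text.strip() for el in elements if el.text and el.text.strip()]


def scrape_link_texts(driver, url, timeout=20):
    """Load url and return the text of every link matched by LINKS_XPATH."""
    # Imported here so link_texts() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # The locator is a (strategy, value) tuple, as with find_elements().
    links = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, LINKS_XPATH))
    )
    return link_texts(links)
```

Calling scrape_link_texts(driver, URL) with the driver and URL from the question would return a list of link texts. Note that find_elements_by_class_name() accepts only a class name, which is why passing By.XPATH as an extra argument raised the TypeError in the question; the generic driver.find_elements(By.XPATH, "...") takes both the strategy and the value.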

That's exactly what I was missing. I updated the question to provide more clarity. The last part is this line on the page: class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context. How can I access the entire text? – user1470034, Jan 03 '22 at 21:36
