I'm trying to scrape the 'activity' drop-down (a `<select>` box, not a text box) from the two pages here and here.
Here is the basic code I have so far:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
#options.add_argument("--headless")
driver = webdriver.Chrome(options=options, executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')

for i in (2500, 2700):  # the two zm_ID values used in this example
    url = 'http://www.uwm.edu.pl/biochemia/biopep/peptide_data_page1.php?zm_ID=' + str(i)
    driver.get(url)
    header = driver.find_element_by_css_selector('[name="activity"]')
    children = header.find_elements_by_xpath(".//*")  # every <option> in the drop-down
```
I have two issues:
- I only need the activity whose `<option>` is marked `selected`; I don't want ALL the activities returned.
- BUT if that activity is the first item in the list, as on the page whose activity is 'aami', its `<option>` has no `selected` attribute because it is already the default, so I can't match on `selected`.
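The rule in these two points (take the option marked `selected`, otherwise fall back to the first option, which the browser shows by default) can be sketched without Selenium by parsing the page HTML, for example `driver.page_source`, with Python's standard `html.parser`. This is a hedged sketch, not a tested scraper for these pages: it assumes the field is a `<select name="activity">`, that each option's visible text is the activity name and its `value` attribute is the short code, and it ignores disabled options. The names `ActivityOptionParser` and `activity_from_html` are hypothetical.

```python
from html.parser import HTMLParser


class ActivityOptionParser(HTMLParser):
    """Collect (text, value, has_selected_attr) for each <option> inside
    the <select> whose name matches `select_name` (hypothetical helper)."""

    def __init__(self, select_name="activity"):
        super().__init__()
        self.select_name = select_name
        self.in_select = False
        self.current = None  # option being read: [text, value, selected]
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._close_option()  # </option> is optional in HTML
            self.current = ["", attrs.get("value"), "selected" in attrs]

    def handle_data(self, data):
        if self.current is not None:
            self.current[0] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self._close_option()
        elif tag == "select" and self.in_select:
            self._close_option()
            self.in_select = False

    def _close_option(self):
        if self.current is not None:
            text, value, selected = self.current
            self.options.append((text.strip(), value, selected))
            self.current = None


def activity_from_html(html, select_name="activity"):
    """Return "<text> | <value>" for the selected activity, or None."""
    parser = ActivityOptionParser(select_name)
    parser.feed(html)
    parser.close()
    if not parser.options:
        return None
    # An explicit `selected` attribute wins ...
    for text, value, selected in parser.options:
        if selected:
            return f"{text} | {value}"
    # ... otherwise a single-choice <select> displays its first option.
    text, value, _ = parser.options[0]
    return f"{text} | {value}"
```

Inside the loop, `print(activity_from_html(driver.page_source))` would then print one line per page, under the assumptions above.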
So I'm looking for a line or two of code I could add to my script to extract:
neuropeptide | ne
alpha-amylase inhibitor | aami
from these two pages. Any help would be appreciated.
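Staying inside Selenium, a possible approach is to rely on `WebElement.is_selected()`, which reports the option's selectedness as the browser sees it. For a single-choice `<select>` with no `selected` attribute, the browser selects the first (non-disabled) option, so the same check covers both cases in the list above. The sketch below assumes `children` holds the `<option>` elements of the activity drop-down and that the option text and `value` give the two halves of the desired output; `activity_from_options` is a hypothetical name. It only calls `.text`, `.get_attribute()` and `.is_selected()`, so it does not depend on the Selenium version.

```python
def activity_from_options(options):
    """Return "<text> | <value>" for the option the browser reports as selected.

    `options` is assumed to be the <option> WebElements of the activity
    <select>, e.g. the `children` list from the question.
    """
    for option in options:
        # is_selected() reflects the browser's state: with no `selected`
        # attribute, a single-choice <select> selects its first option.
        if option.is_selected():
            return f"{option.text.strip()} | {option.get_attribute('value')}"
    return None  # nothing reported as selected (e.g. an empty list)
```

Calling `print(activity_from_options(children))` after `children` is built would print one line per page. Selenium also ships a helper for the same check: `Select(header).first_selected_option` (with `from selenium.webdriver.support.ui import Select`) returns the selected `<option>` element directly, provided `header` really is a `<select>`.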