I want to use Selenium to scrape this page (http://147.8.185.62/services/NutriChem-2.0/).
The steps I want to take to scrape the page:
1. type '22663' into the box that says 'search by plant-based food'
2. click 'food-disease association'
3. click submit on the bottom of the page
4. click 'plant-disease associations'
5. export the plant-disease table
I wrote this code:
import sys
import pandas as pd
from bs4 import BeautifulSoup
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import csv
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
#binary = FirefoxBinary('/Users/kela/Desktop/scripts/scraping/geckodriver')
url = 'http://147.8.185.62/services/NutriChem-2.0/'
driver = webdriver.Firefox(executable_path='/Users/kela/Desktop/scripts/scraping/geckodriver')
driver.get(url)
element = driver.find_element_by_id("input_food_name")
element.send_keys("22663")
#click food-disease association
element = driver.find_element_by_xpath("//select[@name='food_search_section']")
#all_options = element.find_elements_by_tag_name("option")
element = Select(driver.find_element_by_css_selector('[name=food_search_section]'))
element.select_by_value('food_disease')
submit_xpath = '/html/body/form/p[2]/input[1]'
destination_page_link = driver.find_element_by_xpath(submit_xpath)
destination_page_link.click()
#this doesn't work for step 4
#xpath2 = '/html/body/table/tbody/tr/td[3]/div'
#destination_page_link = driver.find_element_by_xpath(xpath2)
#destination_page_link.click()
#this doesn't work for step 4
xpath2 = '/html/body/table/tbody/tr/td[3]/div/span'
destination_page_link = driver.find_element_by_xpath(xpath2)
destination_page_link.click()
I am struggling with steps 4 and 5.
For step 4, how do I select the button that is a div with onclick ClickButton('nutrichem12587_disease.tsv','plant_disease')? The code above shows two of the several things I have tried, based on other Stack Overflow questions.
For step 5, I can already foresee a similar issue, because I want to click the expand (right-arrow) button for each row (e.g. the arrow beside pomegranate/diabetes) and print out the data beneath it, i.e.
PredictionPMID:22919408 Punica granatum Diabetes
PredictionPMID:22529479 P. granatum Diabetes
PredictionPMID:22529479 Punica granatum Diabetes
PredictionPMID:20020514 Punica granatum Diabetes
and the same for each of the subsequent rows. Could someone show me how to do this?
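To make the desired output for step 5 concrete, here is a minimal sketch of how the expanded text could be turned into a tab-separated table once it has been read from the page (for example from each expanded row's `.text`). It assumes every line looks like the samples above, with a label such as "Prediction" fused to "PMID:<digits>" and followed by the plant and disease names. The function names and column headers are hypothetical, and the plant and disease are kept in one column because the sample lines alone do not show where one ends and the other begins.

```python
import csv
import io
import re

# Assumed line shape, taken from the sample rows:
#   <label>PMID:<digits> <plant and disease text>
EVIDENCE_RE = re.compile(r"^(?P<label>\D+?)PMID:(?P<pmid>\d+)\s+(?P<rest>.+)$")


def parse_evidence_line(line):
    """Split one expanded-row line into label, PMID and remaining text.

    Returns None when the line does not match the assumed shape.
    """
    match = EVIDENCE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "label": match.group("label"),
        "pmid": match.group("pmid"),
        "association": match.group("rest"),
    }


def evidence_to_tsv(lines):
    """Return the parsed lines as one tab-separated string with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["label", "pmid", "association"])
    for line in lines:
        record = parse_evidence_line(line)
        if record is not None:  # silently skip lines that do not match
            writer.writerow([record["label"], record["pmid"], record["association"]])
    return buffer.getvalue()


sample = [
    "PredictionPMID:22919408 Punica granatum Diabetes",
    "PredictionPMID:22529479 P. granatum Diabetes",
]
print(evidence_to_tsv(sample))
```

The string returned by `evidence_to_tsv` can be written to a file as-is, which would cover the export part of step 5.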
Edit 1: for step 4, I've tried things like this, but they return errors saying the elements don't exist, even though I got the locations by copying the XPaths:
#click plant-disease associations
#submit_xpath = '/html/body/table/tbody/tr/td[3]/div/span'
submit_xpath = '/html/body/table/tbody/tr/td[3]'
destination_page_link = driver.find_element_by_xpath(submit_xpath)
destination_page_link.click()
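A minimal sketch of one more thing that could be tried for step 4. It rests on two unconfirmed assumptions: that the button is a div whose onclick attribute contains the text plant_disease, and that the "element doesn't exist" errors happen because the lookup runs before the results page has finished rendering. Instead of a copied absolute XPath, it matches on the onclick attribute and uses Selenium's WebDriverWait to poll until the element is clickable. The helper names `onclick_xpath` and `click_when_ready` are hypothetical.

```python
def onclick_xpath(keyword="plant_disease"):
    """Build a relative XPath for a div whose onclick attribute mentions keyword."""
    return "//div[contains(@onclick, '{}')]".format(keyword)


def click_when_ready(driver, xpath, timeout=10):
    """Wait up to `timeout` seconds for the element to be clickable, then click it."""
    # Imported lazily so the XPath helper can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Raises selenium.common.exceptions.TimeoutException if nothing
    # matching the XPath becomes clickable within the timeout.
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    element.click()
    return element
```

With the `driver` from the code above, this would be called as `click_when_ready(driver, onclick_xpath())` right after the submit click. If it still times out, the element may sit inside a frame or be built differently than assumed here.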