
I want to use selenium on this page.

The steps I want to take to scrape the page:

1. type '22663' into the box that says 'search by plant-based food'
2. click 'food-disease association'
3. click submit on the bottom of the page
4. click 'plant-disease associations'
5. export the plant-disease table

I wrote this code:

import sys
import pandas as pd
from bs4 import BeautifulSoup
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import csv
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

#binary = FirefoxBinary('/Users/kela/Desktop/scripts/scraping/geckodriver')
url = 'http://147.8.185.62/services/NutriChem-2.0/'
driver = webdriver.Firefox(executable_path='/Users/kela/Desktop/scripts/scraping/geckodriver')
driver.get(url)

element = driver.find_element_by_id("input_food_name")
element.send_keys("22663")

#click food-disease association
element = driver.find_element_by_xpath("//select[@name='food_search_section']")
#all_options = element.find_elements_by_tag_name("option")

element = Select(driver.find_element_by_css_selector('[name=food_search_section]'))
element.select_by_value('food_disease')

submit_xpath = '/html/body/form/p[2]/input[1]'
destination_page_link = driver.find_element_by_xpath(submit_xpath)
destination_page_link.click()


#this doesn't work for step 4
#xpath2 = '/html/body/table/tbody/tr/td[3]/div'
#destination_page_link = driver.find_element_by_xpath(xpath2)
#destination_page_link.click()

#this doesn't work for step 4
xpath2 = '/html/body/table/tbody/tr/td[3]/div/span'
destination_page_link = driver.find_element_by_xpath(xpath2)
destination_page_link.click()

I am struggling with steps 4 and 5.

For step 4, how do I select the div whose onclick is `ClickButton('nutrichem12587_disease.tsv','plant_disease')`? You can see a couple of the things I've tried in the code above, based on other Stack Overflow questions; I tried a good few approaches, and those are two examples.

Then for step 5, I can already foresee having a similar issue, because I want to click the 'expand/right arrow' for each row (e.g. the arrow beside pomegranate/diabetes), and print out the data beneath it, i.e.

PredictionPMID:22919408 Punica granatum     Diabetes
PredictionPMID:22529479 P. granatum     Diabetes
PredictionPMID:22529479 Punica granatum     Diabetes
PredictionPMID:20020514 Punica granatum     Diabetes

for each of the subsequent rows. Could someone show me how to do this?

Edit 1: for step 4, I've tried things like this, but they return errors saying the elements don't exist, even though I got the locations by copying the XPaths:

#click plant-disease associations
#submit_xpath = '/html/body/table/tbody/tr/td[3]/div/span'
submit_xpath = '/html/body/table/tbody/tr/td[3]'
destination_page_link = driver.find_element_by_xpath(submit_xpath)
destination_page_link.click()

2 Answers


For step 4

If you're confident the web page will be exactly the same every time, you could identify an element that contains your "plant-disease associations" button and then manually click (x, y) coordinates within that element, as described in the second answer here.
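The coordinate approach can be sketched with `ActionChains`, which supports clicking at an offset inside a located element. This is only a sketch: the XPath and offsets are placeholders you would adjust for the real page, and in Selenium 3 the offset is measured from the element's top-left corner.

```python
def click_at_offset(driver, xpath, x_offset, y_offset):
    """Click at a pixel offset inside the element found by `xpath`.

    Placeholder XPath/offsets: adjust both for the actual page layout.
    Selenium 3 measures the offset from the element's top-left corner.
    """
    from selenium.webdriver.common.action_chains import ActionChains

    element = driver.find_element_by_xpath(xpath)
    ActionChains(driver) \
        .move_to_element_with_offset(element, x_offset, y_offset) \
        .click() \
        .perform()


def center_offset(size):
    """Return the (x, y) offset of an element's center, given its `.size` dict."""
    return size['width'] // 2, size['height'] // 2


# With a live driver, clicking the center of the cell (hypothetical XPath):
# cell = driver.find_element_by_xpath('/html/body/table/tbody/tr/td[3]')
# click_at_offset(driver, '/html/body/table/tbody/tr/td[3]', *center_offset(cell.size))
```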

For step 5

Try grabbing the entire table first, as opposed to the individual right arrows, and go over it manually by identifying all of its children.
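One way to do that is to take the table's HTML in one go (e.g. from `driver.page_source`) and walk the rows with BeautifulSoup, which the question already imports. A sketch under the assumption that the data sits in ordinary `<tr>`/`<td>` cells; the tag structure is a guess and needs adjusting to the real markup:

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return every table row as a list of stripped cell strings."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # skip rows with no <td> cells (e.g. header rows)
            rows.append(cells)
    return rows


# With a live driver you would pass driver.page_source:
# for row in table_rows(driver.page_source):
#     print('\t'.join(row))
```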

  • Thanks for the reply, I appreciate it; unfortunately I can't be 100% confident that it's the exact same page each time (in theory it should be, but it seems a not-very-robust method, very open to going wrong?) – Slowat_Kela Jul 21 '19 at 13:45

For step 4, it's possible that your code is not working because it's not waiting for the page to load. If that's the case, add these import statements:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

I also have this function, which I find really handy for browser automation, that you can add to your script:

def wait_for_element(driver, selector, method):
    """Return an element after waiting up to 10 s for it to be present."""
    locator = (getattr(By, method), selector)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(locator)
    )
    return driver.find_element(*locator)

Use it to find the button for step 4:

xpath2 = '/html/body/table/tbody/tr/td[3]/div'
destination_page_link = wait_for_element(driver, xpath2, 'XPATH')

Hope this helps!

  • Thanks so much, unfortunately I get errors with it 'selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of [object String] "1d0c2161-2a4d-0a40-b8bd-5c8fbe0e3061" is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed' but I guess I should try fix my original error first, but I appreciate this. – Slowat_Kela Jul 21 '19 at 21:16
  • Try adding a fixed delay (`time.sleep(2)`) instead of using my function. – lol cubes Jul 21 '19 at 23:23