I want to scrape the data from a graph that can be found on this webpage. For this purpose, I am using Selenium in Python (PyCharm). So far, this is my code:

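from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to geckodriver; it is not passed to Firefox() below, so geckodriver must be on PATH
mozilla_path = r"C:\Users\ivrav\Python38\geckodriver.exe"
driver = webdriver.Firefox()
driver.get("https://scholar.google.com/citations?user=8Cuk5vYAAAAJ&hl=en")
driver.maximize_window()
# Click the citations chart in the sidebar to open the full histogram
driver.find_element(By.XPATH, """//*[@id="gsc_rsb_cit"]/div/div[3]/div""").click()
# find_element (singular) returns one element, so .text is valid on it
graph = driver.find_element(By.ID, "gsc_md_hist_b")
print(graph.text)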

The code works fine until it has to read the information (years and citations per year) from the graph; at that point, the result is that there is no text to scrape. Could you give me some ideas on how I can scrape the information I need?

Many thanks in advance, Iván

Iván
  • You could also look directly for the `<span>`s of class `gsc_g_t` for the years, while the citation counts are in `<span class="gsc_g_al">`. – Asmus Jul 20 '20 at 07:44

2 Answers

You can try using XPath with the class attribute and fetching the text of all matching spans as a list. Please check the untested code below:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # assumes geckodriver is on PATH
driver.get("https://scholar.google.com/citations?user=8Cuk5vYAAAAJ&hl=en")
driver.maximize_window()
# Click the citations chart in the sidebar to open the full histogram
driver.find_element(By.XPATH, """//*[@id="gsc_rsb_cit"]/div/div[3]/div""").click()

# Year labels of the chart
Graph = driver.find_elements(By.XPATH, "//span[@class='gsc_g_t']")
for spanText in Graph:
    print(spanText.text)

# Citation-count labels attached to the bars
BarValue = driver.find_elements(By.XPATH, "//span[@class='gsc_g_al']")
for barValueText in BarValue:
    print(barValueText.text)
Ashish Karn
  • Many thanks, Ashish Karn! Do you know how I can scrape the information on the bars (number of citations)? I am struggling to scrape this info. Many thanks in advance, Iván – Iván Jul 20 '20 at 11:58
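
On the follow-up about the bar values: Selenium's `.text` only returns text that is rendered as visible, so labels that stay hidden until the mouse hovers over a bar can come back as empty strings. A minimal sketch of one possible workaround is to fall back to the DOM `textContent` property. The helper names `element_text` and `scrape_bar_values` are hypothetical, the `gsc_md_hist_c` and `gsc_g_al` locators are taken from the answers in this thread, and whether the labels really are hidden on the live page is an assumption that has not been verified here.

```python
def element_text(element):
    """Return an element's text, even if the element is not displayed."""
    # WebElement.text comes back empty for elements that are not visible.
    visible = (element.text or "").strip()
    if visible:
        return visible
    # Fall back to the raw DOM text, which ignores visibility.
    return (element.get_attribute("textContent") or "").strip()


def scrape_bar_values(driver):
    """Collect the citation-count labels from the opened histogram.

    Assumes `driver` already shows the profile page with the chart opened.
    """
    # Imported lazily so element_text can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    labels = driver.find_elements(
        By.XPATH, "//div[@id='gsc_md_hist_c']//span[@class='gsc_g_al']"
    )
    return [element_text(label) for label in labels]
```

Calling `scrape_bar_values(driver)` after the click in the answer above should give one string per bar. Years without citations may have no bar at all, so pairing these values with the year labels by position is not guaranteed to line up.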

To extract the years, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

  • Using XPATH:

    driver.get("https://scholar.google.com/citations?user=8Cuk5vYAAAAJ&hl=en")
    # Wait for the sidebar chart to become clickable, then open the full histogram
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@id='gsc_rsb_cit']//div[@class='gsc_md_hist_w']/div[@class='gsc_md_hist_b']"))).click()
    # Wait until the year labels are visible, then collect their text
    year_labels = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@id='gsc_md_hist_c']//div[@class='gsc_md_hist_w']/div[@class='gsc_md_hist_b']//span[@class='gsc_g_t']")))
    print([my_elem.text for my_elem in year_labels])
    
  • Console Output:

    ['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
undetected Selenium
  • Many thanks @DebanjanB! Actually, I scraped the years, but I am having problems scraping the information on the bars (number of citations). Do you have any advice on how to achieve this? Again, many thanks, Iván – Iván Jul 20 '20 at 11:56
  • @Iván This answer was constructed as per your code trials. Yes, I do have a solution for the _number of citations_ as well, but I'm afraid that for that you have to raise a new ticket along with your code trials. Hint: You have to _mouse hover_. – undetected Selenium Jul 20 '20 at 12:00
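
The mouse-hover hint could be sketched with Selenium's `ActionChains`, roughly as below. This is only an illustration under stated assumptions, not the commenter's actual solution: the function names `hover_counts` and `scrape_counts_by_hover` are hypothetical, the `gsc_g_al` locator comes from the first answer, and treating each label's parent element as the bar to hover over is an assumption about the page structure.

```python
def hover_counts(pairs, hover):
    """Hover each bar, then read the text of the label that belongs to it.

    `pairs` is a list of (bar_element, label_element) tuples and `hover`
    is a callable that moves the mouse over a given element.
    """
    counts = []
    for bar, label in pairs:
        hover(bar)  # the label is expected to become visible after this
        counts.append(label.text.strip())
    return counts


def scrape_counts_by_hover(driver):
    """Wire hover_counts to a live driver showing the opened histogram."""
    # Imported lazily so hover_counts can be exercised without Selenium.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    labels = driver.find_elements(
        By.XPATH, "//div[@id='gsc_md_hist_c']//span[@class='gsc_g_al']"
    )
    # Assumption: the direct parent of each label is the hoverable bar.
    pairs = [(label.find_element(By.XPATH, ".."), label) for label in labels]

    def hover(element):
        ActionChains(driver).move_to_element(element).perform()

    return hover_counts(pairs, hover)
```

Keeping the hover action as a parameter makes the reading loop easy to check without a browser; on the live page, a short wait after each hover might still be needed before the label text appears.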