
I am trying to extract the search results count from an IEEE Xplore search, given the search results URL, using Selenium WebDriver. I'm not getting any errors from the code below, but I am unsure how to proceed from here.

Website Element of Interest: (screenshot)

Element Inspection Results: (screenshot)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located

url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
chrome_driver_path = r'\\xxxx\chromedriver.exe'
driver = webdriver.Chrome(chrome_driver_path)
wait = WebDriverWait(driver, 10)
driver.get(url)
wait.until(presence_of_element_located((By.CLASS_NAME, "strong")))
#result = driver.??????
print(result)
driver.close()
  • Honestly, it would be better to try their API endpoint first to get this data. You can open devtools in your browser, go to the Network tab, and find a POST request to the `https://ieeexplore.ieee.org/rest/search` endpoint (see the sketch after these comments). – dukkee Jan 25 '21 at 18:14
  • @dukkee Thank you for the response. I've considered this, but this is in part a learning exercise in scraping for me, and since it's for personal use, I don't have a website or company affiliation for their API application form. – ArchMorlock Jan 26 '21 at 19:50
  • What do you mean by "their API application form"? Using the internal API is just as much scraping as scraping the layout. – dukkee Jan 26 '21 at 20:12
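
For reference, here is a minimal sketch of dukkee's suggestion using the `requests` library. The payload keys and the `totalRecords` response field are assumptions modeled on what the browser's Network tab shows for this endpoint; verify them in devtools before relying on this:

    import requests

    # NOTE: the payload shape, headers, and the "totalRecords" response field
    # are assumptions taken from inspecting the request in the browser's
    # Network tab; confirm them in devtools.
    response = requests.post(
        "https://ieeexplore.ieee.org/rest/search",
        json={"queryText": "web scraping", "newsearch": True},
        headers={
            "Referer": "https://ieeexplore.ieee.org/search/searchresult.jsp",
            "User-Agent": "Mozilla/5.0",  # the endpoint may reject non-browser clients
        },
    )
    print(response.json().get("totalRecords"))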

2 Answers


To print the number of search results, i.e. 184, you can use either of the following Locator Strategies:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)").get_attribute("innerHTML"))
    
  • Using xpath and text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]").text)
    

Ideally, you should induce WebDriverWait for visibility_of_element_located() and use either of the following Locator Strategies:

  • Using CSS_SELECTOR and text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]"))).get_attribute("innerHTML"))
    
  • Console Output:

    184
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python


undetected Selenium

As dukkee mentioned, check the API, but to answer your question, you can select it like this:

soup.select('div.Dashboard-header.col-12 > span span')[1].get_text()

Locate a parent div with a unique class and then go down to the span.

Example

from selenium import webdriver
from bs4 import BeautifulSoup
import time

url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get(url)
time.sleep(3)  # crude wait for the JavaScript-rendered results to load

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('div.Dashboard-header.col-12 > span span')[1].get_text())

driver.quit()
HedgeHog
  • Thanks for the process explanation. The example worked perfectly for me, though I am sometimes getting an "IndexError: list index out of range" error. It seems that the page doesn't always load, which causes this error. If I try the same search a bit later, it's working fine. – ArchMorlock Jan 26 '21 at 19:59
  • You can handle this even better with [Waits](https://selenium-python.readthedocs.io/waits.html#explicit-waits) instead of `sleep`, because you can define a wait for a certain condition, e.g. the element is loaded, and then proceed in the code (see the sketch after this comment). – HedgeHog Jan 26 '21 at 20:09
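
A minimal sketch of HedgeHog's suggestion, swapping `time.sleep` for an explicit wait. The selector is the one from the answer above; the chromedriver path and the 20-second timeout are placeholder choices:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
    driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')  # placeholder path
    driver.get(url)

    # Wait until the results-count header is in the DOM instead of sleeping
    # for a fixed interval; this avoids the intermittent IndexError.
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'div.Dashboard-header.col-12 > span span'))
    )

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    print(soup.select('div.Dashboard-header.col-12 > span span')[1].get_text())

    driver.quit()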