
I'm trying to parse a webpage using BeautifulSoup. I can see that the page is correctly loaded in Selenium using chromedriver, but the final result is empty: when I print the page parsed by BeautifulSoup, it does not show the whole page that Selenium shows in its automated browser.

The code that I'm using for this purpose is:

from bs4 import BeautifulSoup as soup

# Parse whatever HTML the driver currently has
page_soup = soup(driver.page_source, "html.parser")
print(page_soup)
containers = page_soup.findAll("div", class_="row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical")
print(len(containers))

I need to access each partner's information, but the result is empty. The page that I'm working on is:

https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do;jsessionid=8CDF9284D014CFF911CB8E6F81812619
Mahdi
  • Which elements on the page are you trying to find with `page_soup.findAll`? I have run your selector on the page link you provided, but it brings back no results, so the selector is probably wrong. – CEH Sep 26 '19 at 18:15
  • As you know, the page is a search page with multiple results, and I want to access each result. For example, when you search for China, it shows 5 results on the first page, and I want to access each of them. As far as I can see, each result is within a div with the class names I mentioned above. – Mahdi Sep 26 '19 at 18:25
  • I have written some modified BeautifulSoup code from what you provided, and changed the selector. This code will retrieve the `name` of every partner listed on the search results page. – CEH Sep 26 '19 at 18:30
  • I get no results found when loading that link. – QHarr Sep 26 '19 at 19:03

3 Answers


The results are loaded using JavaScript. You need to wait for the search results to load before scraping. Here is a working example:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import time

url = 'https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do'
driver = webdriver.Chrome(executable_path='C:/Selenium/chromedriver.exe')
driver.get(url)

# Type the search term into the location box
SearchString = 'CALIFORNIA'
Location = driver.find_element_by_name("location")
Location.send_keys(SearchString)

# Give the autocomplete suggestions time to appear, then pick the match and search
time.sleep(3)
driver.find_element_by_xpath("//li//span[contains(text(),'" + SearchString + "')]").click()
driver.find_element_by_id("searchBtn").click()

# Wait until the JavaScript-rendered result list is present before parsing
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'searchResultsList')))
time.sleep(3)
page_soup = soup(driver.page_source, "html.parser")
print(page_soup.prettify())
containers = page_soup.findAll("div", class_="row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical")
print(len(containers))

driver.close()

The result is 5.
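
If you then want to pull the partner name out of each container, here is a minimal sketch (the `a[title='View Profile']` selector is taken from the answer below and may need adjusting if the markup changes):

for container in containers:
    # Each result card links to the partner profile; the link text is the name
    name_link = container.find("a", title="View Profile")
    if name_link:
        print(name_link.get_text(strip=True))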

Sureshmani Kalirajan

Based on your comment clarification, here is some code that retrieves the Partner Name of every partner displayed in the search results:

With BeautifulSoup syntax:

partnerWebElements = page_soup.findAll(title="View Profile")

With just Selenium syntax:

partnerWebElements = driver.find_elements_by_xpath("//a[@title='View Profile']")

You can then get text for each Partner name like this:

for partnerWebElement in partnerWebElements:
    print(partnerWebElement.text)
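
If you stay with pure Selenium, you can combine this with an explicit wait so the links exist before you read them (a minimal sketch; the 10-second timeout is an assumption):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Wait until at least one partner link is present, then print the names
partnerWebElements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//a[@title='View Profile']"))
)
for partnerWebElement in partnerWebElements:
    print(partnerWebElement.text)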
CEH
  • I am not sure how the BeautifulSoup syntax works with this, since it does not support XPath, but if you would like to use XPath, you can use `//a[@title='View Profile']`. I am testing on this page: https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do;jsessionid=8CDF9284D014CFF911CB8E6F81812619 – CEH Sep 26 '19 at 18:40
  • I updated with another example after checking BeautifulSoup documentation for specific findAll parameters. I will update my answer and add another python example as well, using XPath. – CEH Sep 26 '19 at 18:46
  • Thank you Christine, but I still do not get your result. Could you please check out my code? I think there is something different between my code and yours. – Mahdi Sep 26 '19 at 19:03
  • I would refer to the answer that @Sureshmani posted. It looks like a full example, and it also retrieves the correct number of results. – CEH Sep 26 '19 at 19:06

FYI that page uses jQuery which makes this easy:

driver.execute_script("return $('div[class=\"row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical\"]').length")
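
The same approach can return data rather than just a count, for example the partner names (a sketch; the `View Profile` selector is borrowed from the other answers):

names = driver.execute_script(
    "return $('a[title=\"View Profile\"]').map(function() { return $(this).text(); }).get();"
)
print(names)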
pguardiario
  • You can execute jQuery as well via execute_script? – QHarr Sep 27 '19 at 04:27
  • Yes in this case it was included in the page but you can also [inject it](https://stackoverflow.com/questions/57941221/how-can-i-use-jquery-with-selenium-execute-script-method) when it isn't (see the sketch after this thread) – pguardiario Sep 27 '19 at 04:29
  • wow! Thanks for the link. Be interesting to see if this works across languages with selenium. That is so cool and never thought of it but totally makes sense. – QHarr Sep 27 '19 at 04:33
  • It should, but it helps if the language has heredocs (looking at Java). jQuery is really the gold standard of html parsers which is why I slowly shake my head when people mix selenium with beautiful soup. – pguardiario Sep 27 '19 at 06:09
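
For reference, here is a minimal sketch of the injection approach linked above (it assumes the requests package is installed and the page's security policy allows injected scripts; the CDN URL is illustrative):

import requests

# Download jQuery and evaluate it in the page, which defines window.$
jquery_source = requests.get("https://code.jquery.com/jquery-3.4.1.min.js").text
driver.execute_script(jquery_source)

# jQuery is now available to later execute_script calls
print(driver.execute_script("return $('a[title=\"View Profile\"]').length"))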