
I'm trying to parse a webpage using BeautifulSoup. I can see that the page is correctly loaded in Selenium using chromedriver, but the final result is empty: when I print the page parsed by BeautifulSoup, it does not show the whole page that Selenium shows in its automated browser.

The code that I'm using for this purpose is:

from bs4 import BeautifulSoup as soup

# Parse whatever HTML the driver currently has
page_soup = soup(driver.page_source, "html.parser")
print(page_soup)
containers = page_soup.findAll("div", class_="row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical")
print(len(containers))

I need to access each partner's information, but the result is empty. The page that I'm working on is:

https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do;jsessionid=8CDF9284D014CFF911CB8E6F81812619
Mahdi
  • Which elements on the page are you trying to find with `page_soup.findAll`? I have run your selector on the page link you provided, but it brings back no results, so the selector is probably wrong. – CEH Sep 26 '19 at 18:15
  • As you know, the page is a search page with multiple results, and I want to access each result. For example, when you search for China, it shows 5 results on the first page, and I want to access each of them. As far as I can see, each result is within a div with the class names I mentioned above. – Mahdi Sep 26 '19 at 18:25
  • I have written some modified BeautifulSoup code from what you provided, and changed the selector. This code will retrieve the `name` of every partner listed on the search results page. – CEH Sep 26 '19 at 18:30
  • I get no results found when loading that link. – QHarr Sep 26 '19 at 19:03

3 Answers


The results are loaded using JavaScript. You need to wait for the search results to load before scraping. Here is a working example:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import time

url = 'https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do'
driver = webdriver.Chrome(executable_path='C:/Selenium/chromedriver.exe')
driver.get(url)

# Type the search term into the location box
SearchString = 'CALIFORNIA'
Location = driver.find_element_by_name("location")
Location.send_keys(SearchString)

# Give the autocomplete suggestions time to appear, then pick the match and search
time.sleep(3)
driver.find_element_by_xpath("//li//span[contains(text(),'" + SearchString + "')]").click()
driver.find_element_by_id("searchBtn").click()

# Wait until the JavaScript-rendered result list is present before parsing
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'searchResultsList')))
time.sleep(3)
page_soup = soup(driver.page_source, "html.parser")
print(page_soup.prettify())
containers = page_soup.findAll("div", class_="row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical")
print(len(containers))

driver.close()

The result is 5.
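
If you then want to pull the partner name out of each container, here is a minimal sketch (the `a[title='View Profile']` selector is taken from the answer below and may need adjusting if the markup changes):

for container in containers:
    # Each result card links to the partner profile; the link text is the name
    name_link = container.find("a", title="View Profile")
    if name_link:
        print(name_link.get_text(strip=True))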

Sureshmani Kalirajan

Based on your comment clarification, here is some code that retrieves the Partner Name of every partner displayed in the search results:

With BeautifulSoup syntax:

partnerWebElements = page_soup.findAll(title="View Profile")

With just Selenium syntax:

partnerWebElements = driver.find_elements_by_xpath("//a[@title='View Profile']")

You can then get text for each Partner name like this:

for partnerWebElement in partnerWebElements:
    print(partnerWebElement.text)
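
If you stay with pure Selenium, you can combine this with an explicit wait so the links exist before you read them (a minimal sketch; the 10-second timeout is an assumption):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Wait until at least one partner link is present, then print the names
partnerWebElements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//a[@title='View Profile']"))
)
for partnerWebElement in partnerWebElements:
    print(partnerWebElement.text)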
CEH
  • I am not sure how the BeautifulSoup syntax works with this, since it does not support XPath, but if you would like to use XPath, you can use `//a[@title='View Profile']`. I am testing on this page: https://locatr.cloudapps.cisco.com/WWChannels/LOCATR/openBasicSearch.do;jsessionid=8CDF9284D014CFF911CB8E6F81812619 – CEH Sep 26 '19 at 18:40
  • I updated with another example after checking BeautifulSoup documentation for specific findAll parameters. I will update my answer and add another python example as well, using XPath. – CEH Sep 26 '19 at 18:46
  • Thank you Christine, but I still do not get your result. Could you please check out my code? I think there is something different between my code and yours. – Mahdi Sep 26 '19 at 19:03
  • I would refer to the answer that @Sureshmani posted. It looks like a full example, and it also retrieves the correct number of results. – CEH Sep 26 '19 at 19:06

FYI that page uses jQuery which makes this easy:

driver.execute_script("return $('div[class=\"row ploc-l-row--gutterV flex-wrap flex-align-start flex-center-vertical\"]').length")
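
The same approach can return data rather than just a count, for example the partner names (a sketch; the `View Profile` selector is borrowed from the other answers):

names = driver.execute_script(
    "return $('a[title=\"View Profile\"]').map(function() { return $(this).text(); }).get();"
)
print(names)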
pguardiario
  • You can execute jQuery as well via execute_script? – QHarr Sep 27 '19 at 04:27
  • Yes in this case it was included in the page but you can also [inject it](https://stackoverflow.com/questions/57941221/how-can-i-use-jquery-with-selenium-execute-script-method) when it isn't (see the sketch after this thread) – pguardiario Sep 27 '19 at 04:29
  • wow! Thanks for the link. Be interesting to see if this works across languages with selenium. That is so cool and never thought of it but totally makes sense. – QHarr Sep 27 '19 at 04:33
  • It should, but it helps if the language has heredocs (looking at Java). jQuery is really the gold standard of html parsers which is why I slowly shake my head when people mix selenium with beautiful soup. – pguardiario Sep 27 '19 at 06:09
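
For reference, here is a minimal sketch of the injection approach linked above (it assumes the requests package is installed and the page's security policy allows injected scripts; the CDN URL is illustrative):

import requests

# Download jQuery and evaluate it in the page, which defines window.$
jquery_source = requests.get("https://code.jquery.com/jquery-3.4.1.min.js").text
driver.execute_script(jquery_source)

# jQuery is now available to later execute_script calls
print(driver.execute_script("return $('a[title=\"View Profile\"]').length"))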