
I'm trying to crawl the Booking.com site for reviews and hotel details. I managed to get the hotel details, but when it comes to crawling the reviews something weird happens!

I can find the container that wraps the reviews, but it is empty.

I made sure the elements I'm looking for are present by inspecting the page with Chrome DevTools.

I even switched from scrapy_splash to Selenium in case the former was missing dynamic content, and I also tried parsing the page with BeautifulSoup and XPath.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

DRIVER_PATH = './chromedriver'
chrome_options = Options()
# chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options, executable_path=DRIVER_PATH)
driver.get("https://www.booking.com/hotel/tn/carlton-tunis.ar.html?label=gen173nr-1FCAEoggI46AdIM1gEaECIAQGYAQG4ARnIAQzYAQHoAQH4ARCIAgGoAgO4Ar351vgFwAIB0gIkNDUyNmFhZGQtODNkMy00Nzg1LWI3MzYtNWE4MzA5Y2RjY2Jk2AIG4AIB;dest_id=-731701;dest_type=city;dist=0;from_beach_non_key_ufi_sr=1;group_adults=2;group_children=0;hapos=1;hpos=1;no_rooms=1;room1=A%2CA;sb_price_type=total;sr_order=popularity;srepoch=1595260105;srpvid=1d9f6f249e3001d7;type=total;ucfs=1&#tab-reviews")
reviewsContainer = driver.find_element_by_xpath("//div[@id='review_list_page_container']/ul[@class='review_list']")

As I said, I get an error saying the specified element cannot be found:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id='review_list_page_container']/ul[@class='review_list']"}
  (Session info: headless chrome=84.0.4147.89)

Any help, please? Thanks in advance!

Raki Lachraf

2 Answers

0

The problem is quite simple.

The reviews tab is hidden and is created only once the page has loaded (I'm not a web expert, so I don't know what this technique is called).

So, when you use the --headless option, which runs the browser in hidden mode (without loading the UI elements), that hidden tab is not loaded; the page has to be fully loaded for the tab to be created.

The solution is simply to disable the headless option. I changed your code to use the Firefox browser (sorry, I don't have the Chrome driver :D ):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options as FirefoxOptions

DRIVER_PATH = './geckodriver'  # assumed location of geckodriver; adjust to your setup
options = FirefoxOptions()
options.headless = False  # run with a visible browser window
driver = webdriver.Firefox(options=options, executable_path=DRIVER_PATH)
driver.get("https://www.booking.com/hotel/tn/carlton-tunis.ar.html?label=gen173nr-1FCAEoggI46AdIM1gEaECIAQGYAQG4ARnIAQzYAQHoAQH4ARCIAgGoAgO4Ar351vgFwAIB0gIkNDUyNmFhZGQtODNkMy00Nzg1LWI3MzYtNWE4MzA5Y2RjY2Jk2AIG4AIB;dest_id=-731701;dest_type=city;dist=0;from_beach_non_key_ufi_sr=1;group_adults=2;group_children=0;hapos=1;hpos=1;no_rooms=1;room1=A%2CA;sb_price_type=total;sr_order=popularity;srepoch=1595260105;srpvid=1d9f6f249e3001d7;type=total;ucfs=1&#tab-reviews")
reviewsContainer = driver.find_element_by_xpath("//div[@id='review_list_page_container']/ul[@class='review_list']")

If you change options.headless to True, you will get the error again.
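If the container really is created only after the page has loaded, another thing that may be worth trying is to retry the lookup for a few seconds instead of searching once. Selenium has its own WebDriverWait for this; the dependency-free sketch below only illustrates the polling idea. The helper name wait_for and its parameters are made up for this example and are not part of Selenium.

```python
import time


def wait_for(find, timeout=10.0, poll=0.5):
    """Call find() repeatedly until it returns without raising, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except Exception as exc:  # with Selenium this would be NoSuchElementException
            if time.monotonic() >= deadline:
                raise TimeoutError(f"gave up after {timeout} s") from exc
            time.sleep(poll)


# Hypothetical usage with the driver created above (not run here):
# reviews = wait_for(lambda: driver.find_element_by_xpath(
#     "//div[@id='review_list_page_container']/ul[@class='review_list']"))
```

Whether this is enough in headless mode is an assumption; it only helps if the element does eventually appear in the DOM.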

Minions
  • This is the new error I get when I use your code as is: selenium.common.exceptions.WebDriverException: Message: invalid argument: can't kill an exited process – Raki Lachraf Jul 21 '20 at 11:14
  • @RakiLachraf You have hidden instances still running. Try to kill them using the task manager. If you can't find them, restart your PC. – Minions Jul 21 '20 at 11:33
  • I have Fedora distro, and I killed all Firefox processes using "killall firefox"... still getting the same error. – Raki Lachraf Jul 21 '20 at 11:41
  • Could you try restarting it, or running the code on another machine? – Minions Jul 21 '20 at 11:50
  • BTW, someone mentioned in this question: https://stackoverflow.com/questions/52534658/webdriverexception-message-invalid-argument-cant-kill-an-exited-process-with that you need specific matching versions of Firefox and the driver, not a random combination. – Minions Jul 21 '20 at 11:52
  • yep, saw that and I made sure they are compatible... – Raki Lachraf Jul 21 '20 at 11:53
0

I think I solved the problem:

I'm using Fedora 32, and having Selenium installed only in the environment was not enough. I had to install it with superuser privileges:

sudo pip install -U selenium

Then I made sure my drivers directory is in PATH: I created one in my home directory, moved the Gecko and Chrome drivers there, and added it to PATH:

# use $HOME here: a ~ inside double quotes is not expanded by the shell
export BROWSER_DRIVERS="$HOME/browser_drivers"
export PATH="$PATH:$BROWSER_DRIVERS"

Everything is working now, headful or headless. Thank you for the help.
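To confirm that the PATH change took effect, a quick check along these lines might help. It is only a sketch: it assumes the driver binaries are named geckodriver and chromedriver, and locate_drivers is a name made up for this example.

```python
import shutil


def locate_drivers(names=("geckodriver", "chromedriver")):
    """Map each driver name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_drivers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a driver shows up as not found, the directory is probably missing from PATH in the shell that launches Python, or the file is not executable.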

Raki Lachraf