
Hi guys, I am trying to scrape some data from Airbnb in order to create a mini data analysis project for my portfolio. I tried several tutorials with BeautifulSoup, but none of them works today, even when I use the very same link they use in the tutorials.

Because of this I turned to Selenium. I managed to open the site, and as a first step I am trying to extract the listing names. After that I would like to extract all the other information (price, reviews, rating, amenities, etc.).

My code is below, but I am getting an empty list. Can anyone help me get the names of the apartments?

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

website = 'https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown'

# Start a single Chrome instance (webdriver_manager downloads a matching chromedriver)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(website)

# This comes back as an empty list
titles = driver.find_elements("class name", "n1v28t5c s1cjsi4j dir dir-ltr")

Thanks.

Lefteris Kyprianou

3 Answers


Selenium combined with bs4 works fine without any issues and returns the right data. Just run the code.

Example:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)

driver.maximize_window()
time.sleep(5)  # give the JavaScript-rendered listing cards time to appear

# Parse the rendered HTML with BeautifulSoup (requires the lxml parser)
soup = BeautifulSoup(driver.page_source, 'lxml')
for card in soup.select('div[class="c4mnd7m dir dir-ltr"]'):  # one listing card
    title = card.select_one('div[class="t1jojoys dir dir-ltr"]').text
    price = card.select_one('span[class="a8jt5op dir dir-ltr"]').text
    link = 'https://www.airbnb.com' + card.select_one('a[class="ln2bl2p dir dir-ltr"]').get('href')
    print(title, price)

Output:

Condo in Thessaloniki $50 per night
Apartment in Thessaloniki $38 per night
Condo in Thessaloniki $80 per night
Apartment in Thessaloniki $66 per night
Condo in Thessaloniki $23 per night
Apartment in Thessaloniki $74 per night
Condo in Thessaloniki $37 per night
Apartment in Thessaloniki $45 per night
Apartment in Thessaloniki $39 per night
Condo in Thessaloniki $27 per night
Apartment in Thessaloniki $28 per night
Condo in Thessaloniki $43 per night
Apartment in Thessaloniki $94 per night
Apartment in Thessaloniki $24 per night
Condo in Thessaloniki $86 per night
Loft in Thessaloniki $23 per night
Apartment in Thessaloníki $45 per night
Apartment in Thessaloniki $44 per night
Condo in Thessaloniki $50 per night
Condo in Thessaloniki $51 per night
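Since the end goal is a data analysis project, text such as "$50 per night" will eventually need to become numbers. A minimal sketch, assuming every price string has the "$<amount> per night" shape seen in the output above and every title follows "<type> in <place>"; `parse_price` and `to_rows` are hypothetical helpers, not part of the answer:

```python
import re
from typing import Optional

# Assumption: a dollar sign followed by digits (thousands commas allowed)
PRICE_RE = re.compile(r"\$([\d,]+(?:\.\d+)?)")


def parse_price(text: str) -> Optional[float]:
    """Return the first dollar amount in `text` as a float, or None if absent."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def to_rows(pairs):
    """Turn (title, price_text) pairs into dicts ready for csv or pandas."""
    rows = []
    for title, price_text in pairs:
        # "Condo in Thessaloniki" -> type "Condo", place "Thessaloniki"
        prop_type, _, place = title.partition(" in ")
        rows.append({
            "title": title,
            "type": prop_type,
            "place": place,
            "price_per_night": parse_price(price_text),
        })
    return rows


if __name__ == "__main__":
    scraped = [("Condo in Thessaloniki", "$50 per night"),
               ("Loft in Thessaloniki", "$23 per night")]
    for row in to_rows(scraped):
        print(row)
```

Inside the scraping loop you would collect `(title, price)` pairs instead of printing them; a list of dicts like this can be passed straight to `pd.DataFrame`.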
Md. Fazlul Hoque

To extract the names of the properties, you have to use WebDriverWait with visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id^='title']")))])
    
  • Using XPATH:

    driver.get('https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id, 'title') and text()]")))])
    
  • Console Output:

    ['Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Loft in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Agios Pavlos']
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
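The point of WebDriverWait is to poll until a condition holds instead of sleeping for a fixed time. Below is a minimal, hypothetical stand-in written with only the standard library to show the mechanics; it is not Selenium's implementation. It assumes the same defaults as WebDriverWait (poll every 0.5 s, give up after the timeout). The real class also ignores NoSuchElementException between polls and raises TimeoutException, which this sketch simplifies to a custom WaitTimeout.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never returns a truthy value in time."""


def wait_until(condition: Callable[[], T], timeout: float = 20.0,
               poll: float = 0.5) -> T:
    """Hypothetical analogue of WebDriverWait(driver, timeout).until(condition).

    Calls `condition` repeatedly, sleeping `poll` seconds between attempts,
    and returns its first truthy result; raises WaitTimeout after `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # e.g. a non-empty list of visible elements
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    calls = {"n": 0}

    def titles_visible():
        # Simulates a page whose listing titles show up on the third poll
        calls["n"] += 1
        return ["Flat in Thessaloniki"] if calls["n"] >= 3 else []

    print(wait_until(titles_visible, timeout=5, poll=0.01))
```

The practical difference from a fixed `time.sleep(5)` is that the wait returns as soon as the titles are there and fails loudly if they never appear.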
    
undetected Selenium
driver.find_elements("class name", "n1v28t5c s1cjsi4j dir dir-ltr")

This returns 0 elements, because By.CLASS_NAME can only locate elements by a single class name.

("n1v28t5c s1cjsi4j dir dir-ltr" is actually 4 separate classes on the element you're trying to locate.) You can locate elements with multiple classes using, for example, XPath selectors.

driver.find_elements(By.XPATH, '//div[@class="n1v28t5c s1cjsi4j dir dir-ltr"]')

This finds all 20 elements on the page. I strongly encourage you to learn more about XPath, as it's fairly simple to understand and very powerful.
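The difference between an exact `@class="..."` comparison and a class-token match can be shown without a browser. The sketch below uses only the standard library's `html.parser`; `to_css_selector`, `ClassTokenFinder` and `find_by_classes` are hypothetical helpers, not Selenium API. A selector built like `to_css_selector`'s output could be passed as `driver.find_elements(By.CSS_SELECTOR, ".n1v28t5c.s1cjsi4j.dir.dir-ltr")`; like the finder below, it ignores class order and extra classes, whereas the XPath above only matches when the class attribute is exactly that string. Class names such as n1v28t5c look auto-generated, so any of these locators may stop working if the markup changes.

```python
from html.parser import HTMLParser


def to_css_selector(class_attr):
    """'a b c' -> '.a.b.c' (assumes valid CSS identifiers; no escaping done)."""
    return "".join("." + name for name in class_attr.split())


class ClassTokenFinder(HTMLParser):
    """Collect the text of elements whose class list contains every wanted class.

    Same semantics as a compound CSS class selector such as '.a.b':
    class order and extra classes do not matter. Assumes well-formed markup
    without unclosed void elements like <br> or <img>.
    """

    def __init__(self, class_attr):
        super().__init__()
        self.wanted = set(class_attr.split())
        self.depth = 0      # > 0 while inside a matching element
        self.matches = []   # raw text of each matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:      # nested tag inside a match: only track depth
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def find_by_classes(html, class_attr):
    finder = ClassTokenFinder(class_attr)
    finder.feed(html)
    finder.close()  # flush any buffered text
    return [text.strip() for text in finder.matches]


if __name__ == "__main__":
    sample = """
    <div class="n1v28t5c s1cjsi4j dir dir-ltr">Flat in Thessaloniki</div>
    <div class="dir dir-ltr n1v28t5c s1cjsi4j">Loft in Thessaloniki</div>
    <div class="n1v28t5c dir">not a full match</div>
    """
    print(to_css_selector("n1v28t5c s1cjsi4j dir dir-ltr"))
    print(find_by_classes(sample, "n1v28t5c s1cjsi4j dir dir-ltr"))
```

In the sample, the second div has the same four classes in a different order: a token match finds it, an exact string comparison would not.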

  • It is again returning an empty list, but maybe I am using the wrong path. I am not sure. – Lefteris Kyprianou Sep 02 '22 at 21:38
  • @LefterisKyprianou Hmm, that's weird. Can you post your code after my suggestions somewhere? It works for me – Maciej Miecznik Sep 02 '22 at 21:40
  • website = 'https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown' driver = webdriver.Chrome(service=Service(ChromeDriverManager().install())) driver.get(website) titles = driver.find_elements(By.XPATH, '//div[@class="n1v28t5c s1cjsi4j dir dir-ltr"]') – Lefteris Kyprianou Sep 02 '22 at 21:45
  • Can you post your exact code? – Lefteris Kyprianou Sep 02 '22 at 21:51