I'm trying to scrape some information about cars from leboncoin.
I used a Jupyter notebook to get around Datadome. Here's my first cell:
import pandas as pd
import numpy as np
import time
import random
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
PATH = "chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument('enable-logging')
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=PATH)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
url = 'https://www.leboncoin.fr/voitures/offres'
driver.get(url)
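At this point I switch to the browser window and pass the robot test manually, then I run this:
try:
    # The cookie banner may be absent, so look it up inside the try block as well.
    cookie = driver.find_element_by_xpath('//*[@id="didomi-notice-agree-button"]')
    cookie.click()
except Exception:
    pass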
time.sleep(2)
car = driver.find_element_by_xpath('//input[@autocomplete="search-keyword-suggestions"]')
car.click()
car.send_keys('Peugeot')
car.send_keys(Keys.ENTER)
time.sleep(3)
After that, I run this:
next_page = driver.find_element_by_xpath('//a[@title="Page suivante"]')
for x in range(2):
    time.sleep(3)
    # Every ad card on the current results page
    links = driver.find_elements_by_class_name("styles_adCard__2YFTi")
    for l in links:
        data = l.text
        print(data)
        print()
    # Look the "next page" link up again, since the page has been reloaded
    next_page = driver.find_element_by_xpath('//a[@title="Page suivante"]')
    next_page.click()
    time.sleep(3)
Unfortunately, I can't figure out how to build a proper dataframe from this, because the HTML structure is the same for every field I want, so I can't tell them apart by selector.
I get something like this:
5
Peugeot 2008 1,6l Blue-HDI 92cv à 7800e (Ann2015/Toit panoramique)
PRO
7 500 €
Année
2015
Kilométrage
100000 km
Carburant
Diesel
Boîte de vitesse
Manuelle
Baie-Mahault 97122
5
PEUGEOT 206 1.4 i 75 CV XR PRESENCE
PRO
3 990 €
Année
2002
Kilométrage
104152 km
Carburant
Essence
Boîte de vitesse
Manuelle
Châtellerault 86100
And I would like some sort of dataframe, something like this:
Usually I can solve this kind of problem, because the HTML structure normally distinguishes each element. Here it's all the same, so I'm kind of lost.
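To make the goal concrete, here is a minimal sketch of the kind of result I'm after, working only on the text of each card rather than on the HTML. It is not tested against the live site. It assumes every card's text follows the layout shown in the output above: a leading number, the title, an optional "PRO" line, the price, label/value pairs, then the location on the last line. The names `parse_card` and `LABELS` and the column names `title`, `pro`, `price` and `location` are made up for illustration.

```python
import pandas as pd

# Field labels exactly as they appear in the card text above.
LABELS = ["Année", "Kilométrage", "Carburant", "Boîte de vitesse"]


def parse_card(text):
    """Turn one card's text into a dict (hypothetical helper).

    Assumed layout, taken from the sample output: leading number, title,
    optional "PRO" line, price, label/value pairs, location on the last line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    row = {"title": None, "pro": "PRO" in lines, "price": None}
    for ln in lines:
        # Title: first line that is not just a number.
        if row["title"] is None and not ln.isdigit():
            row["title"] = ln
        # Price: first line ending with the euro sign.
        if row["price"] is None and ln.endswith("€"):
            row["price"] = ln
    # Each value is the line that directly follows its label.
    for label in LABELS:
        row[label] = None
        if label in lines[:-1]:
            row[label] = lines[lines.index(label) + 1]
    row["location"] = lines[-1] if lines else None
    return row


# The two cards from the output above, as strings standing in for l.text.
cards = [
    "5\nPeugeot 2008 1,6l Blue-HDI 92cv à 7800e (Ann2015/Toit panoramique)\n"
    "PRO\n7 500 €\nAnnée\n2015\nKilométrage\n100000 km\nCarburant\nDiesel\n"
    "Boîte de vitesse\nManuelle\nBaie-Mahault 97122",
    "5\nPEUGEOT 206 1.4 i 75 CV XR PRESENCE\nPRO\n3 990 €\nAnnée\n2002\n"
    "Kilométrage\n104152 km\nCarburant\nEssence\nBoîte de vitesse\nManuelle\n"
    "Châtellerault 86100",
]

df = pd.DataFrame([parse_card(card) for card in cards])
print(df)
```

In the scraping loop, the idea would be to collect `parse_card(l.text)` into a list instead of printing, and build the dataframe once at the end. Cards that don't follow the assumed layout (for example a missing field) would just get `None` in that column.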