0

I am trying to scrape email addresses, but my code prints None. This is the page link: https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

headers ={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
base_url='https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'
driver = webdriver.Chrome("C:\Program Files (x86)\chromedriver.exe")
driver.get(url)
soup = BeautifulSoup(driver.page_source, "html.parser")
tra = soup.find_all('h2',class_='title')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=base_url+link['href']
        productlinks.append(comp)
        
for link in productlinks:
    r =requests.get(link,headers=headers)
    soup=BeautifulSoup(r.content, 'html.parser')
    sleep(5)
    details=soup.find_all("div",class_="item col-5")
    for detail in details:
        email=soup.find('a[href^="mailto"]')
        print(email)
James Z
Amen Aziz

2 Answers

0

The links you are looking for are not inside the tra (title) elements.
Change the code as follows to make it work:

tra = soup.find_all('div',class_='item')
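Whichever elements the links are collected from, the hrefs still have to be turned into absolute URLs. The entry URL in the question contains `//fr/`, which suggests (this is an assumption, not something verified against the live site) that the hrefs are site-relative and that `base_url + href` doubles the slash. A minimal sketch using `urllib.parse.urljoin` from the standard library; the `absolute_link` helper and the sample `href` are illustrative, not part of the original code:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.avocats-lille.com/"


def absolute_link(href, base_url=BASE_URL):
    """Resolve an href against the base URL without doubling the slash."""
    return urljoin(base_url, href)


# Hypothetical href, shaped like the entry URL from the question.
href = "/fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry"

print(BASE_URL + href)      # plain concatenation keeps both slashes
print(absolute_link(href))  # urljoin resolves the path against the host
```

In the question's loop this would replace `comp=base_url+link['href']` with `comp=absolute_link(link['href'])`.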
Prophet
  • It still gives me None. – Amen Aziz Aug 16 '22 at 21:28
  • That's strange. I'm not familiar with bs4 syntax, but I see that the blocks containing the data of each advocate match the CSS selector `div.item`, and inside these blocks there are links matching the CSS selector `a[href]`. In XPath, these are `//div[@class='item']` and `//a[@href]` respectively. – Prophet Aug 16 '22 at 21:40
0

The email address is within the following element:

<a href="mailto:kamelabbas2002@yahoo.fr">kamelabbas2002@yahoo.fr</a>

Solution

To print the email address (that is, the element's text) using Selenium, you can use either of the following locator strategies:

  • Using css_selector:

    driver.get('https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry')
    print(driver.find_element("css selector", 'a[href^="mailto"]').text)
    
  • Using xpath:

    driver.get('https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry')
    print(driver.find_element("xpath", '//a[starts-with(@href, "mailto")]').text)
    
  • Console Output:

    kamelabbas2002@yahoo.fr
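The same CSS selector can also be used from BeautifulSoup, which may explain the None in the question: `soup.find()` treats its first argument as a tag name, not as a selector, so `find('a[href^="mailto"]')` matches nothing, whereas `select()` and `select_one()` accept CSS selectors. A minimal sketch on a static snippet built from the element shown above; the `extract_emails` helper and the wrapping `<div>` are illustrative, and whether the live page serves this markup without JavaScript is not verified here:

```python
from bs4 import BeautifulSoup

# Sample markup: the anchor shown above, wrapped in a placeholder <div>.
html = '<div><a href="mailto:kamelabbas2002@yahoo.fr">kamelabbas2002@yahoo.fr</a></div>'


def extract_emails(markup):
    """Return the addresses of all mailto: links found in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    # select() takes a CSS selector; [href^="mailto:"] matches by href prefix.
    return [a["href"].split(":", 1)[1] for a in soup.select('a[href^="mailto:"]')]


soup = BeautifulSoup(html, "html.parser")
print(soup.find('a[href^="mailto"]'))  # None: find() looks for a tag with that literal name
print(extract_emails(html))            # ['kamelabbas2002@yahoo.fr']
```

In the question's loop, that would mean calling `soup.select_one('a[href^="mailto"]')` in place of `soup.find(...)`, and checking the result for None before reading its text.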
    
undetected Selenium