https://insuretechconnect.com/speakers/

Greetings. I want to extract the speakers' information from the website above. For each speaker I need the Name, Title, Company, img src link, and Description.

However, my code can only extract the Name, title, and the company.

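from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r'XXX\chromedriver.exe')
driver.get('https://insuretechconnect.com/speakers/')

# Wait up to 20 seconds for the speaker cards to be present
speakers_info = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.awsm-personal-info')))
speakers_info_fulllist = []
for e in speakers_info:
    speakers_info_fulllist.append(e.text.split('\n'))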

Is there a better way to extract the information for all speakers, using either Selenium or Requests?

Thanks in advance.

undetected Selenium

3 Answers

The following code locates every speaker card and prints its text together with the image URL.

from selenium.webdriver.common.by import By

speakers = driver.find_elements(By.XPATH, "//*[@class='awsm-modal']//div[@class='awsm-grid-card']")
print("Total Speakers : ", len(speakers))

for i, speaker in enumerate(speakers, start=1):
    print("Speaker Info # ", i)
    print(speaker.text)
    # Look up the image relative to the current card
    imgsource = speaker.find_element(By.XPATH, ".//img")
    print(imgsource.get_attribute('src'))

The output will be:

Total Speakers : 242
Speaker Info # 1
ERIK ABRAHAMSSON
CEO
DIGITAL FINEPRINT
https://n68y02w29js2mtetnvfd871d-wpengine.netdna-ssl.com/wp-content/uploads/2019/07/Erik-Abrahamsson-1-500x500.jpg
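Since the question splits each card's text on newlines, the lines can be mapped to named fields. The following is only a minimal sketch: `parse_card_text` is a hypothetical helper (not part of Selenium), and it assumes the lines always come in the order name, title, company.

```python
def parse_card_text(text):
    """Split a card's text into name, title and company (assumed order)."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    # Pad with None so that cards with fewer than three lines do not raise
    lines += [None] * (3 - len(lines))
    name, title, company = lines[:3]
    return {"name": name, "title": title, "company": company}


card = parse_card_text("ERIK ABRAHAMSSON\nCEO\nDIGITAL FINEPRINT")
print(card)
```

With Selenium, the argument would be the `.text` of each card element. Cards whose layout differs from the assumed order would need their own handling.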

Sureshmani Kalirajan
  • Thank you for the response. It is cool! BTW, I also want to get the description text, e.g. "Eran Agrios leads the Go To Market strategy..." How can I do that? Thanks!

    – Arthur Morgan Aug 23 '19 at 16:42

To extract the information for a single speaker, for example Eran Agrios (Title, Company, Image Link and Description), using only Selenium, click the speaker's image to open the detail modal and then read each field with the following Locator Strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    chrome_options = webdriver.ChromeOptions() 
    chrome_options.add_argument("start-maximized")
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    
    driver.get("https://insuretechconnect.com/speakers/")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='awsm-personal-info']//h3[contains(., 'Eran')]//preceding::img[1]"))))
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='awsm-personal-info']//h3[contains(., 'Eran')]//preceding::img[1]"))).click()
    print("Title is : "+ WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='awsm-modal-content-inner']/h2[contains(., 'Eran')]//following::h3[1]"))).get_attribute("innerHTML"))
    print("Company is : "+ WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='awsm-modal-content-inner']/h2[contains(., 'Eran')]//following::h3[2]/b"))).get_attribute("innerHTML"))
    print("Image Link is : "+ WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='awsm-modal-content-inner']/h2[contains(., 'Eran')]//preceding::img[1]"))).get_attribute("src"))
    print("Description : "+ WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='awsm-modal-content-inner']/h2[contains(., 'Eran')]//following::p[1]"))).get_attribute("innerHTML"))
    driver.quit()
    
  • Console Output:

    Title is : Head of Global Go To Market, Financial Services Cloud and Wealth & Asset Management
    Company is : Salesforce
    Image Link is : https://n68y02w29js2mtetnvfd871d-wpengine.netdna-ssl.com/wp-content/uploads/2019/07/Eran-Agrois-500x500.png
    Description : Eran Agrios leads the Go To Market strategy for Financial Services Cloud at Salesforce. Eran has over 15 years of experience in customer relationship management technology. She has spent the last 10 years at Salesforce working with Financial Services companies on innovation and digital transformation. Most recently, her focus has been on the success of Salesforce’s first industry product, Financial Services Cloud.
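The locators above hard-code the name 'Eran'. To reuse them for other speakers, the XPath strings could be generated from a name. This is a minimal sketch, assuming the modal markup is the same for every speaker; `modal_xpaths` is a hypothetical helper, and it rejects names containing a single quote to keep the XPath literal valid.

```python
def modal_xpaths(name):
    """Build the modal XPaths used above for a given speaker name."""
    if "'" in name:
        # XPath 1.0 string literals cannot escape quotes; keep the sketch simple
        raise ValueError("name must not contain a single quote")
    heading = f"//div[@class='awsm-modal-content-inner']/h2[contains(., '{name}')]"
    return {
        "title": heading + "//following::h3[1]",
        "company": heading + "//following::h3[2]/b",
        "image": heading + "//preceding::img[1]",
        "description": heading + "//following::p[1]",
    }


xpaths = modal_xpaths("Eran")
for field, xpath in xpaths.items():
    print(field, "->", xpath)
```

Each value can then be passed to `EC.visibility_of_element_located((By.XPATH, xpath))` in the same way as the hard-coded strings.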
    
undetected Selenium

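To get an attribute with Selenium, use driver.find_element_by_css_selector('selector').get_attribute('nameoftheattribute'). In Selenium 4 the equivalent call is driver.find_element(By.CSS_SELECTOR, 'selector').get_attribute('nameoftheattribute').

For the description, I don't know where it is located on the page, so you will probably need another line of code; the locator depends on where that text sits in the markup.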
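For the Requests route mentioned in the question, attributes can also be read from downloaded HTML without a browser. The sketch below swaps Selenium for the standard library's `html.parser` and runs on a made-up snippet: the markup and the example.com URL are hypothetical, and the thread does not confirm that the real page serves the speaker cards as static HTML.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical markup, for illustration only
sample_html = """
<div class="awsm-grid-card">
  <img src="https://example.com/speaker-1.jpg" alt="Speaker 1">
  <div class="awsm-personal-info"><h3>Speaker 1</h3></div>
</div>
"""

collector = ImgSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```

In practice the string passed to `feed` would be the body of an HTTP response. If the cards are rendered by JavaScript, this approach would find nothing and Selenium would still be needed.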