I have the following sample HTML:

<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>

I am trying to iterate through the list to try and get the following results:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

I am using the following code:

persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})

However, the above code only iterates through the first person and not through all the people. Any help is appreciated.

Aslan

3 Answers

Since your loop extracts the first person's details correctly, the extraction logic itself is fine. To pick up all the persons, wait for every matching element to become visible by using WebDriverWait with visibility_of_all_elements_located(). You can use the following locator strategy:

persons = []
# Wait up to 20 seconds until every element with class "person" is visible
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    # The leading "." keeps each XPath relative to the current person element
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})

Note: you need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
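
To illustrate what an explicit wait does, here is a rough, Selenium-free sketch of the polling idea: call a condition repeatedly until it returns something truthy or a timeout elapses. This is not Selenium's implementation; wait_until and fake_find_people are hypothetical names, and the "page" is simulated so that the three people only show up on the third poll.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Simulated page (assumption): the person list is empty until the third poll.
calls = {"n": 0}


def fake_find_people():
    calls["n"] += 1
    if calls["n"] >= 3:
        return ["John Smith", "Phil Collins", "Tracy Beaker"]
    return []


people = wait_until(fake_find_people, timeout=2.0, poll=0.01)
print(people)
```

Querying once and looping immediately would correspond to using the result of the first poll only, which is why waiting for the full set can matter on pages that render their content late.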
undetected Selenium

The bs4 example below shows that iterating over all the .person elements works as expected. In your Selenium code, however, you are using the find_element_by_xpath locator methods, which are deprecated. I think it would be more robust to use WebDriverWait.

from bs4 import BeautifulSoup

html='''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup= BeautifulSoup(html,'lxml')

for person in soup.select('.person'):
    title = person.select_one('.title a').text
    print(title)

Output:

John Smith
Phil Collins
Tracy Beaker

Example for selenium:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH,'.//div[@class="title"]/a').text
    company = person.find_element(By.XPATH,'.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})
print(persons)


# Imports (these must come before the loop above)

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
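
If you want to check the loop and the relative XPath expressions without a browser or any third-party package, a minimal sketch with the standard library's xml.etree.ElementTree could look like the following. It assumes the sample markup is well-formed enough to parse as XML once it is wrapped in a hypothetical <root> element; real pages usually are not, so treat this purely as a logic check.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# so that it parses as a single XML document.
html = """
<root>
<div class="person">
    <div class="title"><a href="http://www.url.com/name/">John Smith</a></div>
    <div class="company"><a href="http://www.url.com/company/">SalesForce</a></div>
</div>
<div class="person">
    <div class="title"><a href="http://www.url.com/name/">Phil Collins</a></div>
    <div class="company"><a href="http://www.url.com/company/">TaskForce</a></div>
</div>
<div class="person">
    <div class="title"><a href="http://www.url.com/name/">Tracy Beaker</a></div>
    <div class="company"><a href="http://www.url.com/company/">Accounting</a></div>
</div>
</root>
"""

root = ET.fromstring(html.strip())

persons = []
for person in root.findall('.//div[@class="person"]'):
    # The leading "." scopes each lookup to the current person element
    title = person.find('.//div[@class="title"]/a').text
    company = person.find('.//div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})

for p in persons:
    print(p['title'] + ',', p['company'])
```

All three people come out of the loop here, which suggests the iteration logic in the question is sound and the issue lies in when the elements are queried.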
Md. Fazlul Hoque

One correct way to do it in Selenium would be:

person_divs = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)

Do not forget to import

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Another way using BeautifulSoup would be:

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name +  ',', p_company)

This would print out:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

BeautifulSoup (bs4) has good, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Barry the Platipus