I am using Selenium with Python, but every time I run my script I get redundant data.

for index in range(1, 20):
    try:
        business_el = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="pane"]/div/div[1]/div/div/div[5]/div[1]/div[1]'.format(index))))  
        business_el.click()
        time.sleep(5)

        tree = html.fromstring(driver.page_source)
        title = get_data(tree, '//div[@role="main"]//h1[contains(@class, "section-hero-header-title-title ")]/span/text()')
        phone_number = get_data(tree, '//button[@data-tooltip="Copy phone number"]/div/div[@aria-hidden="false"]/div/text()')
        website_url = get_data(tree, '//button[@data-tooltip="Open website"]/div/div[@aria-hidden="false"]/div/text()')
        address = get_data(tree, '//button[@data-item-id="address"]/div/div[@aria-hidden="false"]/div/text()')
        ratings = get_data(tree, '//span[@class="section-star-display"]/text()')
        reviewsCount = get_data(tree, '//span[@class="section-rating-term"]//button[contains(@aria-label, " reviews")]/text()')
        description = get_data(tree, '//div[@class="section-editorial-quote"]/span/text()')
        try:
            email = parse_email(website_url)
        except Exception:
            email = ''

        print(title, phone_number, website_url, email, address, ratings, reviewsCount, description)
        writer.writerow([
            title, 
            phone_number, 
            website_url, 
            email,
            address, 
            ratings, 
            reviewsCount, 
            description, 
        ])
Arundeep Chohan
deje

1 Answer

driver.page_source is likely not returning all the information you think it is. It is a "best attempt" at gathering the page data and is not necessarily the full DOM. Essentially, it may give you only what the initial page load contained, without the dynamically loaded content.
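One way to act on this is to wait until the content you care about has actually changed before reading it, rather than sleeping for a fixed time and parsing driver.page_source. The following is a minimal sketch, not a definitive implementation: wait_for_change is a hypothetical helper, and the fake_titles iterator only stands in for a page so the example runs without a browser. With Selenium you would pass a callable that reads the live element, as shown in the comment (the XPath there is shortened from the question's and is an assumption).

```python
import time


def wait_for_change(read_value, previous, timeout=10.0, poll=0.25):
    """Poll read_value() until it returns a non-empty value different from previous.

    Returns the new value, or raises TimeoutError if nothing changed in time.
    (Hypothetical helper, not part of Selenium.)
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value and value != previous:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value still {value!r} after {timeout}s")
        time.sleep(poll)


# Stand-in for a page whose title only updates on the third read.
# With Selenium, read_value would instead query the live DOM, for example:
#   lambda: driver.find_element(By.XPATH, '//div[@role="main"]//h1').text
fake_titles = iter(["Cafe A", "Cafe A", "Cafe B"])
new_title = wait_for_change(lambda: next(fake_titles), previous="Cafe A", poll=0.01)
print(new_title)
```

In the loop from the question, previous would be the title scraped on the prior iteration, so a row is only written once the detail pane shows a different business.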

See also: https://stackoverflow.com/a/65567070/1387701
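Separately, one more thing in the question's loop may be worth checking: the XPath string is passed through .format(index) but contains no {} placeholder, so str.format returns the same string on every iteration and the same element gets clicked each time. That could also produce repeated rows. The short demonstration below uses made-up XPath templates (both are hypothetical, not the real selectors for the page) to show the difference.

```python
# Both templates are hypothetical and only illustrate str.format behaviour.
template_without_slot = '//div[@class="result"]'
template_with_slot = '//div[@class="result"][{}]'

# With no placeholder, extra arguments to format() are silently ignored.
same = {template_without_slot.format(i) for i in range(1, 4)}

# With a placeholder, each index yields a different XPath.
distinct = [template_with_slot.format(i) for i in range(1, 4)]

print(len(same))
print(distinct)
```

Where the index belongs in the real XPath depends on the page's structure, which the question does not show.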

DMart