I want to extract warranty dates for my devices from Dell's website. I tried downloading the pages with urllib, but the site is protected by a captcha that I can't bypass for now. Instead, I am using Selenium to open a browser, solve the captcha manually, and then automatically open each page and extract the date. The problem is that the CSS selector lookup prints WebElement objects instead of the date text.

My code:

from selenium import webdriver
import time
driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate)
    driver.close()

codes = ['1FMR762', '15FDBG2', '10V8YZ1']
scrape(codes)

Expected output:

June 22, 2018
October 15, 2017
April 19, 2017

Given output:

<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.21873872382745052-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.06836824093097027-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.6642161898702734-1")>
Julanu

1 Answer

According to the API documentation, find_element_by_css_selector returns a WebElement object, not a string, so print shows the object's default representation. See https://selenium-python.readthedocs.io/api.html.

The web element's content has to be read as a string before printing, as explained in the question "Python and how to get text from Selenium element WebElement object?".

So it should help to change your line print(expdate) to print(expdate.text).
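As a minimal sketch of that change: the helper below reads `.text` and, if that comes back empty (Selenium only returns rendered text there, so hidden elements give an empty string), falls back to the `innerText` attribute. The names `element_text` and `StubElement` are made up for illustration; the stub only stands in for a real WebElement so the snippet runs without a browser.

```python
def element_text(element):
    """Return the text of a WebElement-like object as a plain string.

    `element.text` yields only rendered (visible) text, so fall back to the
    `innerText` attribute when it comes back empty.
    """
    text = element.text.strip()
    if not text:
        text = (element.get_attribute("innerText") or "").strip()
    return text


class StubElement:
    """Hypothetical stand-in for a WebElement (not part of Selenium)."""

    def __init__(self, text, inner_text):
        self.text = text
        self._inner_text = inner_text

    def get_attribute(self, name):
        # Mimic only the one attribute this sketch needs.
        return self._inner_text if name == "innerText" else None


# Visible element: .text already holds the date.
print(element_text(StubElement("June 22, 2018", "June 22, 2018")))
# Hidden element: .text is empty, so innerText is used instead.
print(element_text(StubElement("", " October 15, 2017 ")))
```

In the question's loop this would presumably become print(element_text(expdate)) in place of print(expdate).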

Heiko Becker
  • I changed the line to print(expdate.get_attribute('innerText')) because the text is hidden – Julanu Sep 20 '18 at 08:48
  • https://www.dell.com/support/home/yu/en/yubsdt1/product-support/servicetag/15fdbg2/warranty The problem is here when I have to extract the date from the line that contains "Onsite Service After Remote Diagnosis", is there a way to check that? – Julanu Sep 20 '18 at 08:50
  • What does your current program output? Is it already at the right table column? – Heiko Becker Sep 20 '18 at 08:53
  • For the link that I have posted it outputs "October 15, 2017" but it should output "October 15, 2019", it seems to be on the right column but not on the right row – Julanu Sep 20 '18 at 08:55
  • Have you tried changing your selector to `... > table > tbody > tr:nth-child(2) > ...`? An explanation can be found at https://stackoverflow.com/questions/4494708/using-css-selectors-to-access-specific-table-rows-with-selenium#4494743 – Heiko Becker Sep 20 '18 at 09:05
  • I will, the thing is sometimes 1 row with information may appear, and sometimes there could be 2 rows, so that is why I want to identify the row by text and then to select the td with the date – Julanu Sep 20 '18 at 09:10
  • Can you please accept my answer and post this problem as a separate question? It is not related to printing the content but rather to how to find the correct element. – Heiko Becker Sep 20 '18 at 09:15
  • Haha, yes, sorry dude :) – Julanu Sep 20 '18 at 09:17
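
For the follow-up raised in the comments (picking the row by its text rather than by position), one possible approach is to read every table row into a list of cell strings and then select the row whose cells mention the service name. This is only a sketch: the function name `date_for_service`, the sample rows, and the assumption that the date sits in the third cell (taken from the `td:nth-child(3)` in the question's selector) are illustrative, and the real page layout may differ.

```python
def date_for_service(rows, service_name, date_column=2):
    """Return the date cell of the first row that mentions service_name.

    rows         -- list of rows, each a list of cell strings
    service_name -- text to look for in any cell of a row
    date_column  -- zero-based index of the date cell (assumed: third cell)
    Returns None when no row matches.
    """
    for cells in rows:
        matches = any(service_name in cell for cell in cells)
        if matches and len(cells) > date_column:
            return cells[date_column].strip()
    return None


# Made-up sample data; only the two dates come from the thread above.
sample_rows = [
    ["Other service (placeholder)", "start date (placeholder)", "October 15, 2017"],
    ["Onsite Service After Remote Diagnosis", "start date (placeholder)", "October 15, 2019"],
]

print(date_for_service(sample_rows, "Onsite Service After Remote Diagnosis"))

# With Selenium, the rows might be collected roughly like this, using the
# same API style as the question. The selector is an assumption about the page:
#
#   rows = [
#       [td.get_attribute("innerText") for td in tr.find_elements_by_css_selector("td")]
#       for tr in driver.find_elements_by_css_selector("#printdivid table tbody tr")
#   ]
```

Matching on the row text keeps the lookup stable whether the table shows one row or two.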