I have multiple cases of table rows from which I want to extract data:

Case 1

 Onsite Service After Remote Diagnosis  April 19, 2014  April 19, 2017

Case 2

CAR                                     October 15, 2016    October 15, 2017    
Onsite Service After Remote Diagnosis   October 15, 2016    October 15, 2019

Case 3

NBD ProSupport                          July 16, 2008   July 15, 2011   
Onsite Service After Remote Diagnosis   July 16, 2008   July 15, 2011

The information I need is in the rows whose second td contains "Onsite Service After Remote Diagnosis". In every case it is the date in the rightmost cell of that row.

Expected output:

                      April 19, 2017
                    October 15, 2019
                       July 15, 2011

My code:

from selenium import webdriver
import time
from openpyxl import load_workbook

driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate.get_attribute('innerText'))
    driver.close()

codes = ['159DT3J', '15FDBG2', '10V8YZ1']
scrape(codes)

My output:

April 19, 2014
October 15, 2016
July 16, 2008

These are taken from the first row that appears and from the first of its two date cells. I've tried changing tbody > tr > td:nth-child(3), but identifying the row by its text would be better and would avoid errors.

Julanu

1 Answer

Since you need to extract text for "Onsite Service After Remote Diagnosis", I would suggest you update the line you are using for finding the element with the following:

expdate = driver.find_element_by_xpath("//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td")

Here we use an XPath locator: it finds the td whose text is 'Onsite Service After Remote Diagnosis' and then selects the td elements that follow it in the same row.
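
To check offline which cell the locator lands on, the selection can be mimicked with the standard library. This is only a sketch: it does not run the XPath itself (ElementTree has no following-sibling axis), the markup is a hypothetical reconstruction of Case 2 with an assumed empty first cell, and `following_sibling_cells` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on Case 2; the real page may be laid out differently.
SAMPLE = """
<table>
  <tr><td></td><td>CAR</td><td>October 15, 2016</td><td>October 15, 2017</td></tr>
  <tr><td></td><td>Onsite Service After Remote Diagnosis</td><td>October 15, 2016</td><td>October 15, 2019</td></tr>
</table>
"""

TARGET = "Onsite Service After Remote Diagnosis"


def following_sibling_cells(table_xml, label):
    """Return the text of every td that comes after the td whose text equals label."""
    root = ET.fromstring(table_xml)
    for row in root.iter("tr"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if label in cells:
            # Everything to the right of the matching cell, in document order.
            return cells[cells.index(label) + 1:]
    return []


siblings = following_sibling_cells(SAMPLE, TARGET)
first_sibling = siblings[0]   # the cell a single-element lookup would return
second_sibling = siblings[1]  # the cell selected by following-sibling::td[2]
print(first_sibling, "|", second_sibling)
```

Under these assumptions the first following cell is the start date and the second is the end date, so an index is needed to reach the expiry date.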

BountyHunter
  • Is there any way to go to the next td? There are two of them and the second is the one that I am looking for – Julanu Sep 20 '18 at 11:03
  • Yes, just add an index to the xpath `driver.find_element_by_xpath("(//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td)[2]")` – BountyHunter Sep 20 '18 at 11:05
  • I added the index like this: `//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td[2]` and it works :) – Julanu Sep 20 '18 at 12:19
  • Done. Quick question...could a regex be used for the text? Would that work? – Julanu Sep 20 '18 at 12:27
  • 1
    You may read this thread for details: https://stackoverflow.com/questions/21405267/xpath-using-regex-in-contains-function – BountyHunter Sep 20 '18 at 15:59
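
On the regex question: the XPath 1.0 engine that browsers expose to Selenium has no regex function, so one option is to do the pattern matching in Python instead. A minimal sketch, assuming the cell text of each row has already been collected into lists of strings (for example with `find_elements`); `expiry_dates` is a hypothetical helper, and the sample rows are modelled on Case 2 with an assumed empty first cell.

```python
import re


def expiry_dates(rows, pattern):
    """Return the last cell of every row in which some cell matches the regex.

    rows is a list of rows, each a list of cell strings.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        cells[-1].strip()
        for cells in rows
        if cells and any(regex.search(cell) for cell in cells)
    ]


# Hypothetical cell text, modelled on Case 2 from the question.
rows = [
    ["", "CAR", "October 15, 2016", "October 15, 2017"],
    ["", "Onsite Service After Remote Diagnosis", "October 15, 2016", "October 15, 2019"],
]

matches = expiry_dates(rows, r"onsite\s+service")
print(matches)
```

Doing the match in Python keeps the locator simple (all rows, all cells) and moves the fuzzy part into `re`, at the cost of fetching more elements from the page.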