
I am trying to grab the URL of each laptop that is on sale on the first 3 pages of this Amazon listing:

URL: https://www.amazon.com/s?i=computers&rh=n%3A565108%2Cp_72%3A1248879011&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590091272&ref=sr_pg_1

Every time I run the script, driver.findElements(By.xpath(...)) returns an inconsistent number of URLs. The first page is fairly consistent and returns 4 URLs, but pages 2 and 3 return anywhere between 1 and 4 URLs, even though page 2 has 8 URLs I am looking for and page 3 has 4.

I doubt the problem is in the grabData method, since it only processes whatever URL list it is given, and that list is already inconsistent. I am pretty new to this, so I hope that all made sense. Any help would be appreciated. Let me know if you need more clarification.

public static String dealURLsXpath = "//span[@data-a-strike=\"true\" or contains(@class,\"text-strike\")][.//text()]/parent::a[@class]";
public static List<String> URLs = new ArrayList<String>();


public static void main(String[] args)
    {       
        //Initialize Browser
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\email\\eclipse-workspace\\ChromeDriver 81\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);


        //Search through laptops and starts at page 1
        Search.searchLaptop(driver);
        //Remembers the listings page URL so the driver can return to it after visiting each deal
        listingsURL = driver.getCurrentUrl();
        //updates the global URLs List with the URLs found by driver.findElements(By.xpath)
        updateURLsList(driver);
        //Iterates through each URL and grabs laptop information to add to products list
        grabData(driver, URLs, "Laptop");
        // Clears URLs list so that it can be populated by the URLs in the next page
        URLs.clear();
        // returns driver to Amazon page to click on "page 2" button to go to next page and repeat process
        driver.get(listingsURL);

        driver.findElement(By.xpath("//a [contains(@href,'pg_2')]")).click();

        listingsURL = driver.getCurrentUrl();
        updateURLsList(driver);
        grabData(driver, URLs, "Laptop");
        URLs.clear();
        driver.get(listingsURL);

        driver.findElement(By.xpath("//a [contains(@href,'pg_3')]")).click();

        listingsURL = driver.getCurrentUrl();
        updateURLsList(driver);
        grabData(driver, URLs, "Laptop");
        URLs.clear();
        driver.get(listingsURL);
    }

public static void updateURLsList(WebDriver driver)
    {
        //list of deals on amazon page
/////////////////////////////////////////////INCONSISTENT/////////////////////////////////////////////
        List<WebElement> deals = driver.findElements(By.xpath(dealURLsXpath));
//////////////////////////////////////////////////////////////////////////////////////////////////////

        System.out.println("Deals Size: " + deals.size());
        for(WebElement element : deals)
        {
            URLs.add(element.getAttribute("href"));
        }
        System.out.println("URL List size: " + URLs.size());
        deals.clear();
    }
public static void grabData(WebDriver driver, List<String> URLs, String category)
    {
        for(String url : URLs)
        {
            driver.get(url);
            String name = driver.findElement(By.xpath("//span [@id = \"productTitle\"]")).getText();
            System.out.println("Name: " + name);
            String price = driver.findElement(By.xpath("//span [@id = \"priceblock_ourprice\"]")).getText();
            System.out.println("price: " + price);
            String Xprice = driver.findElement(By.xpath("//span [@class = \"priceBlockStrikePriceString a-text-strike\"]")).getText();
            System.out.println("Xprice: " + Xprice);
            String picURL = driver.findElement(By.xpath("//img [@data-old-hires]")).getAttribute("src");
            System.out.println("picURL: " + picURL);

            BufferedImage img;

            System.out.println("URL: " + url);

            try
            {
                img = ImageIO.read(new URL(picURL));
                products.add(new Product(
                        name, 
                        Integer.parseInt(price.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")), 
                        Integer.parseInt(Xprice.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")), 
                        img, 
                        category, 
                        url));
            }
            catch(IOException e)
            {
                System.out.println("Error: " + e.getMessage());
            }
        }
    }
undetected Selenium
zaid iqbal
  • findElements returns when at least 1 item is found. You can use a standard sleep, or catch StaleElement exceptions and re-run. See this for a method of catching stale element: https://github.com/pcalkins/browsermator/blob/master/src/browsermator/com/StoreLinksAsArrayByXPATHAction.java – pcalkins May 21 '20 at 20:43
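
The "catch the stale element exception and re-run" idea from the comment above can be sketched in plain Java. This is a minimal sketch, not the code from the linked repository: `Retry` and `withRetry` are hypothetical names, and the attempt count and pause are arbitrary assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical helper: re-runs an action when it fails with an expected transient exception. */
final class Retry {
    private Retry() {}

    static <T> T withRetry(Supplier<T> action,
                           Class<? extends RuntimeException> retryOn,
                           int maxAttempts,
                           long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not the transient failure we expect, so do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    pause(pauseMillis); // give the page time to finish re-rendering
                }
            }
        }
        throw last; // every attempt failed with the transient exception
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

With Selenium, the action would be a lambda that calls driver.findElements(By.xpath(dealURLsXpath)) and reads each href into a list of strings, and retryOn would be StaleElementReferenceException.class. Reading the hrefs inside the retried action matters, because an element can go stale between being located and being read.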

2 Answers


Try an explicit wait, which is the idiomatic Selenium way to wait for elements:

WebDriverWait wait = new WebDriverWait(driver, 20);
List<WebElement> deals = wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath(dealURLsXpath)));
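
One caveat, as far as I know: presenceOfAllElementsLocatedBy succeeds as soon as at least one matching element is present, so a page that renders its results incrementally can still hand back a partial list. A possible workaround is to poll until the number of matches stops changing. The following is a minimal sketch under that assumption; `StableList` and `waitUntilStable` are hypothetical names, not part of Selenium, and the poll counts and timings are arbitrary.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: polls a finder until its result size stops changing. */
final class StableList {
    private StableList() {}

    /**
     * Returns the latest result once the list is non-empty and its size has been
     * unchanged for stablePolls consecutive polls. If the timeout expires first,
     * returns whatever was found last (best effort).
     */
    static <T> List<T> waitUntilStable(Supplier<List<T>> finder,
                                       int stablePolls,
                                       long pollMillis,
                                       long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<T> current = finder.get();
        int unchanged = 0;
        while (unchanged < stablePolls && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", ie);
            }
            List<T> next = finder.get();
            // An empty list never counts as stable; a size change resets the counter.
            boolean same = !next.isEmpty() && next.size() == current.size();
            unchanged = same ? unchanged + 1 : 0;
            current = next;
        }
        return current;
    }
}
```

In the question's code the finder would be () -> driver.findElements(By.xpath(dealURLsXpath)), for example with stablePolls = 2, pollMillis = 500 and timeoutMillis = 20000. Comparing sizes is only a heuristic: it does not detect a page that swaps one element for another while keeping the same count.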
frianH

To grab the href attribute of each laptop that is on sale on the first 3 pages of this Amazon page, you need to induce WebDriverWait for visibilityOfAllElementsLocatedBy(). You can use the following Locator Strategy:

  • Code Block:

    driver.get("https://www.amazon.com/s?i=computers&rh=n%3A565108%2Cp_72%3A1248879011&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590091272&ref=sr_pg_1");
    List<WebElement> deals = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.xpath("//span[@class='a-price a-text-price']//parent::a[1]")));
    for(WebElement deal:deals)
        System.out.println(deal.getAttribute("href"));
    
  • Console Output:

    https://www.amazon.com/Apple-MacBook-13-inch-256GB-Storage/dp/B08636NKF8/ref=sr_1_2?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-2
    https://www.amazon.com/Apple-MacBook-16-Inch-512GB-Storage/dp/B081FZV45H/ref=sr_1_5?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-5
    https://www.amazon.com/Apple-MacBook-13-inch-128GB-Storage/dp/B07V49KGVQ/ref=sr_1_9?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-9
    https://www.amazon.com/New-Microsoft-Surface-Pro-Touch-Screen/dp/B07YNHXX8D/ref=sr_1_23?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-23
    

Similarly, page 2 and page 3 each return 4 URLs.

undetected Selenium