I'm trying to grab products from eBay and open them on Amazon.

So far, I have them being searched on Amazon, but I'm struggling to get the products selected from the search results.

Currently it's outputting a blank array and I'm not sure why. I have tested this in a separate script without the grabTitles step and the for loop, so I'm guessing something in there is causing the issue.

Is there something I am missing here that's preventing the data from coming back for prodResults?

const puppeteer = require('puppeteer');

const URL = "https://www.amazon.co.uk/";
const selectors = {
  searchBox: '#twotabsearchtextbox',
  productLinks: 'span.a-size-base-plus.a-color-base.a-text-normal',
  productTitle: '#productTitle'
};

(async() => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto('https://www.ebay.co.uk/sch/jmp_supplies/m.html?_trkparms=folent%3Ajmp_supplies%7Cfolenttp%3A1&rt=nc&_trksid=p2046732.m1684');

  //Get product titles from ebay
  const grabTitles = await page.evaluate(() => {
    const itemTitles = document.querySelectorAll('#e1-11 > #ResultSetItems > #ListViewInner > li > .lvtitle > .vip');
    var items = []
    itemTitles.forEach((tag) => {
      items.push(tag.innerText)
    })
    return items
  })

  //Search for the products on amazon in a new tab for each product 
  for (i = 0; i < grabTitles.length; i++) {

    const page = await browser.newPage();

    await page.goto(URL)
    await page.type(selectors.searchBox, grabTitles[i++])
    await page.keyboard.press('Enter');

    //get product titles from amazon search results
    const prodResults = await page.evaluate(() => {
      const prodTitles = document.querySelectorAll('span.a-size-medium.a-color-base.a-text-normal');
      let results = []
      prodTitles.forEach((tag) => {
        results.push(tag.innerText)
      })
      return results
    })
    console.log(prodResults)
  }
})()
ggorlen
user303749
  • `await page.keyboard.press('Enter');` probably triggers a navigation or DOM change but you never [wait for it](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitfornavigationoptions). Use `waitForNavigation`, `waitForSelector` or `waitForFunction` to tell Puppeteer not to proceed until the condition you expect is ready. – ggorlen Apr 01 '22 at 21:44
  • Sorry, I'm still learning Puppeteer. I have tried this method but with no luck so far. Do I need to put the `await page.keyboard.press('Enter');` in a function and call it in the `waitForFunction`? Thanks – user303749 Apr 01 '22 at 22:56
  • I'm working on an answer which I'll post momentarily. – ggorlen Apr 01 '22 at 22:57

3 Answers

There are a few potential problems with the script:

  1. await page.keyboard.press('Enter'); triggers a navigation, but your code never waits for the navigation to finish before trying to select the result elements. Use waitForNavigation, waitForSelector or waitForFunction (not waitForTimeout).

    If you do wait for a navigation, there's a special pattern using Promise.all needed to avoid a race condition, shown here.

    Furthermore, you might be able to skip a page load by going directly to the search URL by building the string yourself. This should provide a significant speedup.

  2. Your code spawns a new page for every item that needs to be processed, but these pages are never closed. I see grabTitles.length as 60. So you'll be opening 60 tabs. That's a lot of resources being wasted. On my machine, it'd probably hang everything. I'd suggest making one page and navigating it repeatedly, or close each page when you're done. If you want parallelism, consider a task queue or run a few pages simultaneously.

  3. grabTitles[i++] -- why increment i here? It's already incremented by the loop, so this appears to skip elements, unless your selectors have duplicates or you have some other reason to do this.

  4. span.a-size-medium doesn't work for me, which could be locality-specific. I see a span.a-size-base-plus.a-color-base.a-text-normal, but you may need to tweak this to taste.
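
As a rough sketch of the "build the search URL yourself" idea from point 1: the snippet below assumes Amazon's search results live at /s with the query in a k parameter. That matches what the address bar shows after a manual search, but it isn't something the site guarantees. buildSearchUrl is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: builds a search-results URL for a product title.
// Assumption: the site serves search results at "/s?k=<query>".
function buildSearchUrl(title, base = "https://www.amazon.co.uk") {
  const url = new URL("/s", base);
  // searchParams handles the escaping of spaces, "|", "&" and so on.
  url.searchParams.set("k", title);
  return url.href;
}

// Example: print the URL for one of the eBay titles.
console.log(buildSearchUrl("Elmex Junior Toothpaste 2 x 75ml"));
```

With something like this, the loop could call page.goto(buildSearchUrl(title)) and then page.waitForSelector(...) on the results, skipping the home page load and the typing step entirely.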

Here's a minimal example. I'll just do the first 2 items from the eBay array since that's coming through fine.

const puppeteer = require("puppeteer"); // ^13.5.1

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent(ua);
  const titles = [
    "Chloraethyl | Dr. Henning | Spray 175 ml",
    "Elmex Decays Prevention Toothpaste 2 x 75ml",
  ];

  for (const title of titles) {
    await page.goto("https://www.amazon.co.uk/");
    await page.type("#twotabsearchtextbox", title);
    await Promise.all([
      page.keyboard.press("Enter"),
      page.waitForNavigation(),
    ]);
    const titleSel = "a span.a-size-base-plus.a-color-base.a-text-normal";
    await page.waitForSelector(titleSel);
    const results = await page.$$eval(titleSel, els =>
      els.map(el => el.textContent)
    );
    console.log(title, results.slice(0, 5));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Output:

Chloraethyl | Dr. Henning | Spray 175 ml [
  'Chloraethyl | Dr. Henning | Spray 175 ml',
  'Wild Fire (Shetland)',
  'A Dark Sin: A chilling British detective crime thriller (The Hidden Norfolk Murder Mystery Series Book 8)',
  'A POLICE DOCTOR INVESTIGATES: the Sussex murder mysteries (books 1-3)',
  'Rites of Spring: Sunday Times Crime Book of the Month (Seasons Quartet)'
]
Elmex Decays Prevention Toothpaste 2 x 75ml [
  'Janina Ultra White Whitening Toothpaste (75ml) – Diamond Formula. Extra Strength. Clinically Proven. Low Abrasion. For Everyday Use. Excellent for Stain Removal',
  'Elmex Decays Prevention Toothpaste 2 x 75ml',
  'Elmex Decays Prevention Toothpaste 2 x 75ml by Elmex',
  'Elmex Junior Toothpaste 2 x 75ml',
  'Elmex Sensitive Professional 2 x 75ml'
]

Note that I added user agents and headers to be able to use headless: true but it's incidental to the main solution above. You can return to headless: false or check out canonical threads like How to avoid being detected as bot on Puppeteer and Phantomjs? and Why does headless need to be false for Puppeteer to work? if you have further issues with detection.
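
If you do want several tabs open at once, as mentioned in point 2, one simple option is to process the titles in small batches and close each tab when its work is done. This is only a sketch: mapInBatches and searchInNewTab are hypothetical names, the batch size is arbitrary, and the /s?k= search URL is an assumption about the site rather than a documented endpoint.

```javascript
// Runs `worker` over `items`, at most `batchSize` at a time, keeping order.
async function mapInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch before starting the next one.
    results.push(...(await Promise.all(batch.map(item => worker(item)))));
  }
  return results;
}

// Hypothetical worker: open a tab, collect result titles, always close the tab.
async function searchInNewTab(browser, title) {
  const page = await browser.newPage();
  try {
    // Assumption: search results are served at "/s?k=<query>".
    await page.goto(
      "https://www.amazon.co.uk/s?k=" + encodeURIComponent(title)
    );
    const titleSel = "a span.a-size-base-plus.a-color-base.a-text-normal";
    await page.waitForSelector(titleSel);
    return await page.$$eval(titleSel, els => els.map(el => el.textContent));
  } finally {
    await page.close(); // runs even if the selector never shows up
  }
}
```

Usage would look something like await mapInBatches(titles, 3, t => searchInNewTab(browser, t)), which keeps at most three tabs open at any moment. A proper task queue would keep the slots busier, since a batch is only as fast as its slowest page, but this is usually enough to avoid opening 60 tabs at once.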

ggorlen
  • Good tips about navigating directly to a url instead of using the search box, and not opening multiple tabs. – Neil Apr 02 '22 at 13:10
  • Thanks a lot for this comment, great advice. I understand the multiple tabs are not ideal but it is the desired outcome in order to view it for further research, ideally opening the product if the eBay title is in the Amazon search results, so I'll look into parallelism – user303749 Apr 02 '22 at 13:51
  • @ggorlen would it be possible to click the titleSel element if it's in the grabResults array? I have tried the following but with no luck: `if (grabTitles.includes(titleSel)) { await page.click(titleSel); }` – user303749 Apr 06 '22 at 21:01
  • `grabTitles.includes(titleSel)` doesn't seem to make sense to me. `grabTitles` is an array of titles while `titleSel` is a CSS selector. I'm not sure what you're trying to accomplish here, so I suggest opening a new question with a clear problem statement, expected output and code. – ggorlen Apr 06 '22 at 21:08

You've hit on an age-old problem with Puppeteer: knowing when a page has fully completed rendering or loading.

You could try adding the following:

await page.waitForNavigation({ waitUntil: 'networkidle2' })
await page.waitForTimeout(10000)

I find networkidle2 isn't always reliable enough, so I add an arbitrary extra waitForTimeout. You'll need to play around with the timeout value (10000 ms = 10 seconds) to get what you're looking for. It's not ideal, I know, but I've not found a better way.

Neil
  • It may appear to work, but it's a poor solution that causes a race condition. At best, your code will run slowly blocking unnecessarily, and at worst you'll miss data randomly. The [docs](https://github.com/puppeteer/puppeteer) say: "Puppeteer has event-driven architecture, which removes a lot of potential flakiness. There’s no need for evil “sleep(1000)” calls in puppeteer scripts." (`waitForTimeout` is exactly such a call). – ggorlen Apr 01 '22 at 22:56
  • I agree that using waitForTimeout is a hack and not an ideal solution to the problem. – Neil Apr 02 '22 at 13:12

For your purpose you can use the NPM package ecommerce-scraper-js. It lets you do what you need with far fewer lines of code and without maintaining your own parser:

import { amazon, ebay } from "ecommerce-scraper-js";

(async () => {
  // get 5 results for "playstation 5" from ebay
  const ebayProducts = await ebay.getListings("playstation 5", 5);

  // iterate over received products
  for (const product of ebayProducts) {
    // destructure product and get title
    const { title } = product;
    // get 5 results for received title from amazon
    const amazonProducts = await amazon.getListings(title, 5);
    // get titles from received products
    const amazonTitles = amazonProducts.map((el) => el.title);

    console.log(title, amazonTitles);
  }
})();

Output example:

Sony PS5 Console[
   "PlayStation 5 Console (PS5)",
   "PlayStation 5 Console CFI-1102A",
   "Playstation DualSense Wireless Controller",
   "Playstation DualSense Charging Station",
   "Sony Playstation Portable PSP 3000 Series Handheld Gaming Console System (Pink) (Renewed)"
]
☑️ NEW Sony Playstation PS 5 Digital Edition Console System (SHIPS THE NEXT DAY)[
   "PlayStation 5 Console (PS5)",
   "Playstation DualSense Wireless Controller",
   "PlayStation 5 Console CFI-1102A",
   "DualShock 4 Wireless Controller for PlayStation 4 - Glacier White, Case",
   "The Last of Us Part I – PlayStation 5"
]
...and other results

You can see more use cases (with examples) in the documentation.

Disclaimer: I'm the author of this package.

Mikhail Zub