
While learning how to scrape websites with Puppeteer, it occurred to me, as a JavaScript newbie, that the simplest way to work with certain dynamically generated sites would be to loop through some divs one by one and trigger actions based on their specific attributes (e.g., click every div in the loop with `class=Clickable`).

Most of the Puppeteer examples online show how to select page elements by certain names/classes, like in here, but not how to create a loop that goes through a subset of divs and performs actions based on them.

Since I am new to JavaScript, I would be extremely grateful if someone could give me advice or pointers on how to do this.

Example: I want to scrape data from the bottommost layers of certain divs, in this case those whose `section` is some variant of the letter C. I open the following slice of divs in the following way, with the plan to extract some data at the end:

<div class="Table" section="A">
<div class="Columns" id style="display: none;">
<div class="Table" section="B">
<div class="Columns" id style="display: none;">

## section "C" has been clicked by Puppeteer, 
## and so indented part is what is expanded from class=Columns
<div class="Table" section="C">
<div class="Columns" id style>

    ## Next sub-section that needs to be gone through
    <div class="Column" section="a">
    <div class="Rows" id style="display: none;">
    <div class="Column" section="b">
    <div class="Rows" id style="display: none;">

    ## Repeating same process one layer deeper now
    <div class="Column" section="c">
    <div class="Rows" id style>

        <div class="Subsection" section="i">
        <div class="data" id style>
        ............
        ............
        *loop keeps going*

I hope my use case is clearer now.

Coolio2654

1 Answer

Do I understand correctly? You have some nested layers of hidden divs; to reveal the next layer, you need to click on some visible element; then a hidden element becomes visible and you need to click on it too to go on, and so on?

If so, there can be two cases.

  1. If expanding is always synchronous, all traversing can be done in the browser context:
await page.evaluate(() => {
  const sections = ['C', 'c', 'i'];

  for (let section of sections) {
    const element = document.querySelector(`[section=${section}]`);
    element.click();
  }
});
  2. If expanding may be asynchronous (for example, it depends on an XHR/fetch call), we need async checks in the Node.js context:
const sections = ['C', 'c', 'i'];
for (let section of sections) {
  const element = await page.waitForSelector(`[section=${section}]`, { visible: true });
  await element.click();
}
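
The second case can be wrapped into a small reusable function. This is only a sketch: `sectionSelector` and `expandPath` are hypothetical helper names (not part of Puppeteer), the 5000 ms default timeout is an arbitrary assumption, and `page` is assumed to be a Puppeteer `Page`. Quoting the attribute value is not required for names like `C` or `i`, but it keeps the selector valid if a section name ever contains spaces or punctuation.

```javascript
// Hypothetical helper: build a [section="..."] selector with the value quoted,
// escaping backslashes and double quotes so the selector stays valid.
function sectionSelector(section) {
  const escaped = String(section).replace(/["\\]/g, '\\$&');
  return `[section="${escaped}"]`;
}

// Hypothetical helper: click through nested sections one layer at a time.
// Waits until each element is visible before clicking it, so it also works
// when a layer only appears after an XHR/fetch call finishes.
async function expandPath(page, sections, timeout = 5000) {
  const clicked = [];
  for (const section of sections) {
    const selector = sectionSelector(section);
    // waitForSelector resolves with an ElementHandle once the element is visible
    const element = await page.waitForSelector(selector, { visible: true, timeout });
    await element.click();
    clicked.push(selector);
  }
  // Return the selectors that were clicked, in order, for logging/debugging
  return clicked;
}
```

With a real page this would be called as `await expandPath(page, ['C', 'c', 'i']);`. If a layer never becomes visible, `waitForSelector` rejects after the timeout, so the loop stops at the first section that fails to expand.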
vsemozhebuty
  • You understood my question perfectly. But what do you mean by synchronous or asynchronous expanding? The expansion of the following `Div` is done immediately upon clicking the previous `Div` that matched the right section name. – Coolio2654 Feb 22 '19 at 19:28
  • By synchronous, I mean that the next element is ready for clicking immediately, not after some asynchronous net call. So with the described mechanism, the first way may suffice. – vsemozhebuty Feb 22 '19 at 20:15
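
For the more general loop mentioned in the question (clicking every div with a given class, e.g. `.Clickable`), a rough sketch could look like the following. `clickAllVisible` is a hypothetical name, and skipping hidden elements via `boundingBox()` is an assumption about the desired behavior; `page` is again assumed to be a Puppeteer `Page`.

```javascript
// Hypothetical helper: click every currently visible element matching a selector.
async function clickAllVisible(page, selector) {
  // page.$$ resolves with an array of ElementHandles (empty if nothing matches)
  const handles = await page.$$(selector);
  let clicked = 0;
  for (const handle of handles) {
    // boundingBox() resolves with null for elements that are not visible,
    // e.g. those inside a parent with style="display: none;"
    const box = await handle.boundingBox();
    if (box === null) continue;
    await handle.click();
    clicked += 1;
  }
  // Number of elements that were actually clicked
  return clicked;
}
```

Usage would be along the lines of `const n = await clickAllVisible(page, 'div.Clickable');`. Note that the handles are collected once up front, so elements that only appear as a result of these clicks are not included; calling the function again would pick them up.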