
I wish to iterate over a table (a calendar table) in Puppeteer and click specific cells (dates) to toggle their status (to "AWAY").

I've included a snippet of the table below. Each td cell contains two child divs: one with the day number (<div class="day_num">) and another that contains "AWAY" if the day has been marked as such (<div class="day_content">).

So far I've been able to scrape the table but that won't allow me to click the actual cells, as scraping just scrapes the table contents into an array.

How can I iterate over all the cells and click specific ones depending on the day number included in the child "day_num" div? For example, I wish to click the td for day 8 in the example below, to toggle its status.

<table class="calendar">
<tr class="days">

<td class="day">
<div class="day_num">7</div>
<div class="day_content"></div>
</td>
<td class="day">
<div class="day_num">8</div>
<div class="day_content"></div>
</td>
<td class="day">
<div class="day_num">9</div>
<div class="day_content">AWAY</div>
</td>

The scraping code I currently have is:

 const result = await page.evaluate(() => {
    const rows = document.querySelectorAll('.calendar tr td');
    return Array.from(rows, (row) => {
      const columns = row.querySelectorAll('div');
      return Array.from(columns, (column) => column.innerHTML);
    });
  });

  console.log(result);

result is:

[
  [],           [ '1', '' ],  [ '2', 'AWAY' ],
  [ '3', '' ],  [ '4', '' ],  [ '5', '' ],
  [ '6', '' ],  [ '7', '' ],  [ '8', '' ],
  [ '9', 'AWAY' ],  [ '10', '' ], [ '11', '' ],
  [ '12', '' ], [ '13', '' ], [ '14', '' ],
  [ '15', '' ], [ '16', '' ], [ '17', '' ],
  [ '18', '' ], [ '19', '' ], [ '20', '' ],
  [ '21', '' ], [ '22', '' ], [ '23', '' ],
  [ '24', '' ], [ '25', '' ], [ '26', '' ],
  [ '27', '' ], [ '28', '' ], [ '29', '' ],
  [ '30', '' ], [],           [],
  [],           []
]
Stephen Kempin
  • What have you tried? Did `page.click(".day_content")` work? When clicking fails, there's no substitute for sharing a [mcve] of the actual page, because there are myriad reasons why an element might not be clickable that can't be communicated with a static HTML snippet like this. – ggorlen Feb 02 '22 at 16:06
  • I've edited my post with examples, can you answer in context of the update? Thanks – Stephen Kempin Feb 02 '22 at 16:33
  • Thanks for the additional info. I took a shot at it but I'm still making many assumptions and can't test on the live site, so if it doesn't work you'll need to provide more details (preferably the live site, or a minimal representation of its behavior with relevant JS). – ggorlen Feb 02 '22 at 17:04

1 Answer


While you haven't provided the live page (so I can't verify that arbitrary JS, visibility and timing won't make this fail), I'll take a stab at it and see if the following works, assuming your HTML is pretty much static:

const puppeteer = require("puppeteer"); // ^16.2.0

let browser;
(async () => {
  const html = `
    <body>
    <table class="calendar">
      <tr class="days">
        <td class="day">
          <div class="day_num">7</div>
          <div class="day_content"></div>
        </td>
        <td class="day">
          <div class="day_num">8</div>
          <div class="day_content"></div>
        </td>
        <td class="day">
          <div class="day_num">9</div>
          <div class="day_content">AWAY</div>
        </td>
      </tr>
    </table>
    <script>
      [...document.querySelectorAll(".day_content")][1]
        .addEventListener("click", e => {
          e.target.textContent = "CLICKED";
        })
      ;
    </script>
    </body>
  `;
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.setContent(html);
  const xp = '//div[contains(@class, "day_num") and text()="8"]';
  const [dayEl] = await page.$x(xp);
  const dayContent = await dayEl.evaluate(el => {
    const dayContent = el.closest(".day").querySelector(".day_content");
    dayContent.click();
    return dayContent.textContent;
  });
  console.log(dayContent); // => CLICKED
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

The approach is to find the .day_num element you're interested in using an XPath on the class and text, then pop up the tree to the .day element and down again to the associated .day_content element to click it. I added a listener to change the text upon click to verify that it was indeed clicked.

You could also use nextElementSibling on the .day_num rather than the closest/querySelector combo, but this assumes more about the relationship between the .day_num and .day_content elements and would probably be more brittle.
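
For comparison, the sibling-based variant might look like the sketch below. It assumes `.day_content` is always the element immediately after `.day_num`, as in the snippet from the question. `dayXPath` and `clickNextSibling` are hypothetical helper names, and `page` is assumed to be a Puppeteer `Page` from a version that still provides `page.$x` (such as the ^16.2.0 used above).

```javascript
// Hypothetical helper: builds the same XPath used in the example above.
const dayXPath = day =>
  `//div[contains(@class, "day_num") and text()="${day}"]`;

// Hypothetical helper: clicks the element right after the matching .day_num.
// Assumes .day_content is the immediate next sibling of .day_num.
const clickNextSibling = async (page, day) => {
  const [dayEl] = await page.$x(dayXPath(day));
  if (!dayEl) {
    throw new Error(`No .day_num found for day ${day}`);
  }
  return dayEl.evaluate(el => {
    const sibling = el.nextElementSibling;
    sibling.click(); // native, untrusted DOM click
    return sibling.textContent;
  });
};
```

If the markup ever gains an element between the two divs, this version clicks the wrong node, which is why the closest/querySelector combination is probably the safer default.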

Also, if the text content "8" might have whitespace, you can loosen it up a bit with a substring contains in your XPath: '//div[contains(@class, "day_num") and contains(text(), "8")]'. This comes at the risk of false positives, since it would also select, say, "18" and "28". In that case, a regex or tree walk and trim might be more appropriate. It's hard to make a recommendation based on this excerpt of the HTML out of context.
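
A minimal sketch of the "tree walk and trim" idea, assuming padding cells outside the month may lack a `.day_num` div (as the empty arrays in the question's output suggest). `isDay` and `clickDayTrimmed` are hypothetical names, and `page` is assumed to be a Puppeteer `Page`:

```javascript
// Hypothetical predicate: exact match on the trimmed text, so " 8 " matches
// day 8 but "18" and "28" do not.
const isDay = (text, day) => text.trim() === String(day);

// Hypothetical helper: walks every .day cell and clicks the .day_content of
// the first one whose trimmed .day_num text equals `day`.
// Resolves to true if a cell was clicked, false if no cell matched.
const clickDayTrimmed = async (page, day) => {
  for (const cell of await page.$$(".day")) {
    const num = await cell.$(".day_num");
    if (!num) {
      continue; // padding cell without a day number
    }
    const text = await num.evaluate(el => el.textContent);
    if (isDay(text, day)) {
      await cell.$eval(".day_content", el => el.click()); // native DOM click
      return true;
    }
  }
  return false;
};
```

This costs several round trips to the browser per cell, which is unlikely to matter for a calendar of a few dozen cells.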


Taking a step further, it sounds like you need to click multiple elements in a loop and are struggling to do that. Here's an attempt that works on a mocked-up version of the site:

const puppeteer = require("puppeteer"); // ^16.2.0

let browser;
(async () => {
  const html = `
    <body>
    <table class="calendar">
      <tr class="days"></tr>
    </table>
    <script>
      for (let i = 0; i < 30; i++) {
        document.querySelector(".days").innerHTML += 
          \`<td class="day">
            <div class="day_num">\${i + 1}</div>
            <div class="day_content"></div>
          </td>\`
        ;
      }

      [...document.querySelectorAll(".day_content")].forEach(e =>
        e.addEventListener("click", e => {
          e.target.textContent = "AWAY";
        })
      );
    </script>
    </body>
  `;
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.setContent(html);
  const awayDatesInMonth = [5, 12, 18, 20];

  for (const day of awayDatesInMonth) {
    const xp = `//div[contains(@class, "day_num") and text()="${day}"]`;
    const [dayEl] = await page.$x(xp);
    await dayEl.evaluate(el =>
      el.closest(".day").querySelector(".day_content").click()
    );
  }

  /* or if you can assume the elements are correctly indexed */
  const days = await page.$$(".day_content");

  for (const day of awayDatesInMonth) {
    await days[day-1].evaluate(el => el.click());
  }
  /* --- */

  console.log(await page.content());
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

If this doesn't work, please provide your own mock-up that better represents the original site you're working with so I can be sure I'm solving the relevant problem.

Note that I'm using native DOM clicks which are untrusted and work differently than the trusted Puppeteer page.click()/ElementHandle.click() methods. If the native DOM click doesn't trigger a response, try the Puppeteer click.
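
A sketch of the trusted-click variant, under the same assumptions about the markup. `dayContentXPath` and `clickDayTrusted` are hypothetical names. Puppeteer's click needs an element with a visible, non-zero-size box, so an empty `.day_content` with no height may be rejected; in that case, clicking the enclosing `.day` cell may be worth trying, assuming the site's listener responds to clicks there.

```javascript
// Hypothetical helper: XPath for the .day_content in the same <td> as the
// .day_num whose text is exactly `day`.
const dayContentXPath = day =>
  `//div[contains(@class, "day_num") and text()="${day}"]` +
  `/ancestor::td[1]//div[contains(@class, "day_content")]`;

// Hypothetical helper: performs a trusted click via ElementHandle.click(),
// which scrolls the element into view and clicks it with the mouse.
const clickDayTrusted = async (page, day) => {
  const [contentEl] = await page.$x(dayContentXPath(day));
  if (!contentEl) {
    throw new Error(`No .day_content found for day ${day}`);
  }
  await contentEl.click(); // throws if the element has no visible box
};
```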

ggorlen
  • Thanks that worked, much appreciated! As for the calendar I need to iterate through all the days in each month, so I assume I'd need to query all the "day_num" divs and then loop through these with the xpath code above? Is there a clean way to do it? – Stephen Kempin Feb 02 '22 at 17:53
  • If you're iterating all days in the month, then simply selecting all `.day_num` elements should work. Or, if you're iterating all days and taking an action on each `.day_content` based on the value of `.day_num`, I'd select all `.day`s and then query their child `.day_num` and `.day_content` to read the values and click as necessary. – ggorlen Feb 02 '22 at 17:58
  • Brilliant. Would that be with `queryAllSelector`? Would you mind providing an example of that query if possible? Thanks so much for your time, it's been a huge help! – Stephen Kempin Feb 02 '22 at 18:20
  • No problem, but I feel like your question is underspecified and I'm having to solve the problem by making guesses with small fragments of information, without context. Can you share exactly what you're trying to achieve at a high-level, please? If you can [edit] your original post to clarify matters, that'd be great. But if you're radically changing the question after the answer was posted, I suggest opening a new one. If you can explain how you're deciding which cells you want to click, for example, I can probably offer concrete code. But right now I have no idea what the goal is. – ggorlen Feb 02 '22 at 18:35
  • Sorry yes, to clarify I want to iterate through every day in a given calendar month. So I'd like to check the content of each `.day_content` div and toggle it depending on if it is marked as away or not (comparing it against a separate list of dates I'll provide in the function). So I need to check each day basically. – Stephen Kempin Feb 02 '22 at 18:38
  • OK, so what/where is that list of dates and which ones should or shouldn't be toggled? If you can provide this date list, the rationale/logic for toggling, the "before" calendar and the "after" calendar, and ideally the live site with the full calendar or a mock-up of its relevant behavior I can use with `page.setContent`, that'd be great. Short of that, it's up to you to take the partial code here and extrapolate it to whatever your use case is. – ggorlen Feb 02 '22 at 18:40
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241663/discussion-between-stephen-kempin-and-ggorlen). – Stephen Kempin Feb 02 '22 at 19:33
  • Unfortunately I can't provide a link as the calendar is behind authentication. I'm able to provide a list of numbered days for any given month that I wish to mark as away. I've added an example below, but it doesn't work. ``` const awayDatesInMonth = [5, 12, 18, 20]; const markAsAway = async (day) => { const [dayEl] = await page.$x(`//div[contains(@class, "day_num") and text()="${day}"]`); const dayContent = await dayEl.evaluate((el) => { const dayContent = el.closest('.day').queryAllSelector('.day_content'); ``` – Stephen Kempin Feb 02 '22 at 22:22
  • So you want to click on all 4 `.day_content` elements corresponding to the days in `awayDatesInMonth`? Should I assume they're not already marked as `AWAY` or just click on them no matter what? Thanks for the code, this helps clarify what you're trying to do, although it's best to put that in the question as an [edit]. – ggorlen Feb 02 '22 at 22:29
  • Yes that is all correct. Apologies, I should have created the original question in context. And that’s all correct, I won’t know the status of the days in the cell ahead of time, so need to conditionally click them. – Stephen Kempin Feb 03 '22 at 09:57
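
Based on the clarification in the comments above (click only the listed days that aren't already marked "AWAY"), a conditional version could be sketched as follows. It assumes a click toggles a cell's status and that the scraped data has the same shape as `result` in the question. `daysNeedingClick` and `markAway` are hypothetical names, and `page` is assumed to be a Puppeteer `Page` that still provides `page.$x`.

```javascript
// Hypothetical helper. `scraped` has the shape of `result` in the question:
// [dayNumber, content] pairs, with [] for padding cells outside the month.
// Returns the listed days that are not yet marked "AWAY".
const daysNeedingClick = (scraped, awayDates) => {
  const wanted = new Set(awayDates.map(String));
  return scraped
    .filter(([num, content]) =>
      num !== undefined &&
      wanted.has(num) &&
      (content ?? "").trim() !== "AWAY"
    )
    .map(([num]) => Number(num));
};

// Hypothetical helper: scrapes the current state, then clicks only the
// listed days that aren't already marked "AWAY".
const markAway = async (page, awayDates) => {
  const scraped = await page.$$eval(".calendar td", cells =>
    cells.map(cell =>
      [...cell.querySelectorAll("div")].map(div => div.textContent)
    )
  );
  for (const day of daysNeedingClick(scraped, awayDates)) {
    const [dayEl] = await page.$x(
      `//div[contains(@class, "day_num") and text()="${day}"]`
    );
    await dayEl.evaluate(el =>
      el.closest(".day").querySelector(".day_content").click()
    );
  }
};
```

With `awayDatesInMonth = [5, 12, 18, 20]`, calling `await markAway(page, awayDatesInMonth)` would skip any of those days that already show "AWAY". If the native DOM click has no effect on the real site, the click inside the loop can be swapped for a Puppeteer click.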