
I'm trying to capture information within a website but I'm having problems with the iframes.

I need to open https://www.saisp.br/online/produtos-publicos/, click "Paraíba do Sul Telemetric Network" in the left menu, and capture the value in the row "Rio Paraitinga - Passarela / São Luiz do Paraitinga".

My code below doesn't capture the information from the second iframe. How can I do this?

const puppeteer = require('puppeteer');

(async () => {
  // Launch a new browser instance
  const browser = await puppeteer.launch();

  // Create a new page instance
  const page = await browser.newPage();

  // Navigate to the specified webpage
  await page.goto('https://www.saisp.br/online/produtos-publicos/');

  await page.waitForTimeout(4000)

  const link = await page.$x('/html/body/app-home/div/nav/ul/li[12]/div/div[1]/a');
  await link[0].click();
  await page.waitForTimeout(2000)

  // Wait for the iframe to load and switch context to it
  // Select the iframe using XPath
  const iframe = (await page.$x('/html/body/app-home/div/main/iframe'))[0];
  const frameContent = await iframe.contentFrame(); 
  await page.waitForTimeout(1000)
  await frameContent.waitForSelector('#tbTelemBody');

  const elemento = (await frameContent.$x('/html/body/div/table/tbody/tr/td[2]'))[0];
  const value = await frameContent.evaluate(el => el.textContent, elemento);

  console.log(value);

  await browser.close();
})();
ggorlen
dihslp

1 Answer


This is actually a pretty complex page and your attempt is a good one.

As a general point, avoid timeouts. Stick to event-driven code for speed and accuracy. Prefer waitForSelector, waitForXPath and waitForFunction.
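As a rough illustration of the event-driven style, the fixed sleeps around the iframe lookup could be replaced by waits on the elements themselves. This is only a sketch: `waitForFrameElement` is a hypothetical helper name, `page` is assumed to be a Puppeteer `Page`, and the selectors in the usage comment are assumptions about the target page.

```javascript
// Hypothetical helper (not from the original post): waits for an iframe
// element to appear, resolves its frame, then waits for an element inside it.
// No fixed timeouts are involved; each step resolves as soon as it can.
async function waitForFrameElement(page, frameSelector, innerSelector) {
  // Resolves once the <iframe> element exists in the outer document.
  const frameHandle = await page.waitForSelector(frameSelector);

  // contentFrame() can resolve to null if the element is not a frame.
  const frame = await frameHandle.contentFrame();
  if (!frame) {
    throw new Error(`No frame found for selector: ${frameSelector}`);
  }

  // Resolves once the inner element exists inside the frame.
  return frame.waitForSelector(innerSelector);
}

// Possible usage inside the async function, after page.goto(...):
//   const body = await waitForFrameElement(page, "iframe", "#tbTelemBody");
```

Passing `page` in as a parameter keeps the helper independent of how the browser is launched.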

Avoid browser-generated paths, and prefer CSS selectors to XPaths when possible.

The website doesn't have much in the way of identifying attributes on its elements, so we should prefer text for selecting elements here. See this thread for text selection techniques in Puppeteer.

Here's one approach:

const puppeteer = require("puppeteer"); // ^19.7.2

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const el = await page.waitForSelector("text/ba do Sul");
  await el.evaluate(el => el.click());
  const row = await page.waitForFunction(() =>
    [
      ...document
        .querySelector("iframe")
        .contentDocument.querySelectorAll("#tbDadosTelem tr"),
    ].find(e => e.textContent.includes("Luiz do Para"))
  );
  const rowData = await row.$$eval("td", els =>
    els
      .filter(e => !e.querySelector("td") && e.textContent.trim())
      .map(e => e.textContent.trim().replace(/\s+/g, " "))
  );
  console.log(rowData);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Output:

[
  'Rio Paraitinga - Passarela / São Luiz do Paraitinga',
  '0.6',
  '3.29'
]
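The cell cleanup done inside `$$eval` can also be tried on plain strings outside the browser. The sketch below uses a hypothetical `cleanCells` helper and made-up sample strings; it reproduces only the empty-cell filter and the whitespace normalization, not the nested-`td` check, which needs real DOM elements.

```javascript
// Hypothetical helper: mirrors the filter/map steps applied to each cell's
// text in the answer, operating on plain strings instead of DOM elements.
function cleanCells(texts) {
  return texts
    .filter(text => text.trim()) // drop empty or whitespace-only cells
    .map(text => text.trim().replace(/\s+/g, " ")); // collapse whitespace runs
}

// Made-up sample input resembling raw textContent values.
const rawCells = ["  Rio   Paraitinga\n - Passarela ", "", " 0.6 ", "3.29"];
console.log(cleanCells(rawCells));
// [ 'Rio Paraitinga - Passarela', '0.6', '3.29' ]
```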
ggorlen
  • Hi ggorlen. Indeed, the page is difficult to work with, and I tried several approaches without success. I understood your explanation and the code worked. Now I know how to use it and can refine it. Thank you very much for your help. – dihslp Mar 05 '23 at 11:07
  • Good afternoon. Sorry to bother you again, but I've been tinkering with this all day using the approach you described and couldn't get it to work. If I need only the value of the last column of that row, how would I do it? This is a flood warning system for people who live close to a river: I'd like to check whether the value is greater than 2.00 and send a message if so, and otherwise send nothing. The sending part already works; I just need the value in a variable so I can work with it. Thanks – dihslp Mar 05 '23 at 20:13
  • Are you looking for `rowData.at(-1)`? If you want to make it a number, you can use `+rowData.at(-1)`. – ggorlen Mar 05 '23 at 20:32
  • No, actually I can't separate the line items, that is, get only the value of the last column of the searched line – dihslp Mar 05 '23 at 21:47
  • I don't think I follow. `rowData` is a row, so the last element is the value of the last column of the searched line. – ggorlen Mar 06 '23 at 04:16
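
To make the `rowData.at(-1)` suggestion concrete, here is a minimal sketch of the threshold check described in the comments. `exceedsThreshold` is a hypothetical name, the 2.00 default comes from the comment above, and the sample row is copied from the answer's output. It assumes the last column holds the value to validate.

```javascript
// Hypothetical helper: returns true when the last column of the row,
// parsed as a number, is greater than the threshold.
function exceedsThreshold(rowData, threshold = 2.0) {
  const value = Number(rowData.at(-1)); // last column as a number
  // Non-numeric or missing values never trigger an alert.
  return Number.isFinite(value) && value > threshold;
}

// Sample row taken from the answer's output.
const rowData = [
  "Rio Paraitinga - Passarela / São Luiz do Paraitinga",
  "0.6",
  "3.29",
];

console.log(exceedsThreshold(rowData)); // true
```

`Array.prototype.at` needs a reasonably recent Node.js; on older versions, `rowData[rowData.length - 1]` is equivalent.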