
I am trying to scrape a webpage with Puppeteer: enter the site, navigate through several pages and, on the data pages (the paginated ones), send POST data that emulates the form submission.

The request-interception handler can only be registered once (see "Node Puppeteer, page.on("request") throw a "Request is already handled!""), so every request goes through that same handler, and all of them would be affected by the data sent via POST.
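To illustrate why a second handler fails, here is a small simulation, not Puppeteer itself. `FakeRequest` is a hypothetical stand-in that, like an intercepted request in Puppeteer's classic interception mode, can be resolved only once, and Node's `EventEmitter` plays the role of `page`. Both listeners receive the same request object, so the second `continue()` call throws.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for an intercepted request: it can be resolved once.
class FakeRequest {
  constructor(url) {
    this.url = url;
    this.handled = false;
  }
  continue(overrides = {}) {
    if (this.handled) {
      throw new Error("Request is already handled!");
    }
    this.handled = true;
    this.overrides = overrides;
  }
}

const page = new EventEmitter();
const errors = [];

// Two listeners, as if page.on("request", ...) had been called twice.
page.on("request", (req) => req.continue());
page.on("request", (req) => {
  try {
    req.continue({ method: "POST" });
  } catch (err) {
    errors.push(err.message);
  }
});

const req = new FakeRequest("https://url.com/info.cgi");
page.emit("request", req); // listeners run synchronously, in registration order

console.log(errors); // [ 'Request is already handled!' ]
console.log(req.overrides); // {}
```

The first listener wins, so the POST override from the second one never reaches the request.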

I didn't find much information on this (how do you make a POST request in Puppeteer?), so in the end I did the following:

  • Create a handler function that is called on every request.
  • Check whether an attribute of that function holds an object.
    • If it does, attach that data to the request (turning it into a POST) and delete the attribute.
    • If it does not, continue the request without adding any data.
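The steps above can also be sketched without a browser, keeping the pending data in a closure instead of an attribute on the handler function. This is a minimal sketch, assuming only that the request object offers `headers()`, `isNavigationRequest()` and `continue(overrides)`, as Puppeteer's `HTTPRequest` does; `createOneShotInterceptor` and `mockRequest` are hypothetical names, and the mock stands in for real requests. Restricting the override to navigation requests is an added safeguard, not one of the original steps: it keeps a late image or XHR request from consuming the POST data meant for `page.goto()`.

```javascript
// One-shot POST override kept in a closure.
function createOneShotInterceptor() {
  let pending = null; // overrides for the next navigation request, if any

  return {
    // Arm the interceptor before each page.goto().
    setNext(overrides) {
      pending = overrides;
    },
    // The single listener to register with page.on("request", ...).
    async handle(req) {
      if (pending && req.isNavigationRequest()) {
        const overrides = pending;
        pending = null; // clear first, so no other request can reuse it
        await req.continue({
          ...overrides,
          headers: { ...req.headers(), ...overrides.headers },
        });
      } else {
        await req.continue();
      }
    },
  };
}

// Hypothetical stand-in for a request, recording its continue() calls.
function mockRequest(isNavigation) {
  return {
    calls: [],
    headers: () => ({ accept: "text/html" }),
    isNavigationRequest: () => isNavigation,
    async continue(overrides) {
      this.calls.push(overrides);
    },
  };
}

const interceptor = createOneShotInterceptor();
interceptor.setNext({
  method: "POST",
  postData: "pagina=1&mes=5&year=2021",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
});

// An event emitter calls listeners without awaiting them; do the same here.
const image = mockRequest(false); // e.g. a leftover request from the previous page
const nav = mockRequest(true); // the navigation started by page.goto()
const nextNav = mockRequest(true); // a later navigation with nothing armed
interceptor.handle(image);
interceptor.handle(nav);
interceptor.handle(nextNav);

console.log(image.calls); // [ undefined ]
console.log(nav.calls[0].method); // POST
console.log(nextNav.calls); // [ undefined ]
```

With a real page, `interceptor.handle` would be registered once with `page.on("request", ...)`, and `interceptor.setNext({...})` called before each `page.goto()`.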
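```javascript
const puppeteer = require("puppeteer");

const openConnection = async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--no-sandbox"],
  });
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  // Register the handler only once: every request passes through it.
  page.on("request", requestPost);
  return { browser, page };
};

// Called for every request. If requestPost.data holds an object, the next
// navigation request is sent with those overrides (method, postData, headers)
// and the attribute is removed; every other request continues unchanged.
const requestPost = async (req) => {
  const data = requestPost.data;
  if (data && typeof data === "object" && req.isNavigationRequest()) {
    // Remove the attribute before awaiting, so no other request reuses it.
    delete requestPost.data;
    await req.continue({
      ...data,
      // Keep the browser's own headers and add (or override) ours.
      headers: { ...req.headers(), ...data.headers },
    });
  } else {
    await req.continue();
  }
};

// Assumed implementation: close the page, then the browser.
const closeConnection = async (page, browser) => {
  await page.close();
  await browser.close();
};

// m = month, y = year, p = first page, l = number of pages to fetch
const getData = async (m, y, p, l) => {
  const { browser, page } = await openConnection();
  const data = [];
  let pagina = p;
  try {
    do {
      /* JUST because this attribute is set, the next navigation request (the one
         created by the page.goto() that follows) is altered with these values */
      requestPost.data = {
        method: "POST",
        postData: `pagina=${pagina}&mes=${m}&year=${y}`,
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
      };
      await page.goto("https://url.com/info.cgi", { waitUntil: "networkidle2" });
      // Now process the page and append the results, e.g. data.push(...rows);
      pagina++;
    } while (pagina < p + l);
  } finally {
    await closeConnection(page, browser);
  }
  return data;
};
```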
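Since the goal is to emulate a form, the POST body could also be built with `URLSearchParams` rather than a template string; it percent-encodes values the way a browser encodes an `application/x-www-form-urlencoded` form, which matters if a field ever contains spaces or `&`. `toFormBody` is a hypothetical helper; the field names are the ones used in the code above.

```javascript
// Hypothetical helper: encode form fields as an
// application/x-www-form-urlencoded body, like a browser form POST.
function toFormBody(fields) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    params.append(name, String(value));
  }
  return params.toString();
}

console.log(toFormBody({ pagina: 3, mes: 5, year: 2021 }));
// pagina=3&mes=5&year=2021
console.log(toFormBody({ q: "a b&c" }));
// q=a+b%26c
```

Inside the loop this would read `postData: toFormBody({ pagina, mes: m, year: y })`.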