I'm able to fetch some of the data, but not all of it, with the function described in a similar question and copied into this question.

The function does exactly what it is meant to do. My problem is that with this method the desired request does not appear on screen.

The desired request should respond with JSON product information, but it doesn't. When I browse to the same URL manually, I can see that request with its full response, and the UI shows all the products that arrived from it. But when I load the page with this method (with `headless: false`), the UI shows everything except the product data. What am I missing here?

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('http://example.com', { timeout: 0 }); // otherwise I get "Navigation timeout of 30000 ms exceeded"
    console.log(results);

    await browser.close();
})();
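
To check whether the products response was captured at all, the `results` array can be filtered for JSON bodies. The following is a minimal sketch and not part of the original script: `extractJsonResponses` is a hypothetical helper, and it assumes each entry has the `url`, `responseHeaders`, and `responseBody` fields built above, with lower-case header names as Puppeteer reports them.

```javascript
// Hypothetical helper (not part of the original script): picks the entries
// of `results` whose response declares a JSON content type and parses them.
function extractJsonResponses(results) {
    const parsed = [];
    for (const entry of results) {
        const contentType = (entry.responseHeaders || {})['content-type'] || '';
        if (!contentType.includes('json') || !entry.responseBody) {
            continue; // not JSON, or no body was captured (e.g. a redirect)
        }
        try {
            parsed.push({
                url: entry.url,
                data: JSON.parse(entry.responseBody.toString('utf8')),
            });
        } catch (err) {
            // declared as JSON but did not parse; keep the URL for inspection
            parsed.push({ url: entry.url, error: err.message });
        }
    }
    return parsed;
}
```

Calling `console.log(extractJsonResponses(results).map(r => r.url))` just before `browser.close()` would list the JSON endpoints that actually completed. If the products URL is missing from that list, that would suggest the request was never made or never finished in this session, rather than a problem in the logging code.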
Bennyh961
  • I don't think example.com is your real site. Whatever the real site is, it is probably detecting you as a bot and not sending certain requests, which is a normal occurrence. I'd try all of the usual anti-bot tricks, like running headfully, adding a user-agent, stealth plugin, etc. – ggorlen Oct 26 '22 at 15:49
  • @ggorlen Yes, I think you're right. I'm not familiar with anti-bot tricks; can you suggest a friendly library (Node.js) to start with? – Bennyh961 Oct 26 '22 at 18:20
  • First, I would start with `{headless: false}`, and if that doesn't work, maybe try some ideas in [this thread](https://stackoverflow.com/questions/63818869/why-does-headless-need-to-be-false-for-puppeteer-to-work) like puppeteer-extra-stealth-plugin, but it's sort of a case-by-case basis. Stealth doesn't always work. Printing `console.log(await page.content())` can also help debug what the differences are. Sometimes you'll see a block message right there. – ggorlen Oct 26 '22 at 18:35
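
The debugging steps suggested in the comments (headful launch, an explicit user-agent, and printing `page.content()`) could be combined roughly as follows. This is only a sketch under assumptions: `dumpPageContent` is a hypothetical name, the URL and user-agent string are placeholders, and waiting for `networkidle2` is a guess at what gives XHR-driven content time to load. The Puppeteer module is passed in as an argument so that a wrapper such as puppeteer-extra could be substituted without changing the function.

```javascript
// Sketch only: `puppeteer` is passed in so the caller can swap in a wrapper
// module. The function name, URL, and user-agent string are placeholders.
async function dumpPageContent(puppeteer, url, userAgent) {
    // headful mode, as suggested in the comments
    const browser = await puppeteer.launch({ headless: false });
    try {
        const [page] = await browser.pages();
        await page.setUserAgent(userAgent);
        // assumption: wait until the network is mostly idle before reading
        await page.goto(url, { waitUntil: 'networkidle2' });
        // the rendered HTML; a block message, if any, may be visible here
        return await page.content();
    } finally {
        await browser.close(); // always release the browser
    }
}

// Usage (assumes `npm install puppeteer`):
// dumpPageContent(require('puppeteer'), 'http://example.com', 'Mozilla/5.0 ...')
//     .then(html => console.log(html));
```

Comparing the printed HTML with what a normal browser shows for the same URL may reveal whether the site is serving a different page to the automated session.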

0 Answers