Questions tagged [puppeteer-cluster]

puppeteer-cluster manages a pool of headless browsers via puppeteer. This is useful to crawl multiple pages in parallel or to keep a pool of open browsers.

puppeteer-cluster creates a pool of puppeteer workers by spawning multiple browsers, contexts or pages via puppeteer. The library keeps track of queued jobs and handles thrown errors. It can also retry jobs or introduce delays when crawling a domain.
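
To make the idea of a worker pool with queued jobs, error handling and retries concrete, here is a minimal conceptual sketch in plain JavaScript. It is not puppeteer-cluster's implementation or API: `MiniPool`, its `queue`/`idle` methods, the option names (`maxConcurrency`, `retryLimit`, `retryDelay`) and the example URLs are all hypothetical, and the task is a stub that does not launch a browser.

```javascript
// Conceptual sketch only: NOT puppeteer-cluster's code or API.
// MiniPool and all option names below are hypothetical.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class MiniPool {
  constructor({ maxConcurrency = 2, retryLimit = 0, retryDelay = 0 } = {}) {
    this.maxConcurrency = maxConcurrency; // jobs allowed to run at once
    this.retryLimit = retryLimit;         // extra attempts after a failure
    this.retryDelay = retryDelay;         // ms to wait before a retry
    this.jobs = [];                       // pending jobs: { data, task }
    this.active = 0;                      // jobs currently running
    this.errors = [];                     // jobs that failed every attempt
    this.idleWaiters = [];                // resolvers for pending idle() calls
  }

  // Add a job; task is a function that receives the job data.
  queue(data, task) {
    this.jobs.push({ data, task });
    this._next();
  }

  // Resolves once nothing is queued or running.
  idle() {
    if (this.active === 0 && this.jobs.length === 0) return Promise.resolve();
    return new Promise((resolve) => this.idleWaiters.push(resolve));
  }

  // Start queued jobs while free slots remain, then notify idle() waiters.
  _next() {
    while (this.active < this.maxConcurrency && this.jobs.length > 0) {
      this.active += 1;
      this._run(this.jobs.shift());
    }
    if (this.active === 0 && this.jobs.length === 0) {
      this.idleWaiters.splice(0).forEach((resolve) => resolve());
    }
  }

  // Run one job; the job keeps its slot while waiting to retry.
  async _run(job) {
    for (let attempt = 0; attempt <= this.retryLimit; attempt += 1) {
      try {
        await job.task(job.data);
        break; // success, stop retrying
      } catch (error) {
        if (attempt === this.retryLimit) {
          this.errors.push({ data: job.data, error }); // record, do not crash
        } else {
          await sleep(this.retryDelay);
        }
      }
    }
    this.active -= 1;
    this._next();
  }
}

// Usage with a stub task: the "flaky" URL fails once, then succeeds on retry.
async function demo() {
  const pool = new MiniPool({ maxConcurrency: 2, retryLimit: 1 });
  const visited = [];
  let flakyCalls = 0;
  const task = async (url) => {
    if (url.endsWith('/flaky') && flakyCalls++ === 0) {
      throw new Error('simulated failure');
    }
    visited.push(url);
  };
  for (const url of ['https://example.com/a', 'https://example.com/flaky', 'https://example.com/b']) {
    pool.queue(url, task);
  }
  await pool.idle();
  return { visited, errors: pool.errors };
}

demo().then((result) => console.log(result));
```

In this sketch a failing job is retried inside its own slot, which keeps the bookkeeping simple at the cost of blocking that slot during the retry delay. A real pool may re-queue the job instead.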

73 questions
6
votes
0 answers

How to improve puppeteer performance using launch args (using chromium in headless mode)?

Hi, I am using Puppeteer to crawl webpages (~1 million records). To manage long crawls I am using the puppeteer-cluster Node module. Which flags are already enabled when launching Chromium through Puppeteer? (list of args) What are some…
Rajat
  • 81
  • 5
5
votes
0 answers

Open Chrome without it taking focus (Puppeteer)

I'm using Puppeteer to launch multiple browsers, and every few minutes it reopens them. This works fine, except that it constantly opens browsers and focuses the tabs, which bothers me while I'm trying to use the computer. Due to what I'm trying…
Lawlzer
  • 75
  • 2
  • 5
4
votes
2 answers

How do I combine puppeteer plugins with puppeteer clusters?

I have a list of URLs to scrape from a website that uses React, so I am using Puppeteer. I do not want to be blocked by anti-bot servers, so I have added puppeteer-extra-plugin-stealth. I want to prevent ads…
3
votes
0 answers

puppeteer: Protocol error (Runtime.callFunctionOn): Target closed

I came across a website that puppeteer can't handle. When taking a screenshot, Protocol error (Runtime.callFunctionOn): Target closed or Protocol error (Emulation.setDeviceMetricsOverride): Target closed is triggered. Before taking a screenshot, I…
sanjihan
  • 5,592
  • 11
  • 54
  • 119
3
votes
0 answers

Puppeteer does not use cache when connected to proxy

I have a task that opens a browser and then visits the same page over and over again. After testing network usage I've noticed that without proxies it caches files just fine, but as soon as I connect to a proxy it stops caching. I am using…
3
votes
2 answers

Is Puppeteer-Cluster Stealthy enough to pass bot tests?

I wanted to know if anyone using Puppeteer-Cluster could elaborate on how the Cluster.Launch({settings}) protects against sharing of cookies and web data between pages in different contexts. Do the browser contexts here actually block cookies and…
3
votes
3 answers

Puppeteer: how to wait only first response (HTML)

I'm using puppeteer-cluster to crawl web pages. If I open many pages at a time for a single website (8-10 pages), the connection slows down and many timeout errors come up, like this: TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded I…
user3817605
  • 151
  • 3
  • 11
3
votes
1 answer

puppeteer-cluster: queue instead of execute

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but they all fail…
G_V
  • 2,396
  • 29
  • 44
3
votes
2 answers

Unable to Run Multiple Node Child Processes without Choking on DigitalOcean

I've been struggling to run multiple instances of Puppeteer on DigitalOcean for quite some time with little luck. I'm able to run ~5 concurrently using tools like puppeteer-cluster, but for some reason the whole thing just chokes with little helpful…
2
votes
0 answers

how to unite cheerio with puppeteer so he can click on elements

I tried using cheerio to find the element, and if the element is found it has to be clicked, but I don't know how to combine this with Puppeteer. The button I want to click is in the 3rd picture. await page.waitForTimeout(10000) const contentHTML…
2
votes
1 answer

How To Passing Multiple Data in Puppeteer-Cluster

Just one question: how can I do this? I have this data: a url (http://example.com) and 2 strings, for example firstName and lastName. The url stays the same in every browser, but firstName and lastName change in every browser…
Getol99
  • 23
  • 4
2
votes
1 answer

I can't use a rotating IP proxy in my puppeteer cluster script

I am trying to run this code with multiple IP addresses, but I think I put the proxy code in the wrong place. Can someone help? The proxy dashboard shows that the code uses the proxy, but when the browser opens the IP address doesn't change; it is still…
2
votes
1 answer

Navigation failed because browser has disconnected

I ran into the following problem. Here's the error message: Error: Navigation failed because browser has disconnected! at /Users/me/myproject/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:51:147 at…
David McNamee
  • 403
  • 4
  • 10
2
votes
1 answer

How do I reset a for loop inside an async function?

So I found a website that has very cool images and I'd like to scrape some of its data. The website hasn't been updated in about 5 years, and I tried to contact its owner about some kind of API but didn't get a response. Anyway, the website…
2
votes
3 answers

How to handle multiple tabs in puppeteer-cluster[CONCURRENCY_BROWSER]?

I'm attempting to scrape 3 urls under the conditions below: Each url needs to run in a separate browser. The url may contain 2 or more links to click. Open the links in a new tab of the respective browser (in parallel), switch to it and scrape the…
Ajai Ganesh
  • 21
  • 1
  • 3