
I've been trying to get puppeteer to launch a unique instance for every profile stored in a .json file. This is because currently I am stuck creating a new folder with all my code and a unique .json file for every account/instance I want to run. I'd prefer if I could just store all my info in 1 .json file and then have my code launch a unique instance for each profile.

Goal:

  1. Input all profile information in .json file
  2. Have code launch a unique instance for every profile in the list
  3. Every unique instance should use only its own profile's data

Example: Puppeteer instance 1 launches with profile 1, Puppeteer instance 2 launches with profile 2, etc.

Example of settings.json

[
{
    "email": "email1@gmail.com"
},
{
    "email": "email2@gmail.com"
},
{
    "email": "email3@gmail.com"
}
]

Example of main.js

const fs = require('fs');
const puppeteer = require('puppeteer');

const profile = JSON.parse(fs.readFileSync('./settings.json'));

var id = 0

while (id <= 2) {
    emailInfo = profile[id].email;
    console.log(emailInfo)
    botRun()
    id++;
}

function botRun() {
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.waitForTimeout(500)
        console.log('function ' + emailInfo) //pretend this is page.type --> it would result in 'email3@gmail.com' for all instances since this is what the var is now but I want it to stay with the info in the loop

        await browser.close();
    })();
}

Obviously this is horrendously wrong: the emailInfo variable is overwritten on every loop iteration, so every Puppeteer instance ends up using the latest value. Is there any way I can make each Puppeteer instance stick with its own unique data?

Edit 1:

Managed to get the workaround working, but now I seem to have run into a new issue. At one point in my script I tell the browser to close the tab and open a new one. It closes each tab in each individual browser fine, but when I use "await browser.newPage();" it sends all the new tabs to just 1 browser instead of keeping them in their respective browsers.

const puppeteer = require('puppeteer-extra');
const fs = require('fs');

const botRun = async emailInfo => {
    browser = await puppeteer.launch({
        args: [],
        headless: false,
        ignoreHTTPSErrors: true,
        slowMo: 5,
    });
    const page = await browser.newPage();
    await page.waitForTimeout(2500)
    // do stuff with emailInfo
    await page.close(); // works fine - will close tab for each browser 
    await browser.newPage(); // suddenly sends all tabs to 1 browser
};

(async () => {
    const profile = JSON.parse(fs.readFileSync("./settings.json"));    
    await Promise.all(profile.map(({email}) => botRun(email)));
})();

My goal is to keep the tabs in their respective browsers rather than suddenly having them all thrown into 1 browser.


2 Answers


Put the loop into the Puppeteer code or pass the emailInfo as a parameter to the function.

If you want to run tasks in succession:

const fs = require("fs");
const puppeteer = require("puppeteer");

(async () => {
  const profile = JSON.parse(fs.readFileSync("./settings.json"));
  const browser = await puppeteer.launch();

  for (const {email: emailInfo} of profile) {
    const page = await browser.newPage();
    await page.waitForTimeout(500)
    // do stuff with emailInfo
    await page.close();
  }

  await browser.close();
})();

If you want to run all tasks in parallel:

(async () => {
  const profile = JSON.parse(fs.readFileSync("./settings.json"));
  const browser = await puppeteer.launch();

  await Promise.all(profile.map(async ({email: emailInfo}) => {
    const page = await browser.newPage();
    await page.waitForTimeout(500)
    // do stuff with emailInfo
    await page.close();
  }));

  await browser.close();
})();

If // do stuff with emailInfo is a very long chunk of code, use a function (as you're attempting originally) and give it emailInfo as a parameter. This most closely matches what you were originally going for (open a new browser per email):

const botRun = async emailInfo => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.waitForTimeout(500)
  // do stuff with emailInfo
  await browser.close();
};

(async () => {
  const profile = JSON.parse(fs.readFileSync("./settings.json"));

  for (const {email} of profile) {
    await botRun(email); // one at a time
  }
})();

Or run all emails at once:

// botRun is the same as above

(async () => {
  const profile = JSON.parse(fs.readFileSync("./settings.json"));    
  await Promise.all(profile.map(({email}) => botRun(email)));
})();

The semantics of the first two snippets are a bit different than your code, but I doubt it makes sense to generate and destroy a whole browser process for every request. Prefer opening a new page (tab) in the current browser unless you have a good reason to do otherwise.

Also, neither pattern is great if you have large inputs--the sequential approach is likely too slow, the parallel approach is likely too fast (opening 4000 browsers at once isn't fun). Consider a task queue for such cases so you can do some parallel work but keep it bound to a sensible degree. puppeteer-cluster has such a task queue.
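If you'd rather not pull in a library, the bounded-parallelism idea can be sketched in a few lines. This is only a minimal illustration, not how puppeteer-cluster works internally; `runWithLimit` and its parameters are hypothetical names made up for this example.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at any moment. Results keep the order of `items`.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, local to this call

  // Each runner keeps claiming the next index until none are left.
  const runner = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  };

  // Start up to `limit` runners and wait for all of them to drain the list.
  const runners = Array.from(
    {length: Math.min(limit, items.length)},
    () => runner()
  );
  await Promise.all(runners);
  return results;
}
```

With the `botRun` function from above, usage would look something like `await runWithLimit(profile, 4, ({email}) => botRun(email));`, where 4 is an arbitrary limit. Note that if one worker rejects, the returned promise rejects while the other runners keep going, so you would still want a try-catch inside the worker.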

jfriend00 has covered most of the critical points beyond this (avoid globals, avoid var, etc.), but I'd also like to add that you almost never need loops with counter variables. If you do use a loop with a counter, prefer for loops to while. Loops with counters are verbose and tend to lead to off-by-one bugs. JS offers many iteration abstractions like map, forEach and for...of loops that are clean, semantic and less error-prone.

Also, the above code omits error handling, but try-catch is pretty much essential when calling Puppeteer functions that can time out. You don't want to crash your app ungracefully if an operation takes a little longer than you expect or a server is down. Use a finally block to ensure you call browser.close().

Finally, page.waitForTimeout is deprecated and will be removed in future Puppeteer versions. There are better ways to delay your script until a condition is met. See puppeteer: wait N seconds before continuing to the next line for further discussion.
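If you really do need a fixed pause, a plain-JavaScript delay does the same job without depending on the deprecated method. A minimal sketch follows; `sleep` and `timedPause` are hypothetical names, not part of Puppeteer.

```javascript
// Resolves after roughly `ms` milliseconds; a stand-in for a fixed delay.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Example use: pause for 50 ms and report how long the pause took.
async function timedPause() {
  const start = Date.now();
  await sleep(50);
  return Date.now() - start;
}
```

Inside a Puppeteer script you would write `await sleep(500);` where `await page.waitForTimeout(500)` used to be. Waiting on a condition, for example with `page.waitForSelector`, is usually more reliable than sleeping for a guessed duration.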

See also Crawling multiple URLs in a loop using Puppeteer.

ggorlen
  • Thank you for the response this is exactly what I needed! I actually do have to run separate browser instances for anti-bot measures. My code also clears cache and cookies so I don't want this interfering with the other instances. Would there be a particular workaround for separate browsers altogether? – TheCuriousMarketer Jan 30 '21 at 00:33
  • Sure, do you want the requests to run one at a time or all at once, or `<= n` at once? How many emails do you have here? Basically, all you have to do is move the `await puppeteer.launch()` and `browser.close()` into the loop, so it's a trivial change, but if you have 4000 emails, generating 4000 browsers or pages (the `Promise.all` version) is going to be not fun for your computer. On the other hand, you probably don't want to go one at a time either, so I'd recommend a task queue of some kind so you can do, say, 4-5 at a time. I'm just speculating, maybe you only have 3 emails. – ggorlen Jan 30 '21 at 00:34
  • Yeah, so I won't be loading any more than 10-20 profiles, but I still think I will set up the queue to minimize load. Also, 1 other thing. If I wanted to also store the login password and then have it called in order as well (like the email), how could I add that in? Also thank you for the clarification on the loops and error handling. – TheCuriousMarketer Jan 30 '21 at 00:54
  • No problem. If you have more properties, just add them to the loop's [destructuring assignment](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment): `for (const {email, password, somethingElse} of profile) {`. If you're using the function version, add more parameters or pass the whole object in and pull the properties out as needed. If you aren't comfortable with destructuring, you can do `for (const user of profile) { const email = user.email; /* ... etc ...*/ }` For 10-20 profiles start with the one-at-a-time version first, then upgrade. – ggorlen Jan 30 '21 at 00:59
  • That's perfect! The one issue I seem to be getting now is that some of my commands spill over into the other browsers. For example if you add in page.close() to the end and then open a new tab all the tabs will be opened on 1 browser. Any workaround for this? – TheCuriousMarketer Jan 30 '21 at 02:03
  • The top 2 examples all open pages in a single browser. The bottom 2 examples open each page in its own browser. – ggorlen Jan 30 '21 at 03:09
  • If you could possibly refer to my latest edit (edit 1) on the thread that would be appreciated. Sorry, it was a bit difficult to fully convey what I was trying to explain in this tiny box. – TheCuriousMarketer Jan 30 '21 at 04:44
  • I'm not actually sure offhand, I'd have to dig in. I usually operate in a single browser. Since the issue is pretty much totally distinct from iterating an array and calling a function which was the original problem, I'd roll back your edit and open a new question. [Accepting an answer](https://stackoverflow.com/help/someone-answers) to the original question (assuming it fixes the problem) and opening a new one is the general protocol, otherwise existing answers become stale and the thread is less focused and useful for future visitors. – ggorlen Jan 30 '21 at 05:11
  • Although my above comment stands, `await browser.newPage();` doesn't seem to make sense here. You already closed the browser before running this line. What are you trying to achieve with this? Also, did you try the "one at a time" serial approach--what happens with the tabs then? Feel free to link me to your next question once this question is closed. – ggorlen Jan 30 '21 at 05:13
  • So all of my code is in a nested loop. I have try catch errors setup which close the page and then reset the loop back to the top. The starting loop within the nested loop is where my browser tab opens. So when it ends up going through the loop again it sends the new browser tabs to the last opened browser rather than its respective browser – TheCuriousMarketer Jan 30 '21 at 05:41

In a nutshell, don't let asynchronous operations running in parallel use higher scoped variables that are "shared". That is the crux of your problem as you have a loop of asynchronous operations attempting to all use the emailInfo variable so they will stomp on each other.

Don't make emailInfo a higher scoped variable like you are (actually, even worse, you weren't declaring it at all, which made it an implicit global - very bad). Pass it as a function argument into the specific functions you want to use it in, or declare it with let within the scope you want to use it in. Then it will have a separate value in each place it is being used. Your problem is that you have one variable and a number of asynchronous things all trying to use it. That will always cause a problem in JavaScript.
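To see the effect in isolation, here is a minimal sketch with no Puppeteer involved. The names `sleep`, `usesShared`, `usesParameter` and `demo` are made up for illustration, and the timer stands in for any awaited browser call.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

let shared; // one binding, visible to every concurrent call

async function usesShared(email) {
  shared = email;   // every call overwrites the same variable
  await sleep(10);  // the other calls run while this one is suspended
  return shared;    // by now this holds whatever was written last
}

async function usesParameter(email) {
  await sleep(10);
  return email;     // each call has its own binding, so nothing can overwrite it
}

async function demo() {
  const emails = ['email1@gmail.com', 'email2@gmail.com', 'email3@gmail.com'];
  const broken = await Promise.all(emails.map(e => usesShared(e)));
  const fixed = await Promise.all(emails.map(e => usesParameter(e)));
  return {broken, fixed};
}
```

Here `broken` comes back as three copies of 'email3@gmail.com', while `fixed` keeps the three distinct addresses. The same mechanism appears to explain the tabs in the question's Edit 1: `browser` is assigned there without `const` or `let`, so every concurrent `botRun` call shares one `browser`, and after the wait each call sees the browser that was launched last. Declaring it as `const browser = await puppeteer.launch(...)` should keep each call on its own browser.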

Also, don't use var any more. Use let or const. Both of those are block-scoped rather than function-scoped, so you can more finely control what their scope is. You can always declare a variable with let at the top of a function if you really want a function-scoped variable.
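The scoping difference is easiest to see with callbacks created in a loop. This is a small sketch with hypothetical function names, unrelated to Puppeteer itself:

```javascript
// With var there is a single `i` for the whole function, so every
// callback reads the same variable after the loop has finished.
function collectWithVar() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map(f => f()); // [3, 3, 3]
}

// With let each iteration gets a fresh `i`, so every callback keeps
// the value from the iteration that created it.
function collectWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map(f => f()); // [0, 1, 2]
}
```

Async callbacks behave the same way: anything that runs later sees the variable as it is at that time, not as it was when the callback was created.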

If the real problem you're trying to solve is that you want to use emailInfo inside of botRun(), then just pass in that value:

const fs = require('fs');
const puppeteer = require('puppeteer');

const profile = JSON.parse(fs.readFileSync('./settings.json'));

let id = 0;

while (id < profile.length) { // cover every profile instead of hard-coding the count
    console.log(profile[id].email);
    botRun(profile[id].email);
    id++;
}

async function botRun(emailInfo) {
    let browser;
    try {
        browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.waitForTimeout(500);
        console.log('function ' + emailInfo);
    } catch(e) {
        console.log(e);
        // decide what you're doing upon errors here
    } finally {
        if (browser) {
            await browser.close();
        }
    }
}

Also, no need for the extra function inside of botRun(). You can just make botRun() be async and that will work fine. And, you need some proper error handling if any of the await statements encounters a rejected promise.

jfriend00