59

Google detects that my browser is being controlled by automation software, and because of that I get a reCAPTCHA. When I start Chromium manually and perform the same steps, the reCAPTCHA doesn't appear.

Question 1)

Is it possible to solve the captcha programmatically, or to avoid it altogether, when using Puppeteer?

Question 2)

Does this happen only when running without the headless option, i.e.

const browser = await puppeteer.launch({
  headless: false
})

Or is this something we just have to accept and move on?

rinold simon
  • Check out this blogpost. It is close to your own situation. https://medium.com/@jsoverson/bypassing-captchas-with-headless-chrome-93f294518337 – Paula Livingstone Apr 14 '19 at 17:48
  • I already came across that blog. He uses `2captcha` which is not FREE :P – rinold simon Apr 14 '19 at 17:52
  • 2
    Your accepted answer is a PAID service from 2captcha.com. If you want to pay, then why use Headless Chrome + Puppeteer? Why don't you just use `CURL`? – Cyborg Jan 25 '20 at 23:08

4 Answers

75

Try generating a random user agent with the user-agents npm package. This usually gets past user-agent-based protection.

In Puppeteer, a page can override the browser's user agent with page.setUserAgent:

var userAgent = require('user-agents');
...
await page.setUserAgent(userAgent.random().toString())
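If you would rather not add a dependency, the same idea can be sketched with a small hand-maintained pool. The pool contents and the helper names pickUserAgent and applyRandomUserAgent are assumptions made up for illustration; the only real Puppeteer call here is page.setUserAgent.

```javascript
// Hypothetical pool of desktop user-agent strings; you would need to keep it current yourself.
const USER_AGENT_POOL = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
];

// Pick one entry. `random` is injectable so the choice can be made deterministic in tests.
function pickUserAgent(pool = USER_AGENT_POOL, random = Math.random) {
  if (pool.length === 0) throw new Error('user-agent pool is empty');
  return pool[Math.floor(random() * pool.length)];
}

// Set a randomly chosen user agent on a Puppeteer page before navigating.
async function applyRandomUserAgent(page, pool = USER_AGENT_POOL) {
  const ua = pickUserAgent(pool);
  await page.setUserAgent(ua);
  return ua;
}
```

Call applyRandomUserAgent(page) right after browser.newPage() and before page.goto(), so the first request already carries the chosen string.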

Additionally, you can add these two extra plugins,

puppeteer-extra-plugin-recaptcha - Solves reCAPTCHAs automatically, using a single line of code: page.solveRecaptchas()

NOTE: puppeteer-extra-plugin-recaptcha relies on 2captcha, which is a paid service.

puppeteer-extra-plugin-stealth - Applies various evasion techniques to make detection of headless puppeteer harder.
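A rough sketch of wiring both plugins together, assuming puppeteer-extra and the two plugins are installed and that you have a 2captcha API token. The function names recaptchaPluginOptions and openAndSolve are hypothetical, and the option shape follows the recaptcha plugin's README as I understand it, so check it against the version you install.

```javascript
// Build the options object for puppeteer-extra-plugin-recaptcha.
// Assumption: the plugin accepts { provider: { id, token }, visualFeedback }.
function recaptchaPluginOptions(token) {
  if (!token) throw new Error('a 2captcha API token is required');
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true, // highlight detected and solved captchas in the page
  };
}

// Open `url` with both plugins enabled and try to solve any reCAPTCHAs found.
async function openAndSolve(url, token) {
  // Required lazily so this file still loads where the packages are not installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

  puppeteer.use(StealthPlugin());
  puppeteer.use(RecaptchaPlugin(recaptchaPluginOptions(token)));

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Each solved captcha is billed by the provider.
    const { solved } = await page.solveRecaptchas();
    return solved.length;
  } finally {
    await browser.close();
  }
}
```

The stealth plugin on its own costs nothing, so it is worth trying it first and adding the solver only if captchas still show up.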

Jakub Kukul
rinold simon
  • 4
    Yes. It's not just one line of code. You also need to sign up for the service and pay for every captcha solved – Tim Kozak Nov 11 '20 at 11:08
58

Here is a list of things I'm doing to bypass captchas and similar blocks:

  • Enable stealth mode (via puppeteer-extra-plugin-stealth)
  • Randomize the user agent or set a valid one (via random-useragent)
  • Randomize Viewport size
  • Skip images/styles/fonts loading for better performance
  • Pass "WebDriver check"
  • Pass "Chrome check"
  • Pass "Notifications check"
  • Pass "Plugins check"
  • Pass "Languages check"

Link to full code is here

const randomUseragent = require('random-useragent');

//Enable stealth mode
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';

async function createPage (browser,url) {

    //Randomize User agent or Set a valid one
    const userAgent = randomUseragent.getRandom();
    const UA = userAgent || USER_AGENT;
    const page = await browser.newPage();

    //Randomize viewport size
    await page.setViewport({
        width: 1920 + Math.floor(Math.random() * 100),
        height: 3000 + Math.floor(Math.random() * 100),
        deviceScaleFactor: 1,
        hasTouch: false,
        isLandscape: false,
        isMobile: false,
    });

    await page.setUserAgent(UA);
    await page.setJavaScriptEnabled(true);
    await page.setDefaultNavigationTimeout(0);

    //Skip images/styles/fonts loading for performance
    await page.setRequestInterception(true);
    page.on('request', (req) => {
        if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
            req.abort();
        } else {
            req.continue();
        }
    });

    await page.evaluateOnNewDocument(() => {
        // Pass webdriver check
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false,
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Pass chrome check
        window.chrome = {
            runtime: {},
            // etc.
        };
    });

    await page.evaluateOnNewDocument(() => {
        //Pass notifications check
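        const originalQuery = window.navigator.permissions.query;
        // Call the original with `this` bound to `permissions`; an unbound call throws "Illegal invocation".
        return window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery.call(window.navigator.permissions, parameters)
        );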
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `plugins` property to use a custom getter.
        Object.defineProperty(navigator, 'plugins', {
            // This just needs to have `length > 0` for the current test,
            // but we could mock the plugins too if necessary.
            get: () => [1, 2, 3, 4, 5],
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `languages` property to use a custom getter.
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en'],
        });
    });

    await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
    return page;
}
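To see whether the overrides actually took effect, you can read the same properties back from the page. This is only a sketch: collectFingerprint and failedChecks are hypothetical helpers, they cover just the checks listed in this answer, and real detection scripts look at many more signals.

```javascript
// Runs inside the page and reads the properties the checks above target.
async function collectFingerprint(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    hasChrome: typeof window.chrome !== 'undefined',
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
    userAgent: navigator.userAgent,
  }));
}

// Pure helper: name the checks that a given fingerprint would still fail.
function failedChecks(fp) {
  const failures = [];
  if (fp.webdriver) failures.push('webdriver');
  if (!fp.hasChrome) failures.push('chrome');
  if (!fp.pluginCount) failures.push('plugins');
  if (!fp.languages || fp.languages.length === 0) failures.push('languages');
  if (/HeadlessChrome/.test(fp.userAgent || '')) failures.push('userAgent');
  return failures;
}
```

For example, logging failedChecks(await collectFingerprint(page)) after page.goto() gives a quick list of which overrides did not stick.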
Josh Correia
Tim Kozak
9

Have you tried setting the browser's user agent?

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
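A hard-coded string goes stale as Chrome updates. One possible alternative is to start from the browser's own user agent and only rewrite the part that gives it away. This sketch assumes the tell is the HeadlessChrome token that headless Chromium reports, which is worth verifying on your version; toRegularChromeUserAgent and useBrowserVersionUserAgent are hypothetical names, while browser.userAgent() and page.setUserAgent() are Puppeteer calls.

```javascript
// Replace the headless marker so the string looks like a regular Chrome build.
function toRegularChromeUserAgent(ua) {
  return ua.replace('HeadlessChrome/', 'Chrome/');
}

// Read the browser's real user agent, clean it, and apply it to the page.
async function useBrowserVersionUserAgent(browser, page) {
  const ua = toRegularChromeUserAgent(await browser.userAgent());
  await page.setUserAgent(ua);
  return ua;
}
```

This keeps the version number in step with the Chromium build you actually run, so the user agent does not contradict the rest of the browser.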
Hellonearthis
  • No. Will give it a try. But what happens by having the same UserAgent? Isn't the `UserAgent` to be `random`? Could you brief it? – rinold simon Apr 15 '19 at 09:06
  • Having the agent as default says you are using puppeteer, so setting it to chrome (like above) gets you past the basic test. But you will still end up with the captcha at some time. If you login it might also help keep ya scraper working for a bit. – Hellonearthis Apr 16 '19 at 10:16
  • 7
    Even after setting a specific user agent we ended up with the captcha after some logins, so I tried generating a random user agent each time using an npm package (https://www.npmjs.com/package/random-useragent). Now it's working fine. – rinold simon Apr 16 '19 at 11:53
  • 1
    This answer is great, but the user agent is quite old (69.0.3497 is a few months old by the time I'm writing this answer). The latest one by now is this: `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36`. See https://developers.whatismybrowser.com/useragents/explore/software_name/brave/ if you want other versions – Dr4kk0nnys May 26 '21 at 15:17
8

After a few tests, a couple of packages helped me avoid recaptcha:

//const puppeteer = require('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const randomUseragent = require('random-useragent');

class PuppeteerService {

    constructor() {
        this.browser = null;
        this.page = null;
        this.pageOptions = null;
        this.waitForFunction = null;
        this.isLinkCrawlTest = null;
    }

    async initiate(countsLimitsData, isLinkCrawlTest) {
        this.pageOptions = {
            waitUntil: 'networkidle2',
            timeout: countsLimitsData.millisecondsTimeoutSourceRequestCount
        };
        this.waitForFunction = 'document.querySelector("body")';
        puppeteerExtra.use(pluginStealth());
        //const browser = await puppeteerExtra.launch({ headless: false });
        this.browser = await puppeteerExtra.launch({ headless: false });
        this.page = await this.browser.newPage();
        await this.page.setRequestInterception(true);
        this.page.on('request', (request) => {
            if (['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
                request.abort();
            } else {
                request.continue();
            }
        });
        this.isLinkCrawlTest = isLinkCrawlTest;
    }

    async crawl(link) {
        const userAgent = randomUseragent.getRandom();
        const crawlResults = { isValidPage: true, pageSource: null };
        try {
            await this.page.setUserAgent(userAgent);
            await this.page.goto(link, this.pageOptions);
            await this.page.waitForFunction(this.waitForFunction);
            crawlResults.pageSource = await this.page.content();
        }
        catch (error) {
            crawlResults.isValidPage = false;
        }
        if (this.isLinkCrawlTest) {
            await this.close();
        }
        return crawlResults;
    }

    async close() {
        // Close the browser only if one was actually launched.
        if (this.browser) {
            await this.browser.close();
            this.browser = null;
        }
    }
}

const puppeteerService = new PuppeteerService();
module.exports = puppeteerService;
Or Assayag