I want to build a Chrome extension for personal use. The extension will scrape some webpages and render some information, and I think Puppeteer can help me with that. As I understand it, I would need to run Node inside a Chrome extension. Is that possible? I have found some answers, but they are old.
-
No, extensions can't do that. Extensions can use standard APIs like XHR/fetch/DOMParser to scrape the web. There might be some existing libraries for that. If you really want Node/Puppeteer you can write a separate app and invoke it from your extension via [nativeMessaging](https://developer.chrome.com/extensions/nativeMessaging). – wOxxOm Mar 15 '19 at 14:01
-
I need to scrape some lazy-loaded pages, so a simple XHR/fetch will not work. – Petran Mar 15 '19 at 14:13
-
You can use a content script with `"all_frames": true` and iframe in your background script (and override X-Frame-Options with webRequest) or an inactive tab (or a new minimized browser window). I think I've seen answers that explain all this in detail. – wOxxOm Mar 15 '19 at 14:16
-
Related: [How to run Puppeteer code in any web browser?](https://stackoverflow.com/questions/54647694/how-to-run-puppeteer-code-in-any-web-browser) – ggorlen Feb 08 '23 at 20:13
2 Answers
I know this is 9 months late, but I had the same use case at work on Windows machines, and you can make it work on Mac too.
The trick is to use puppeteer-web: https://github.com/puppeteer/puppeteer/tree/master/utils/browser#bundling-for-web-browsers
Bundle the repository, place it in your Chrome extension folder, and then reference it in your `popup.html` with something like:

    <script src="./puppeteer/utils/browser/puppeteer-web.js"></script>

You'll then need to take advantage of Chrome's remote debugging functionality, as puppeteer-web can't start its own instance via `puppeteer.launch()` and can only use `puppeteer.connect()` to connect to an already existing Chrome instance.
On Windows, add `--remote-debugging-port=9222` to the end of the target field of the Chrome shortcut, as per "How to make Chrome always launch with remote-debugging-port flag".
Or on Mac:

    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')

Once remote debugging is activated, you'll be able to see the `webSocketDebuggerUrl` property by visiting http://127.0.0.1:9222/json/version in your browser. This is the value to pass as `browserWSEndpoint` to the connect method.
You will also need to add the debugging address to the `permissions` array in the `manifest.json` file, otherwise ajax requests won't work in the Chrome extension. E.g.:

    "permissions": [ "tabs" , "identity", "http://127.0.0.1:9222/*"],

Example `popup.html` file:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Example popup</title>
        <link rel="stylesheet" type="text/css" href="style.css">
      </head>
      <body>
        <div>
          <button id='puppeteer-button'>Do Puppeteer Things</button>
          <script src="./puppeteer/utils/browser/puppeteer-web.js"></script>
          <script type="module" src="popup.js"></script>
        </div>
      </body>
    </html>

Example `popup.js` file:

    let browserWSEndpoint = '';
    // require() here is supplied by the puppeteer-web bundle loaded in popup.html
    const puppeteer = require("puppeteer");

    // Look up the WebSocket endpoint of the Chrome instance started with remote debugging
    async function initiatePuppeteer() {
      await fetch("http://127.0.0.1:9222/json/version")
        .then(response => response.json())
        .then(function(data) {
          browserWSEndpoint = data.webSocketDebuggerUrl;
        })
        .catch(error => console.log(error));
    }

    initiatePuppeteer();

    // Assign button to puppeteer function
    document
      .getElementById("puppeteer-button")
      .addEventListener("click", doPuppeteerThings);

    // Connect to the already running Chrome instance and open a page
    async function doPuppeteerThings() {
      const browser = await puppeteer.connect({
        browserWSEndpoint: browserWSEndpoint
      });
      const page = await browser.newPage();
      // Your puppeteer code goes here
    }
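
One caveat with the `popup.js` above: if the button is clicked before the `fetch` has resolved, `browserWSEndpoint` is still an empty string. A possible way around that is to keep the lookup as a promise and `await` it in the click handler. This is only a minimal sketch; `getBrowserWSEndpoint` and `VERSION_URL` are names made up for illustration, and passing `puppeteer` in as an argument is just a choice to keep the functions easy to try out.

```javascript
// Illustrative names: VERSION_URL and getBrowserWSEndpoint are not part of Puppeteer.
const VERSION_URL = "http://127.0.0.1:9222/json/version";

let endpointPromise = null;

// Returns a promise for webSocketDebuggerUrl; the request is made only once.
// `fetchImpl` is optional and falls back to the global fetch.
function getBrowserWSEndpoint(fetchImpl) {
  if (!endpointPromise) {
    endpointPromise = (fetchImpl || fetch)(VERSION_URL)
      .then(response => response.json())
      .then(data => data.webSocketDebuggerUrl)
      .catch(error => {
        endpointPromise = null; // let the next click retry the lookup
        throw error;
      });
  }
  return endpointPromise;
}

async function doPuppeteerThings(puppeteer) {
  const browser = await puppeteer.connect({
    // Waits for the lookup if it has not finished yet
    browserWSEndpoint: await getBrowserWSEndpoint()
  });
  const page = await browser.newPage();
  // Your puppeteer code goes here
  return page;
}

// Wire up the button only when running inside the popup page.
if (typeof document !== "undefined") {
  const puppeteer = require("puppeteer"); // provided by the puppeteer-web bundle
  getBrowserWSEndpoint().catch(error => console.log(error)); // start the lookup early
  document
    .getElementById("puppeteer-button")
    .addEventListener("click", () =>
      doPuppeteerThings(puppeteer).catch(error => console.log(error))
    );
}
```

With this shape, an early click simply waits for the endpoint instead of connecting with an empty string, and a failed lookup is retried on the next click.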
Hope that helps. I haven't had any issues from appending the remote debugging flag to the target field on my work Windows machines, despite it feeling a bit hacky. I wrote a short blog post on it with better syntax highlighting here.

-
Can this be shared on the Chrome Web Store and used without enabling remote debugging? – Kaushik Ray Jan 14 '21 at 15:37
-
This has a race condition: if the user clicks `#puppeteer-button` before the `fetch` resolves, `browserWSEndpoint` will be an empty string. Best to use a promise instead of `browserWSEndpoint = data.webSocketDebuggerUrl;` so the button can `await` the correct `browserWSEndpoint` value. Also, the top link is dead. – ggorlen Jul 24 '21 at 00:28
-
I needed to add `-- "%1"` in target so it looks like this: `--remote-debugging-port=9222 -- "%1"` – milos Nov 10 '21 at 21:25
-
Seems like puppeteer-web isn't supported anymore. Is there a new way of implementing this? – user303749 Apr 05 '22 at 16:38
Actually, it is possible, but with some limitations. Puppeteer uses the DevTools Protocol (https://chromedevtools.github.io/devtools-protocol/), which is available inside a Chrome extension when you enable the debugger permission in your extension manifest: https://developer.chrome.com/extensions/debugger. However, only the latest stable version of the protocol is available inside an extension (currently 1.3: https://chromedevtools.github.io/devtools-protocol/1-3).
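
As a rough illustration of what that looks like, here is a minimal sketch that attaches to a tab, runs one `Runtime.evaluate` command and detaches again. It assumes `"debugger"` is listed under `"permissions"` in the manifest; `evaluateInTab` is a hypothetical helper name, and the debugger API is passed in as a parameter (normally `chrome.debugger`) only to keep the sketch self-contained.

```javascript
// Hypothetical helper: evaluate `expression` in the given tab over the DevTools Protocol.
// `debuggerApi` is expected to be chrome.debugger.
function evaluateInTab(debuggerApi, tabId, expression) {
  const target = { tabId };
  // chrome.runtime.lastError is only set inside the callback of a failed API call.
  const lastError = () =>
    typeof chrome !== "undefined" && chrome.runtime
      ? chrome.runtime.lastError
      : undefined;

  return new Promise((resolve, reject) => {
    // "1.3" is the protocol version mentioned above.
    debuggerApi.attach(target, "1.3", () => {
      const attachError = lastError();
      if (attachError) return reject(new Error(attachError.message));

      debuggerApi.sendCommand(
        target,
        "Runtime.evaluate",
        { expression, returnByValue: true },
        response => {
          const commandError = lastError();
          // Always detach, whether or not the command succeeded.
          debuggerApi.detach(target, () => {
            if (commandError) reject(new Error(commandError.message));
            else resolve(response.result.value);
          });
        }
      );
    });
  });
}
```

From a background script this could be called as `evaluateInTab(chrome.debugger, tab.id, "document.title")`. Note that Chrome shows a warning bar in the tab while a debugger is attached.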
But in my opinion you don't need the DevTools Protocol to solve your problem. Just use the standard extension API (https://developer.chrome.com/extensions/api_index) to open any URL you need (`chrome.tabs.update`), parse the page inside `content.js`, and do whatever you want with that data.
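
A minimal sketch of that approach might look like the following. The two halves would live in different files (background script and `content.js`); they are shown together only for brevity. `navigateTab`, `extractHeadings`, the `"h2"` selector and the message shape are all made up for illustration, and lazy-loaded content would still need extra waiting or scrolling that is not shown here.

```javascript
// --- background script side ---
// Point an existing tab at the page to scrape. `tabsApi` is expected to be chrome.tabs.
function navigateTab(tabsApi, tabId, url) {
  return new Promise(resolve => tabsApi.update(tabId, { url }, resolve));
}

// --- content.js side ---
// Pull the data of interest out of the loaded page (here: the text of every <h2>).
function extractHeadings(doc) {
  return Array.from(doc.querySelectorAll("h2"), el => el.textContent.trim());
}

// Runs only where both the extension API and a DOM exist, i.e. in the content script.
if (
  typeof chrome !== "undefined" &&
  chrome.runtime &&
  typeof document !== "undefined"
) {
  // Hand the scraped data to the rest of the extension.
  chrome.runtime.sendMessage({
    type: "scraped",
    headings: extractHeadings(document)
  });
}
```

The background script (or popup) can then pick up the result with a `chrome.runtime.onMessage` listener and render it however it likes.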
