10

I have developed an extension to scrape some content from web pages. Up to now it was working fine, but since I switched to Manifest V3 the parsing doesn't work anymore.

I use the following script to read the source code:

chrome.scripting.executeScript( 
  {
    target: {tabId: tab.id, allFrames: true},
    files: ['GetSource.js'],
  }, async function(results) 
  {
    // GETTING HTML
    parser = new DOMParser();
    content = parser.parseFromString(results, "text/html");

... ETC ...

This code used to work fine, but now I get the following message in my console:

Uncaught (in promise) ReferenceError: DOMParser is not defined

The code is part of a promise but I don't think the promise is the problem here. I basically need to load the source code into a variable so that I can parse it afterwards.

I've checked the documentation, but I haven't found anything mentioning that DOMParser would not work with V3.

Any idea?

Thanks

Laurent
  • 3
    The background script is a service worker now so it doesn't have any DOM stuff. You'll have to load a javascript library to parse HTML or use DOMParser in a visible page of your extension e.g. in the popup. – wOxxOm Aug 28 '21 at 17:08
  • 1
    Ah, that explains the problem, thanks. That's very annoying :( My pop-up contains a search field where I can enter a keyword (e.g. a product) that will be searched across multiple sites. Can I simply move my background.js scripts to popup.js? The benefit of background.js is that it was not annoying for the end user. – Laurent Aug 28 '21 at 17:24

2 Answers

5

From the docs:

Since service workers don't have access to DOM, it's not possible for an extension's service worker to access the DOMParser API or create an <iframe> to parse and traverse documents.

Pulling in an external library just to do what DOMParser already does? That is too heavy.

To work around it, we can use an offscreen document. It is just an invisible web page where you can use fetch, audio, DOMParser, etc., and communicate with the background script (service worker) via chrome.runtime messaging.

See an example below:

background.js

// create (load) the offscreen document (xam.html)
chrome.offscreen.createDocument({
    url: chrome.runtime.getURL('xam.html'),
    reasons: [chrome.offscreen.Reason.DOM_PARSER],
    justification: 'reason for needing the document',
});

// This is just a test.
// It simulates wanting to fetch a web page and extract HTML from it after three seconds.
// Once the three seconds have elapsed, we broadcast a message to our extension's listeners.
// The listener in the offscreen document does the work and sends the result back to us.

setTimeout(() => {
    const onDone = (result) => {
        console.log(result);
        chrome.runtime.onMessage.removeListener(onDone);
    };
    chrome.runtime.onMessage.addListener(onDone);
    chrome.runtime.sendMessage('from-background-page');
}, 3000);

xam.html

<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
</head>

<body>
  <script src="xam.js">
  </script>
</body>

</html>

xam.js

async function main() {
    const v = await fetch('https://......dev/').then((t) => t.text());
    const d = new DOMParser().parseFromString(v, 'text/html');
    const options = Array.from(d.querySelector('select').options)
        .map((v) => `${v.value}|${v.text}`)
        .join('\n');
    chrome.runtime.sendMessage(options);
}

chrome.runtime.onMessage.addListener(async (msg) => {
    console.log(msg);
    main();
});
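
In the sample above, both listeners react to every message the extension sends. A possible refinement is to tag the request with a type and reply through sendResponse, so the background script gets its answer from the promise returned by chrome.runtime.sendMessage. This is only a sketch: `PARSE_REQUEST` and `makeParseListener` are hypothetical names, not part of the Chrome API.

```javascript
// Hypothetical message type; any unique string works.
const PARSE_REQUEST = 'parse-html';

// Builds a listener suitable for chrome.runtime.onMessage.addListener.
// `parse` is any function that turns the request into a (possibly promised) result.
function makeParseListener(parse) {
  return (msg, sender, sendResponse) => {
    if (!msg || msg.type !== PARSE_REQUEST) {
      return false; // not our message: leave it to other listeners
    }
    Promise.resolve()
      .then(() => parse(msg))
      .then((data) => sendResponse({ ok: true, data }))
      .catch((err) => sendResponse({ ok: false, error: String(err) }));
    return true; // keep the channel open for the asynchronous sendResponse
  };
}
```

In xam.js this could be registered as `chrome.runtime.onMessage.addListener(makeParseListener(main))`, assuming main is changed to return the options string instead of calling sendMessage itself. The background script would then do `const reply = await chrome.runtime.sendMessage({ type: PARSE_REQUEST })`.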

manifest.json

  "permissions": [
    // ...
    "offscreen"
  ]

https://developer.chrome.com/docs/extensions/reference/offscreen/

The extension's permissions carry over to offscreen documents, but extension API access is heavily limited. Currently, an offscreen document can only use the chrome.runtime APIs to send and receive messages; all other extension APIs are not exposed.

Notes:

  • I haven't tested how long the offscreen document stays alive.
  • This is just sample code; it should work. Customize it for your own case.
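
Since the lifetime of the offscreen document is untested here, one defensive option is to check for it right before each message and create it only when it is missing. A minimal sketch, assuming Chrome 116 or later (where chrome.runtime.getContexts is available); `ensureOffscreenDocument` is a hypothetical helper name, and the `api` parameter is added only so the function can be exercised outside an extension.

```javascript
// Hypothetical helper: opens the offscreen document only if none is open yet.
// `api` defaults to the global `chrome` object inside an extension.
async function ensureOffscreenDocument(path, api = globalThis.chrome) {
  const url = api.runtime.getURL(path);
  // Look for an existing offscreen context that points at our page.
  const contexts = await api.runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT'],
    documentUrls: [url],
  });
  if (contexts.length > 0) {
    return false; // already open, nothing to do
  }
  await api.offscreen.createDocument({
    url,
    reasons: ['DOM_PARSER'],
    justification: 'Parse fetched HTML with DOMParser',
  });
  return true; // a new document was created
}
```

In background.js, calling `await ensureOffscreenDocument('xam.html')` before chrome.runtime.sendMessage should avoid both a missing document and the error Chrome raises when a second offscreen document is created.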
ninhjs.dev
0

Since service workers don't have access to DOM, it's not possible for an extension's service worker to access the DOMParser API or create an <iframe> to parse and traverse documents.

More detail

I solved the problem by using the dom-parser library. The code could look like this:

import DomParser from "dom-parser";
const parser = new DomParser();
const dom = parser.parseFromString('your html string');
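
Whichever parser is used, note that the callback of chrome.scripting.executeScript receives an array of injection results (one entry per frame, each with a `result` property), not a string. A minimal sketch for pulling the HTML strings out first, assuming GetSource.js evaluates to the page source as a string (the question does not show that file); `collectHtml` is a hypothetical helper name.

```javascript
// Hypothetical helper: extracts the HTML strings from executeScript results.
// Frames whose script returned nothing (or a non-string) are skipped.
function collectHtml(results) {
  return (results || [])
    .map((entry) => entry.result)
    .filter((html) => typeof html === 'string' && html.length > 0);
}
```

Each string in the returned array can then be passed to `parser.parseFromString(...)`.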
Yidoon