To use fetch in an extension script, add that URL or "<all_urls>" to host_permissions in manifest.json.
But there's a bigger problem: the site's scripts won't run with fetch, so there'll be nothing useful to extract on many modern pages that construct themselves dynamically.
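A minimal sketch of the plain fetch approach, assuming it runs in an extension page or the background service worker (content scripts don't have chrome.permissions). The helper names toHostPattern and fetchStaticHtml are hypothetical. The returned text is the markup as the server sent it, so anything the page would build with its own scripts is missing from it.

```javascript
// Hypothetical helper: turn a page URL into a host-permission match pattern.
function toHostPattern(url) {
  const {protocol, hostname} = new URL(url);
  return `${protocol}//${hostname}/*`;
}

// Hypothetical helper: fetch the raw HTML of a page.
// No page scripts run here, so client-rendered content will be absent.
async function fetchStaticHtml(url) {
  const pattern = toHostPattern(url);
  // Fail early with a readable error if the host permission is missing.
  const granted = await chrome.permissions.contains({origins: [pattern]});
  if (!granted) throw new Error(`Missing host permission for ${pattern}`);
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}
```

If the text returned by fetchStaticHtml already contains the data you need, the iframe approach is unnecessary.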
The only workaround for extensions in such cases is to open that site in a new tab or iframe. Chrome extensions don't have a WebView like Chrome Apps did, so they can't otherwise imitate a normal page load process. Note that Chrome Apps will soon cease to exist in Chrome.
The problem with opening the site in a new tab is that it will be visible to the user. You can reduce the annoyance by opening a new non-focused window (chrome.windows.create), but that will also produce some visual effect for many users depending on their desktop environment.
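A rough sketch of the non-focused window variant, assuming it runs in the background service worker. backgroundWindowOptions and withBackgroundWindow are hypothetical names, and the task callback is whatever you want to do with the tab (for example chrome.scripting.executeScript).

```javascript
// Hypothetical helper: options for a window that should not take focus.
function backgroundWindowOptions(url) {
  return {url, focused: false, type: 'normal'};
}

// Hypothetical helper: open the window, run a task on its tab, then close it.
async function withBackgroundWindow(url, task) {
  const win = await chrome.windows.create(backgroundWindowOptions(url));
  try {
    // The created window contains a single tab showing `url`.
    return await task(win.tabs[0].id);
  } finally {
    // Close the window even if the task throws.
    await chrome.windows.remove(win.id);
  }
}
```

Whether the window flashes, appears in the taskbar, or steals focus anyway depends on the desktop environment, so this only reduces the annoyance.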
Thus, the only inconspicuous workaround is to create an iframe pointing to the remote page.
1. Create a web_accessible_resources-exposed iframe in the original web page, because MV3 extensions don't have DOM in the background script. You won't need a background script at all for this task. The iframe can be "hidden" from the page inside a closed ShadowDOM. This iframe will have full access to all granted chrome APIs.
2. Optionally register a DNR (declarativeNetRequest) rule to strip the response header that denies framing, typically X-Frame-Options.
3. Add a child iframe with src pointing to the site you want to fetch.
4. Extract the data using executeScript with a frameId obtained via webRequest.
5. Send the data to the main content script.
manifest.json:
"host_permissions": ["<all_urls>"],
"permissions": ["declarativeNetRequestWithHostAccess", "scripting", "webRequest"],
"web_accessible_resources": [{
  "resources": ["iframer.html"],
  "matches": ["<all_urls>"]
}],
main content script:
(async () => {
  const data = await getRemoteSiteData('https://www.example.com', ['body']);
  console.log(data);
})();

function getRemoteSiteData(url, selectors) {
  // Random id used to match messages from the iframe with this request
  const id = Math.random().toString(36).slice(2);
  const iframe = document.createElement('iframe');
  const el = document.createElement('div');
  // A closed shadow root keeps the iframe out of reach of the page's scripts
  const root = el.attachShadow({mode: 'closed'});
  root.appendChild(document.createElement('style')).textContent =
    ':host { display: none !important }';
  root.appendChild(iframe);
  iframe.src = chrome.runtime.getURL('iframer.html#' + id);
  document.body.appendChild(el);
  return new Promise(resolve => {
    chrome.runtime.onMessage.addListener(function _(msg, sender, sendResponse) {
      if (msg.id !== id) return;
      if (msg.init) {
        // First message: reply with the job description
        sendResponse({url, selectors, frameId: sender.frameId});
      } else {
        // Second message: the result, so clean up and resolve
        el.remove();
        chrome.runtime.onMessage.removeListener(_);
        resolve(msg.result);
      }
    });
  });
}
iframer.html:
<script src=iframer.js></script>
iframer.js:
(async () => {
  const id = location.hash.slice(1);
  const tabId = (await chrome.tabs.getCurrent()).id;
  // Ask the content script which URL and selectors to process
  const job = await chrome.tabs.sendMessage(tabId, {id, init: true});
  const iframe = document.createElement('iframe');
  let webFrameId;
  // Capture the frameId of the child iframe from its first navigation request
  chrome.webRequest.onBeforeRequest.addListener(function _(info) {
    chrome.webRequest.onBeforeRequest.removeListener(_);
    webFrameId = info.frameId;
  }, {tabId, types: ['sub_frame'], urls: ['<all_urls>']});
  iframe.src = job.url;
  document.body.appendChild(iframe);
  await new Promise(onload => Object.assign(iframe, {onload}));
  // Collect the textContent of every element matching each selector
  const data = await chrome.scripting.executeScript({
    target: {tabId, frameIds: [webFrameId]},
    args: [job.selectors],
    func: selectors => selectors.map(sel =>
      Array.from(document.querySelectorAll(sel),
        el => el.textContent)),
  });
  const {result} = data.find(d => d.frameId === webFrameId);
  await chrome.tabs.sendMessage(tabId, {id, result}, {frameId: job.frameId});
})();
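The optional DNR rule from the steps is not part of the code above. A possible sketch, assuming the header to remove is X-Frame-Options and that the rule is registered from iframer.js before iframe.src is set. STRIP_RULE_ID, buildStripFrameDenyRule and allowFraming are hypothetical names, not something the listing defines.

```javascript
// Hypothetical rule id; any positive integer not used by your other rules.
const STRIP_RULE_ID = 1;

// Build a rule that removes X-Frame-Options from sub_frame responses
// of the target site, limited to a single tab.
function buildStripFrameDenyRule(url, tabId) {
  return {
    id: STRIP_RULE_ID,
    priority: 1,
    action: {
      type: 'modifyHeaders',
      responseHeaders: [{header: 'x-frame-options', operation: 'remove'}],
    },
    condition: {
      requestDomains: [new URL(url).hostname],
      resourceTypes: ['sub_frame'],
      tabIds: [tabId], // tabIds is only allowed in session rules
    },
  };
}

// Register the rule for the current browser session only.
async function allowFraming(url, tabId) {
  await chrome.declarativeNetRequest.updateSessionRules({
    removeRuleIds: [STRIP_RULE_ID], // replace a rule left from a previous run
    addRules: [buildStripFrameDenyRule(url, tabId)],
  });
}
```

In iframer.js this would be called as await allowFraming(job.url, tabId) just before assigning iframe.src. Sites that deny framing through a Content-Security-Policy frame-ancestors directive are not covered by this sketch, and neither are redirects to another domain.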
WARNING! An http:// URL cannot be loaded in an iframe on an https:// site. You will see an error about mixed content being blocked in the devtools console. The only workaround is to replace step 1 above with opening a new tab/window for the remote site and running a content script there, e.g. the original content script sends a message to the background script, which uses chrome.tabs.create + chrome.scripting.executeScript.
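The tab-based fallback could look roughly like this. It is a sketch under assumptions: needsTabFallback is a simplified protocol check that ignores exceptions such as localhost, and the message shape {cmd: 'fetchViaTab'} and all function names are hypothetical. Everything except needsTabFallback belongs in the background service worker.

```javascript
// Simplified mixed-content check: an https: page cannot embed an http: frame.
function needsTabFallback(pageUrl, targetUrl) {
  return new URL(pageUrl).protocol === 'https:' &&
         new URL(targetUrl).protocol === 'http:';
}

// Resolve once the tab reports that loading is complete.
function tabLoaded(tabId) {
  return new Promise(resolve => {
    chrome.tabs.onUpdated.addListener(function listener(id, info) {
      if (id === tabId && info.status === 'complete') {
        chrome.tabs.onUpdated.removeListener(listener);
        resolve();
      }
    });
  });
}

// Open the site in an inactive tab, extract the text, close the tab.
async function fetchViaTab(url, selectors) {
  const tab = await chrome.tabs.create({url, active: false});
  try {
    await tabLoaded(tab.id);
    const [{result}] = await chrome.scripting.executeScript({
      target: {tabId: tab.id},
      args: [selectors],
      func: sels => sels.map(sel =>
        Array.from(document.querySelectorAll(sel), el => el.textContent)),
    });
    return result;
  } finally {
    await chrome.tabs.remove(tab.id);
  }
}

// Call this at the top level of the service worker.
function registerFallbackHandler() {
  chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
    if (msg.cmd !== 'fetchViaTab') return;
    fetchViaTab(msg.url, msg.selectors).then(sendResponse);
    return true; // keep the channel open for the async response
  });
}

// Content script usage:
//   const data = needsTabFallback(location.href, url)
//     ? await chrome.runtime.sendMessage({cmd: 'fetchViaTab', url, selectors})
//     : await getRemoteSiteData(url, selectors);
```

The tab is created inactive, but it still appears in the tab strip, so this path is visible to the user in a way the iframe path is not.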
WARNING! Some sites use JS to stop loading in iframes by checking window == window.top. This check can't be spoofed if it's done in an inline script element.