
I'm writing a Chrome extension that will add some information to a page, based on content from another webpage (on another domain).

This is done by asking the background service worker to load the page, then pass the relevant information back to the content script.

It's important that the page is loaded as if the user opened a tab with that page, so it'll use the user's login and other context (locale, etc).

content.js:

chrome.runtime.sendMessage('some-query', function(response) {
  console.log(response)
})

background.js:

chrome.runtime.onMessage.addListener((message, sender, reply) => {
  loadInBackground(message, reply)
  return true
})

function loadInBackground(query, reply) {
  console.log('searching for', query)
  let url = new URL("https://some.site.com/search")
  url.searchParams.append("query", query)
  // How to load the page?
  reply('data-from-loaded-page')
}

I've tried using fetch, but that gives a CORS error, and even if it didn't, I don't expect it to use the same state as a page loaded in a tab. So which Chrome API can I use to do this? I've looked through all of these but can't find the one I need.

Jorn

1 Answer


To use fetch in an extension script, add that URL or "<all_urls>" to host permissions.
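As a minimal sketch of the fetch route, assuming the manifest lists the target origin under host_permissions, the question's loadInBackground could look like this. buildSearchUrl is a helper name introduced here for illustration, and the onMessage listener from the question stays as it is. Passing credentials: 'include' asks fetch to send the user's cookies for that site.

```javascript
// Hypothetical helper: builds the search URL used in the question.
function buildSearchUrl(query) {
  const url = new URL('https://some.site.com/search');
  url.searchParams.append('query', query);
  return url;
}

// Sketch of the question's loadInBackground. Assumes the manifest contains
// "host_permissions": ["https://some.site.com/*"] (or "<all_urls>").
async function loadInBackground(query, reply) {
  try {
    const response = await fetch(buildSearchUrl(query), {credentials: 'include'});
    // A service worker has no DOM, so send back the raw HTML string.
    reply({ok: response.ok, html: await response.text()});
  } catch (err) {
    reply({ok: false, error: String(err)});
  }
}
```

The content script would then parse the returned HTML itself, for example with DOMParser.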

But there's a bigger problem: the site's scripts won't run with fetch, so there'll be nothing useful to extract on many modern pages that construct themselves dynamically.

The only workaround for extensions in such cases is to open that site in a new tab or iframe because Chrome extensions don't have a WebView like Chrome Apps did, so extensions can't fully imitate a normal page load process. Note that Chrome Apps will soon cease to exist in Chrome.

The problem with opening the site in a new tab is that it will be visible to the user. You can reduce the annoyance by opening a new non-focused window (chrome.windows.create), but that will also produce some visual effect for many users depending on their desktop environment.

Thus, the only inconspicuous workaround is to create an iframe pointing to the remote page.

  1. Create a web_accessible_resources-exposed iframe in the original web page, because MV3 extensions don't have a DOM in the background script. You won't need a background script at all for this task. The iframe can be "hidden" from the page inside a closed shadow DOM. This iframe will have full access to all granted chrome APIs.

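  2. Optionally register a DNR (declarativeNetRequest) rule to strip the response headers that forbid embedding, such as X-Frame-Options or a Content-Security-Policy with frame-ancestors.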

  3. Add a child iframe with src pointing to the site you want to fetch.

  4. Extract the data using executeScript with a frameId obtained via webRequest.

  5. Send the data to the main content script.
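Step 2 is not covered by the code below, so here is a rough sketch of it, assuming the site forbids framing via X-Frame-Options or a Content-Security-Policy header. The function names, the rule id and the domain argument are illustrative choices, not part of any API. The rule is session-scoped and limited to sub-frame requests in one tab, so other tabs keep their headers.

```javascript
// Builds a session rule that removes the response headers sites commonly use
// to forbid framing. ruleId, tabId and domain are supplied by the caller.
function buildUnframeRule(ruleId, tabId, domain) {
  return {
    id: ruleId,
    action: {
      type: 'modifyHeaders',
      responseHeaders: [
        {header: 'x-frame-options', operation: 'remove'},
        {header: 'content-security-policy', operation: 'remove'},
      ],
    },
    condition: {
      requestDomains: [domain],
      resourceTypes: ['sub_frame'],
      tabIds: [tabId], // tabIds is only allowed in session-scoped rules
    },
  };
}

// Would run in iframer.js before the child iframe is added.
// The dnr parameter exists only so the function can be exercised outside Chrome.
async function installUnframeRule(rule, dnr = chrome.declarativeNetRequest) {
  await dnr.updateSessionRules({removeRuleIds: [rule.id], addRules: [rule]});
}
```

In iframer.js this could be called as await installUnframeRule(buildUnframeRule(1, tabId, new URL(job.url).hostname)) right before setting iframe.src. Removing the whole Content-Security-Policy header is a blunt choice that also drops the site's other CSP protections for that frame.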

manifest.json:

  "host_permissions": ["<all_urls>"],
  "permissions": ["declarativeNetRequestWithHostAccess", "scripting", "webRequest"],
  "web_accessible_resources": [{
    "resources": ["iframer.html"],
    "matches": ["<all_urls>"]
  }],

main content script:

(async () => {
  const data = await getRemoteSiteData('https://www.example.com', ['body']);
  console.log(data);
})();

function getRemoteSiteData(url, selectors) {
  const id = Math.random().toString(36).slice(2);
  const iframe = document.createElement('iframe');
  const el = document.createElement('div');
  const root = el.attachShadow({mode: 'closed'});
  root.appendChild(document.createElement('style')).textContent =
    ':host { display: none !important }';
  root.appendChild(iframe);
  iframe.src = chrome.runtime.getURL('iframer.html#' + id);
  document.body.appendChild(el);
  return new Promise(resolve => {
    chrome.runtime.onMessage.addListener(function _(msg, sender, sendResponse) {
      if (msg.id !== id) return;
      if (msg.init) {
        sendResponse({url, selectors, frameId: sender.frameId});
      } else {
        el.remove();
        chrome.runtime.onMessage.removeListener(_);
        resolve(msg.result);
      }
    });
  });
}

iframer.html:

<script src=iframer.js></script>

iframer.js:

(async () => {
  const id = location.hash.slice(1);
  const tabId = (await chrome.tabs.getCurrent()).id;
  const job = await chrome.tabs.sendMessage(tabId, {id, init: true});
  const iframe = document.createElement('iframe');
  let webFrameId;
  chrome.webRequest.onBeforeRequest.addListener(function _(info) {
    chrome.webRequest.onBeforeRequest.removeListener(_);
    webFrameId = info.frameId;
  }, {tabId, types: ['sub_frame']});
  iframe.src = job.url;
  document.body.appendChild(iframe);
  await new Promise(onload => Object.assign(iframe, {onload}));
  const data = await chrome.scripting.executeScript({
    target: {tabId, frameIds: [webFrameId]},
    args: [job.selectors],
    func: selectors => selectors.map(sel =>
      Array.from(document.querySelectorAll(sel),
        el => el.textContent)),
  });
  const {result} = data.find(d => d.frameId === webFrameId);
  await chrome.tabs.sendMessage(tabId, {id, result}, {frameId: job.frameId});
})();

  • WARNING! An http:// URL cannot be loaded in an iframe on an https:// page. You will see an error about mixed content being blocked in the devtools console. The only workaround is to skip the iframe approach and open a new tab/window for the remote site, then run a content script there, e.g. the original content script sends a message to the background script, which uses chrome.tabs.create + chrome.scripting.executeScript.

  • WARNING! Some sites use JS to stop loading in iframes by checking window == window.top.
    This check can't be spoofed if it's done in an inline script element.
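
For sites hit by either warning, the tab-based fallback could be sketched roughly like this in the background script. The function names are made up for illustration, and the api parameter is there only so the code can be exercised with a stand-in object; in the extension it defaults to the real chrome object. It assumes the "scripting" permission and host permissions from the manifest above.

```javascript
// Runs inside the opened tab; returns the text of every match per selector.
function extractText(selectors) {
  return selectors.map(sel =>
    Array.from(document.querySelectorAll(sel), el => el.textContent));
}

// Resolves once the given tab reports status "complete".
function waitForTabComplete(tabId, api) {
  return new Promise(resolve => {
    api.tabs.onUpdated.addListener(function listener(id, changeInfo) {
      if (id !== tabId || changeInfo.status !== 'complete') return;
      api.tabs.onUpdated.removeListener(listener);
      resolve();
    });
  });
}

// Opens the URL in an inactive tab, extracts the data, then closes the tab.
async function loadInTab(url, selectors, api = chrome) {
  const tab = await api.tabs.create({url, active: false});
  try {
    await waitForTabComplete(tab.id, api);
    const [{result}] = await api.scripting.executeScript({
      target: {tabId: tab.id},
      func: extractText,
      args: [selectors],
    });
    return result;
  } finally {
    await api.tabs.remove(tab.id);
  }
}
```

The onMessage listener from the question could then call loadInTab(url.href, ['body']).then(reply) and return true to keep the reply channel open. The tab is still briefly visible in the tab strip, which is the trade-off described above.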

wOxxOm