3

I'm writing a Chrome extension that manipulates PDF files, so I want to get the text selected in the PDF. How can I do that?

Something like this:

[image]

dinosaur

2 Answers

1

You can use the internal undocumented commands of the built-in PDF viewer.

Here's an example of a content script:

function getPdfSelectedText() {
  return new Promise(resolve => {
    window.addEventListener('message', function onMessage(e) {
      if (e.origin === 'chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai' &&
          e.data && e.data.type === 'getSelectedTextReply') {
        window.removeEventListener('message', onMessage);
        resolve(e.data.selectedText);
      }
    });
    // runs code in page context to access postMessage of the embedded plugin
    const script = document.createElement('script');
    if (chrome.runtime.getManifest().manifest_version > 2) {
      script.src = chrome.runtime.getURL('query-pdf.js');
    } else {
      script.textContent = `(${() => {
        document.querySelector('embed').postMessage({type: 'getSelectedText'}, '*');
      }})()`;
    }
    document.documentElement.appendChild(script);
    script.remove();
  });
}

chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg === 'getPdfSelection') {
    getPdfSelectedText().then(sendResponse);
    return true;
  }
});

This example assumes you send a message from the popup or background script:

chrome.tabs.query({active: true, currentWindow: true}, ([tab]) => {
  chrome.tabs.sendMessage(tab.id, 'getPdfSelection', sel => {
    // do something
  });
});
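The callback above does not check for failure. As a minimal sketch, the same call can be wrapped in a promise that rejects when `chrome.runtime.lastError` is set, which happens for example when no content script is listening in the tab. `requestPdfSelection` and its `api` parameter are hypothetical names introduced here for illustration; they are not part of the answer or of the Chrome API.

```javascript
// Hypothetical wrapper around the popup/background snippet above.
// `api` defaults to the global `chrome` object; passing it in explicitly
// lets the function run against a stub outside the browser.
function requestPdfSelection(api = globalThis.chrome) {
  return new Promise((resolve, reject) => {
    api.tabs.query({active: true, currentWindow: true}, ([tab]) => {
      if (!tab) {
        reject(new Error('No active tab'));
        return;
      }
      api.tabs.sendMessage(tab.id, 'getPdfSelection', sel => {
        // lastError is set when the message could not be delivered
        const err = api.runtime.lastError;
        if (err) reject(new Error(err.message));
        else resolve(sel);
      });
    });
  });
}
```

In a popup this could be used as `requestPdfSelection().then(sel => { /* do something */ }, console.error);`.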

See also: "How to open the correct devtools console to see output from an extension script?"

ManifestV3 extensions also need this:

  • manifest.json should expose query-pdf.js

      "web_accessible_resources": [{
        "resources": ["query-pdf.js"],
        "matches": ["<all_urls>"],
        "use_dynamic_url": true
      }]
    
  • query-pdf.js

    document.querySelector('embed').postMessage({type: 'getSelectedText'}, '*')
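One caveat about the content script above: if the tab is not showing the built-in PDF viewer, no `getSelectedTextReply` message arrives and the promise returned by `getPdfSelectedText()` never settles. A minimal sketch of a timeout guard follows; `withTimeout` is a hypothetical helper introduced here, not part of the viewer or the Chrome API.

```javascript
// Hypothetical helper: resolves with `fallback` if `promise` has not
// settled within `ms` milliseconds, otherwise passes its result through.
function withTimeout(promise, ms, fallback = '') {
  let timer;
  const timeout = new Promise(resolve => {
    timer = setTimeout(resolve, ms, fallback);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the `onMessage` listener, `getPdfSelectedText().then(sendResponse)` would then become `withTimeout(getPdfSelectedText(), 1000).then(sendResponse)`. The 1000 ms value is arbitrary, and the `window` message listener still stays registered when the timeout wins.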
    
wOxxOm
  • This did not work for me. The message listener did not intercept any events from the pdf viewer, unfortunately. – Yao Oct 17 '21 at 00:16
  • @AlexZhong, this is still working so if you can post a new question with an [MCVE](/help/mcve) that describes all the specifics of your case someone (or I) might be able to help. Note that this answer only works with the built-in viewer and only in the main page, so for an iframe you would need to make a couple of changes. – wOxxOm Oct 17 '21 at 05:28
  • Hey, I tried it with the built-in viewer. What I did was I copied your code in my CRX, tried in both the background and content script separately -> the message listener is registered -> I cannot observe any messages received from the listener when I select the text in the pdf viewer. – Yao Oct 17 '21 at 07:03
  • Also, I could not find any "getPdfSelection" message being sent in the source code that you linked – Yao Oct 17 '21 at 07:04
  • You are supposed to send that message yourself, of course. – wOxxOm Oct 17 '21 at 10:01
  • Is there a way to listen for a text selection and trigger it this way? – Andrew Feb 20 '23 at 17:33
  • This works for a pdf file served on the web ( https:// blabla ), but I couldn't make it work for a local file. It says: "Failed to execute 'postMessage' on 'DOMWindow': The target origin provided ('file://') does not match the recipient window's origin ('null')." (manifest file is configured as instructed) – Alperen Belgiç Jun 20 '23 at 15:03
0

There is no single generic solution for all PDF extensions; every extension has its own API. If you are working with the Google Chrome extension, I believe it's impossible.

How to get the selected text from an embedded pdf in a web page?

How extension get the text selected in chrome pdf viewer?

General Grievance