View XML from PDF in Embedded

Question

I'm working on a browser extension that needs to read the data in a PDF file that opens in a popup.

When the popup comes up and I go to inspect, I only find the following information:

<embed id="plugin" type="application/x-google-chrome-pdf" src="https://thisisnottherealurl-soignorethis part......../something.aspx" stream-url="chrome-extension://xxxxxx/xxxx" headers="cache-control: no-cache, no-store,must-revalidate
content-type: application/pdf
date: Wed, 03 Mar 1999 15:31:26 GMT
expires: -1
pragma: no-cache
server: Microsoft-IIS/10.0
x-aspnet-version: 4.0.30319
x-powered-by: ASP.NET
" background-color="0xFF525659" top-toolbar-height="0" javascript="allow" full-frame="" pdf-viewer-update-enabled="">

I know for a fact that the information is in XML format, and I am certain that it is found in the embed tag. I can view it by changing the browser settings to save the file rather than display it. What I cannot find, in either the Network information or the Source, is where that information is located or how I can have the browser extension go through it for me.

The only way to access the built-in PDF viewer without redownloading the file is its undocumented API, see [How can I get selected text in pdf in Javascript?](https://stackoverflow.com/a/61076939) — wOxxOm, Mar 03 '21 at 16:39

Steve S · Answer 1 · 2021-03-03T19:53:50.020

For anyone else interested in this method, I found some interesting workarounds.

Apparently the PDF viewer is loaded as a dynamically created extension that uses Chrome APIs inside the browser, and that extension appears to run the code that renders the PDF.

This makes it somewhat more difficult than usual to get a look at the network traffic and the processes.

Another workaround I found, aside from the comment above, is that the PDF document can be selected and copied to the clipboard, or even into a variable.

After some testing, I found that my browser extension can run in the new PDF window, so I was able to extract the information that way.

This isn't exactly what I had been looking for, but I found it to be quite interesting and thought someone else could use it.

Remember to take into account asynchronous running of the code.

The code for select/copy that I generally use is:

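// Select the whole document, read the selected text, then clear the selection.
let sel = window.getSelection(), range = document.createRange();
range.selectNodeContents(document.documentElement);
sel.removeAllRanges();
sel.addRange(range);
let textStuff = sel.toString(); // addRange() returns undefined, so read the text from the selection
sel.removeAllRanges();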
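
To account for the asynchronous loading mentioned above, here is a minimal sketch of the same select-and-read idea that waits for the page to finish loading. The function names `readAllSelectableText` and `readWhenLoaded` are hypothetical, and the window and document are passed in as parameters only to keep the helpers self-contained. It assumes the viewer exposes its text to the page selection, which may not hold for every PDF.

```javascript
// Hypothetical helper: select everything under the document root,
// read the selected text, then clear the selection again.
function readAllSelectableText(win, doc) {
  const sel = win.getSelection();
  const range = doc.createRange();
  range.selectNodeContents(doc.documentElement);
  sel.removeAllRanges();
  sel.addRange(range);          // addRange() has no return value
  const text = sel.toString();  // the text is read from the selection itself
  sel.removeAllRanges();
  return text;
}

// Hypothetical wrapper: resolve with the text once the page has loaded.
function readWhenLoaded(win, doc) {
  return new Promise((resolve) => {
    if (doc.readyState === 'complete') {
      resolve(readAllSelectableText(win, doc));
    } else {
      win.addEventListener(
        'load',
        () => resolve(readAllSelectableText(win, doc)),
        { once: true }
      );
    }
  });
}
```

From a content script this could be called as `readWhenLoaded(window, document).then((text) => console.log(text));`.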

The problem, however, is that the PDF document appears to be embedded in the CSS, so the usual method of copy/paste from the DOM may not reach it.

If the copy/paste doesn't work for you, I also found a somewhat interesting method of simulating the copy/paste at:

How to implement ctrl click behavior to copy text from an embedded pdf in a webapp?
