Print/Save PDF Embedded in web page from Powershell

Question

I'm trying to figure out how to use PowerShell to automatically print multiple PDF pages. The biggest problem is that the PDF has up to 700 pages, but when it is viewed in the web browser, only one page at a time can be viewed, saved, or printed. On the left-hand side there are hyperlink buttons for all of the pages, and I haven't yet found a way to view more than one page at once.

I was thinking I could loop through all of the pages, since the only difference in the URL is the page number, but I am having trouble printing. I started trying to save the .html file as a Word document or a PDF, thinking it would be easier to print from a different file type, but the entire page is not being saved correctly. The code below is strictly an attempt to save the .html file in either .docx or .pdf format, and this is what I need help with. It does create a file with the specified filename, but the file doesn't contain any data.

$client = New-Object System.Net.WebClient
$client.DownloadString("http://website.com/PublicationLink/9c5eafdc-4a61-430c-b7f7-a6ddbffd175a/9803-3600U_1.html")

$code = '$helper = New-Object -comobject WScript.Shell; 
$helper.AppActivate(''Save HTML Document'', $true);     
$helper.SendKeys(''{ENTER}'')'

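$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate("http://website.com/PublicationLink/9c5eafdc-4a61-430c-b7f7-a6ddbffd175a/9803-3600U_1.html")

# Wait until the page has finished loading (ReadyState 4 = complete)
while ($ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 200 }

# Launch a hidden helper process that confirms the "Save HTML Document" dialog
Start-Process powershell.exe -argument ('-version 2.0 -noprofile -windowstyle hidden -command "{0}"' -f $code)

# 4 = OLECMDID_SAVEAS, 2 = OLECMDEXECOPT_DONTPROMPTUSER
$ie.ExecWB(4,2,"page.txt",[ref]$null)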
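
The loop idea mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: it supposes the pages are numbered contiguously from 1 and follow the `<publication>_<n>.html` pattern seen in the single URL above, and `Get-PageUrl` is a hypothetical helper name, not something the site or PowerShell provides. It only builds the URLs; it does not handle authentication, saving, or printing.

```powershell
# Hypothetical helper: builds one URL per page.
# Assumes only the trailing page number in the file name changes between pages.
function Get-PageUrl {
    param(
        [string]$BaseUrl,             # folder part of the URL, without the file name
        [string]$Publication,         # e.g. "9803-3600U"
        [int]$PageCount,              # number of pages to generate URLs for
        [string]$Extension = "html"   # "html" for the viewer page, "pdf" for the file itself
    )
    if ($PageCount -lt 1) { return }  # nothing to build
    foreach ($page in 1..$PageCount) {
        "{0}/{1}_{2}.{3}" -f $BaseUrl.TrimEnd('/'), $Publication, $page, $Extension
    }
}

# Example: URLs for the first three pages of the publication used above
$urls = Get-PageUrl -BaseUrl "http://website.com/PublicationLink/9c5eafdc-4a61-430c-b7f7-a6ddbffd175a" `
                    -Publication "9803-3600U" -PageCount 3
$urls
```

Each resulting URL could then be passed to whatever download or print step ends up working for a single page.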

The other issue could be that reaching this page requires authentication, so I am not sure what I would need to do to handle that either.

Here is a screenshot of the page, in case it helps: [image]

And here is the source of the .html page that embeds the PDF, copied from the browser's developer tools:

<HTML>
<HEAD>
<TITLE>
9801-7868_1
</TITLE>
<script language="javascript" src="scripts\page.js"></script>
</HEAD>
<BODY style="padding:0;margin:0;overflow-x:auto;overflow-y:hidden;"      onload="setpagenum(1)">
<div id="pdfSection">
<object id="pdfObject" width="100%" height="100%"  align="top" classid="clsid:CA8A9780-280D-11CF-A24D-444553540000">
<param name="SRC" value="9801-7868_1.pdf">
</object></div>
</BODY>
</HTML>

UPDATE: With the code below, I am able to get the file to download with a .pdf extension, but when I try to open it in Adobe, it throws an error saying the file is either not a supported file type or is damaged.

$Url = "https://spp.jdsportal.jcb.com/PublicationLink/4f67dea0-4164-4b23-9ac3-29acfb3a5e7b/9801-7868_1.pdf"
$Path = "C:\Users\Administrator\Documents\manual2.pdf"
$Username = "User"
$Password = "Pass"

$WebClient = New-Object System.Net.WebClient
$WebClient.Credentials = New-Object System.Net.NetworkCredential($Username, $Password)
$WebClient.DownloadFile( $url, $path )
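
One way to narrow down the "damaged file" error is to check whether the downloaded file is a PDF at all. A real PDF begins with the bytes `%PDF-`. If the file starts with something else, one possible explanation (a guess, not confirmed here) is that the server returned an HTML login or error page instead of the document. The sketch below uses a hypothetical helper named `Test-PdfHeader` and the path from the code above.

```powershell
# Hypothetical helper: returns $true when the file starts with the PDF signature "%PDF-".
# Pass a full path, because .NET resolves relative paths against the process
# working directory, which can differ from the current PowerShell location.
function Test-PdfHeader {
    param([string]$Path)
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] 5
        $read = $stream.Read($buffer, 0, 5)   # read the first five bytes
    } finally {
        $stream.Dispose()
    }
    if ($read -lt 5) { return $false }        # too short to be a PDF
    return [System.Text.Encoding]::ASCII.GetString($buffer) -ceq '%PDF-'
}

$Path = "C:\Users\Administrator\Documents\manual2.pdf"
if (Test-Path $Path) {
    if (Test-PdfHeader $Path) {
        "The file starts with a PDF signature."
    } else {
        # Show the first lines to see what the server actually sent back
        Get-Content $Path -TotalCount 5
    }
}
```

If the first lines turn out to be HTML, the credentials are probably not being accepted in the way the script passes them, and the download step would need to go through the portal's own login instead.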

Are we not able to see the actual website? That would make troubleshooting a lot easier. Just saw the last line.... so that would be a no then. I didn't see any authentication in the code so I was curious. — Matt, Jun 30 '15 at 12:20
Yes, sorry. I was trying to at least get a screenshot of the webpage (which might not help anyway), but at the moment I am not able to load the page. — user3841709, Jun 30 '15 at 12:25
Maybe using [selenium](http://www.seleniumhq.org/) or the [internet explorer com object](http://stackoverflow.com/questions/29209227/using-internetexplorer-object-what-is-the-correct-way-to-wait-for-an-ajax-respon), or an external tool such as AutoIt, would be simpler. — Loïc MICHEL, Jun 30 '15 at 12:25
I will look into that option a little. Quickly, before I look into it much: does Selenium allow me to specify exact pages to load and then automate the printing or saving? — user3841709, Jun 30 '15 at 12:27
I have not used it personally, but their site states "Primarily, it is for automating web applications". — Loïc MICHEL, Jun 30 '15 at 12:30
OK, thanks, that could be pretty helpful. I would still like to try to get this working in PowerShell for now as well. — user3841709, Jun 30 '15 at 12:31
That will be hard, since we can't see the page or its source to know how the mechanics of the PDF viewing play out. — Matt, Jun 30 '15 at 12:39
OK, I will work on at least getting some of the source code and posting it in my original question. That should help, shouldn't it? — user3841709, Jun 30 '15 at 12:41
My guess is that it's driven by JavaScript. There are ways to interact with it using the IE COM object, but the code becomes unreadable quickly. Here is a question I wrote where you can find some hints: http://stackoverflow.com/questions/29209227/using-internetexplorer-object-what-is-the-correct-way-to-wait-for-an-ajax-respon Good luck! — Loïc MICHEL, Jun 30 '15 at 12:53
OK, I will have a look at this question. One other thing I wanted to ask: is there any way to compile all of these into one PDF that I can view by changing the code in developer tools? It seems like these are all individual .pdf files, though. — user3841709, Jun 30 '15 at 12:57

0 Answers