Getting full page source information using PowerShell

Question

I have a SharePoint list in which each list item may have multiple image attachments. I want to scrape the list with PowerShell so that I can back up all the images.

I can reach each item's page because the item URLs follow a common pattern, but I cannot extract the attachments because their filenames are not predictable. Invoke-WebRequest does not seem to help: it brings back the HTML of the page, and that HTML does not list the file attachments.

The file attachments are visible, however, when I use the browser's 'Inspect page source' button, which I believe is because they are inside a JavaScript function.

So my question is: can I get each attachment's filename out of the JavaScript function so that I can scrape the page? Also, am I interpreting this problem correctly, and are there any other ways to solve it?

Please note: I don't have access to the SharePoint server DLLs, including Microsoft.SharePoint.dll, so I can't use classes from that DLL (unless they can easily be imported without installing the whole library).

Here is a photo of where the source changes. I believe this is where the HTML ends and the JavaScript begins:

The highlighted lines in this file show the information that I want to parse from the page source so that I can form the URLs to download the image attachments:

I would use Invoke-WebRequest, parse the HTML with a regex to extract the "DSC*.JPG" filenames, and then call Invoke-WebRequest again to download the images. - f6a4, Jun 01 '18 at 05:00
That is the idea, but Invoke-WebRequest only loads the HTML (the colored text that ends in the first picture above). I guess the non-colored text I get in the second picture (from right click -> view page source) is some backend JavaScript function. That text contains the filenames I want to parse, but I don't know how to get at it programmatically. Note also that the image names will not always be DSCN*.JPG, but parsing isn't the problem. - Free Url, Jun 01 '18 at 13:54
Perhaps it is a timing issue like the one described here: https://stackoverflow.com/questions/17049203/powershell-download-or-save-source-code-for-whole-ie-page. I will evaluate that and update this page as soon as possible. - Free Url, Jun 01 '18 at 13:57
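
For reference, the approach suggested in the first comment could be sketched roughly as below. This is only a sketch under several assumptions that the thread does not confirm: that the raw response body (the Content property of the Invoke-WebRequest result) actually contains the attachment paths, that those paths appear as quoted strings containing "/Attachments/" with unescaped slashes, and that the site accepts the current Windows credentials. The function names Get-AttachmentUrl and Save-ItemAttachment, the regex, and the example URL are all hypothetical.

```powershell
# Extracts attachment URLs from raw page source.
# Assumption: paths appear as quoted strings containing "/Attachments/"
# and ending in a common image extension. Adjust the pattern to the real source.
function Get-AttachmentUrl {
    param(
        [Parameter(Mandatory)] [string] $Content,
        [Parameter(Mandatory)] [uri]    $BaseUri
    )
    $pattern = '["'']([^"'']*/Attachments/[^"'']+\.(?:jpe?g|png|gif))["'']'
    [regex]::Matches($Content, $pattern, 'IgnoreCase') |
        # Resolve server-relative paths against the page URL.
        ForEach-Object { [uri]::new($BaseUri, $_.Groups[1].Value).AbsoluteUri } |
        Select-Object -Unique
}

# Downloads every attachment found on one list item page.
function Save-ItemAttachment {
    param(
        [Parameter(Mandatory)] [uri]    $ItemUrl,
        [Parameter(Mandatory)] [string] $OutputDir
    )
    # Content holds the raw response text, including any inline script blocks
    # that the server sent with the page.
    $response = Invoke-WebRequest -Uri $ItemUrl -UseDefaultCredentials -UseBasicParsing
    New-Item -ItemType Directory -Path $OutputDir -Force | Out-Null
    foreach ($url in Get-AttachmentUrl -Content $response.Content -BaseUri $ItemUrl) {
        $fileName = [System.IO.Path]::GetFileName(([uri]$url).LocalPath)
        Invoke-WebRequest -Uri $url -UseDefaultCredentials -UseBasicParsing -OutFile (Join-Path $OutputDir $fileName)
    }
}

# Example call (hypothetical URL and folder):
# Save-ItemAttachment -ItemUrl 'https://sharepoint.example.com/sites/team/Lists/Photos/DispForm.aspx?ID=1' -OutputDir '.\attachments'
```

If the filenames turn out to be added by script after the page loads rather than sent in the initial response, this will find nothing, and the timing explanation from the linked question would be the more likely cause.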

0 Answers