
I'm looking for a fast way to download all of the images that I can see in the Network tab of the developer tools. They come through as data:image/png;base64 links. You can open each one in a new tab individually and save it manually from there, but that seems to be the only way. Saving the whole webpage or a .har file doesn't seem to capture them, and neither does any addon I have tried. :/

Is there a fast (automated) way to save them all? Doing this manually would take a lifetime.

Best regards, Matt

Makka
  • What is the definition of "fast" at Question? Can you include result of `.har` file as text at Question? Are the images loaded within the `document`? Have you tried creating a `.zip` folder containing the images? Or using `Ctrl+s` and selecting `Webpage, Complete`? – guest271314 Jul 16 '17 at 22:36
  • Faster than doing it manually? I guess "automated" would have been a better word. The .har file is 76k lines long; I've been through the whole thing and I cannot find them [link](https://filebin.ca/3TcIT9jOjPll/starve.io.har). Yup, tried that. [img](http://imgur.com/a/WuZoo) screenshot of files – Makka Jul 16 '17 at 22:50
  • _"the .har file is 76k lines long, ive been through the whole thing and i cannot find them"_ The `data URL`'s of the image files are in `.har` file, for example at lines `1230` through `1233`: `"size": 22221, "mimeType": "image/png", "text": "iV...", "encoding": "base64"` – guest271314 Jul 16 '17 at 22:55
  • Just to be clear, those are all images that were sent prior, I guess; they are all saved if you use the save-all method, and yes, they are in the .har file. However, the other 1400+ that are in the above format are not. Go to starve.io and try it for yourself if you like. – Makka Jul 16 '17 at 23:13
  • What is supposed to occur at the link? – guest271314 Jul 16 '17 at 23:16
  • See, the issue is they are loaded after the main request, since it's a Socket.IO app. I assume Chrome is only saving the initial request? – Makka Jul 16 '17 at 23:17
  • starve.io? It's a game. You can open dev tools and just load the main page to be sent all of the assets. – Makka Jul 16 '17 at 23:18
  • Do you have an array containing the list of files which should be downloaded? – guest271314 Jul 16 '17 at 23:18
  • I do not, as you cannot select all and copy in Chrome dev tools. This is the issue, or I could just make a Python script or something to convert the data links. – Makka Jul 16 '17 at 23:19
  • Then what are you expecting? It is not clear what you are trying to achieve. If you do not have a list how do you know that there are 1400+ images which need to be downloaded? At what point are the images appended to `document`? – guest271314 Jul 16 '17 at 23:21
  • See [List file sizes of all images on a page (Chrome Extension)](https://stackoverflow.com/questions/41085017/list-file-sizes-of-all-images-on-a-page-chrome-extension/), [Multiple download links to one zip file before download javascript](https://stackoverflow.com/questions/37176397/multiple-download-links-to-one-zip-file-before-download-javascript/). Using the two approaches combined should result in a `.zip` folder containing all images in `document`. – guest271314 Jul 16 '17 at 23:21
  • OK, let me try to explain again. You can see all of the files/links in the Network tab in dev tools; you can select each one, even open it in a new tab and save it. The problem is this will take forever, as there is no way I can see to copy all of this data or export it somehow for extraction. I was surprised the .har file didn't include them; it included everything up until the export on the Network tab bar those. Go check it out like I said if you like and you will see what I mean. – Makka Jul 16 '17 at 23:25
  • Where and when are the files visible at `DevTools`? If you can view the list you can extract and parse the list and attempt to request each URL in the parsed list. – guest271314 Jul 16 '17 at 23:26
  • As I said, it looks to be a secondary call after the page is loaded. Here is a screenshot with the time frame highlighted: http://imgur.com/a/8mL6R My guess is it's the preloader that's probably called after the page loads – Makka Jul 16 '17 at 23:30
  • You have a list of the files at `Network` tab at `DevTools`, as evidenced by the linked screenshot. You can request the images at `console` while you are at the same origin and create a `.zip` file containing the images. How do you know that the image files are not included in the `.har` file as `base64` strings? – guest271314 Jul 16 '17 at 23:33
  • http://www.softwareishard.com/har/viewer/ is a pretty good tool. I've also manually looked for some of the hashes. No idea how to do that tbh; the last links you provided didn't seem helpful? Can you provide an example? You can try it on your end if you wish. – Makka Jul 16 '17 at 23:39
  • _"the last links you provided didnt seem helpful"_ ? _"can you provide an example?"_ The example would be an almost exact duplicate of the previously linked approaches. Use `document.images` to get a list of the images in `document`, or extract the URL's from `Network` tab at `DevTools`, request all of the images, create a `.zip` folder, append image files to `.zip` folder - omitting as yet undefined "fast" portion of inquiry. You can use `RegExp` or a loop to extract the `text` and `mimeType` from the `JSON` `.har` file, then request each resource – guest271314 Jul 16 '17 at 23:42
  • document.images only works for images that are in img tags? Or for me it does anyway. Can't extract the URLs from the Network tab, as I previously stated. This is the problem. Go try it yourself if you like, you will see what I mean. – Makka Jul 16 '17 at 23:49
  • Yes, you can extract, or "Copy" the image URL's at `Network` tab – guest271314 Jul 16 '17 at 23:58
  • One at a time only, or at least I can. If you know a way, that would be helpful, or maybe provide a pastebin of them? – Makka Jul 16 '17 at 23:59
  • Was able to extract the image URL's from the `.har` file. What is the user action which triggers call to load images as `data URL`? – guest271314 Jul 17 '17 at 00:45
  • See https://stackoverflow.com/help/self-answer – guest271314 Jul 17 '17 at 14:00
  • I don't see your point? I haven't fully answered the question yet; I'm having a few problems with my conversion script in Python, I'm getting an incorrect padding error – Makka Jul 17 '17 at 16:15
  • All done, see my answer below. – Makka Jul 17 '17 at 17:05

2 Answers


The easiest way I have found to achieve what I'm looking for is: filter by images, select one of the results in the Network tab, then right-click → Copy → Copy all as cURL (cmd). This gives you a full list of all the resources; you can then scrape out the data for each image and convert it to a file with a script. Here is the script I made to do this:

Each resource is saved as a new line, as follows:

curl "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAYAAADED76LAAAAbklEQVQoU42PsQ3CQAADzxPAKGECRJmO9qeAEWAbOkpC9ywQVoEFOPRCNCgCXNon2Q5AOV/X6ibQAXOhYvaHflHTQvTYwE9pVimnsRKWUwBySRlGJ8OXefsKiPc/Kn6NfN/k4dbYhczaOMmu3XwCriA4HJ2kao8AAAAASUVORK5CYII=" --compressed &

Script:

import base64
import os

fname = "starvedump.txt"          # output of "Copy all as cURL (cmd)"
dataToBeFound = "data:image/png;base64,"
imgext = ".png"
imgpfx = "img/img_"

os.makedirs("img", exist_ok=True)  # make sure the output folder exists

with open(fname) as f:
    d = f.readlines()

# Keep only the lines containing a base64 PNG data URL, then strip the
# surrounding curl command so only the raw base64 payload remains.
d[:] = [x for x in d if dataToBeFound in x]
d = [x.replace('curl "' + dataToBeFound, '') for x in d]
d = [x.replace("\" --compressed &\n", "") for x in d]

for i, x in enumerate(d):
    with open(imgpfx + str(i) + imgext, "wb") as fh:
        fh.write(base64.b64decode(x))
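As a quick sanity check (a sketch, not part of the original script), one of the copied data URLs can be decoded on its own and verified against the 8-byte PNG signature:

```python
import base64

# Decode the sample data URL shown above and confirm the payload
# really is a PNG by checking its fixed 8-byte signature.
data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAYAAADED76LAAAAbklEQVQoU42PsQ3CQAADzxPAKGECRJmO9qeAEWAbOkpC9ywQVoEFOPRCNCgCXNon2Q5AOV/X6ibQAXOhYvaHflHTQvTYwE9pVimnsRKWUwBySRlGJ8OXefsKiPc/Kn6NfN/k4dbYhczaOMmu3XwCriA4HJ2kao8AAAAASUVORK5CYII="
payload = data_url.split("base64,", 1)[1]
raw = base64.b64decode(payload)
assert raw[:8] == b"\x89PNG\r\n\x1a\n", "not a valid PNG"
```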
Makka

Once the images are loaded in the document, you can download the .har file with content at DevTools, then filter the JSON as a JavaScript object, creating data URLs from the "mimeType", "encoding" and "text" properties of the response.content property of each object in the "entries" array of the "log" property of the .har file.

Given the linked .har file, the result would be an array having a .length of 17:

// `json` is the parsed .har file (e.g. the result of JSON.parse on its text)
let imgs = json.log.entries
           .map(({response:{content:{mimeType, encoding, text}}}) => 
             /image/.test(mimeType) 
             ? `data:${mimeType};${encoding},${text}` 
             : null)
           .filter(Boolean);

jsfiddle https://jsfiddle.net/j0grexnv/
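The same extraction can also be done offline. Here is a sketch in Python that mirrors the filtering above; the file name starve.io.har and the output folder har_img are assumptions for illustration, not part of the original answer:

```python
import base64
import json
import os

def extract_har_images(har, out_dir="har_img"):
    """Write every base64-encoded image response found in a parsed
    .har dict to out_dir; returns the number of images written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        if (content.get("encoding") == "base64"
                and content.get("mimeType", "").startswith("image/")):
            ext = content["mimeType"].split("/")[1]
            path = os.path.join(out_dir, f"img_{count}.{ext}")
            with open(path, "wb") as out:
                out.write(base64.b64decode(content["text"]))
            count += 1
    return count

# Usage (assumes the exported file is saved as "starve.io.har"):
# with open("starve.io.har") as f:
#     print(extract_har_images(json.load(f)))
```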

guest271314
  • This would be the correct answer if the images were saved in the .har file, but as I previously said, only the images that are sent on page load are captured for some reason. However, I seem to have found a way to capture data for all 1411 images (see my answer below). I just need to convert them now :) – Makka Jul 17 '17 at 13:46
  • @Makka _"How ever i seem to have found a way to capture data for all 1411 images (see my below answer)."_ Are you still working on posting your Answer? – guest271314 Jul 17 '17 at 13:49
  • Was not able to trigger 1400 images loading at linked document. How does that occur? – guest271314 Jul 17 '17 at 13:51
  • I've only been able to capture it with Chrome; FF and IE don't work by default, and I haven't tried Opera. If you go to the URL with dev tools open you can see them getting sent after the page loads (this app uses Socket.IO). I'm assuming this is why they are not captured in the .har file – Makka Jul 17 '17 at 13:54
  • You should be able to see the images I'm talking about above. I'll upload the full data dump for you now. – Makka Jul 17 '17 at 13:55
  • Yes, visited page, though did not view 1400+ images being requested – guest271314 Jul 17 '17 at 13:55
  • Oh, you have to make sure "Hide data URLs" is unchecked. That could be the issue for you? Here is the dump: https://filebin.ca/3TgjAL1XgBsb/starvedump.txt – Makka Jul 17 '17 at 13:57
  • Viewing the images at screenshot is not the same as reproducing the images being requested at the site itself – guest271314 Jul 17 '17 at 13:59
  • I'm not sure why you cannot. As I said prior, make sure you're using Chrome and you have "Hide data URLs" unchecked (look in the screenshot for it, it's above the timeline); that should work for you? – Makka Jul 17 '17 at 14:00
  • Anyway, got this all solved and working :) Thanks for your help though! – Makka Jul 19 '17 at 18:54