I'm writing some test code that controls Chrome using the Chrome DevTools Protocol. After opening a web page, I need to get the images from the page. It's easy enough to get the image URLs, but I want to get the actual images from Chrome without re-downloading them. This will help the tests run a lot faster, because bandwidth between the test client and the web server is limited. It will also help the tests simulate a more realistic interaction with the web server.

Is there a way to get the images using the Chrome DevTools Protocol? I suppose I could take a screenshot of each image, but I'd prefer to get unaltered images. Or is there a way to access the images from a script that gets injected into the browser?

Kaiido
mrog
  • I'm not familiar with the Chrome DevTools Protocol, but it looks like you'll want to use the [DOM](https://chromedevtools.github.io/devtools-protocol/tot/DOM) API with a slightly modified implementation of [this](https://stackoverflow.com/questions/934012/get-image-data-in-javascript) answer. – Jake Holzinger Mar 26 '19 at 22:49
  • I'd say it's one of the many similar questions asking how to read a response. Apparently, you could do it using [Network.getResponseBody()](https://chromedevtools.github.io/devtools-protocol/tot/Network#method-getResponseBody), though I have never used it and don't know how to use it. – Kaiido Mar 27 '19 at 01:42
  • I solved this problem here https://stackoverflow.com/a/58744482/1919821 – pery mimon Nov 07 '19 at 08:20
  • @perymimon If I understand your approach correctly, you're getting the img tags, right? I need the actual image file content. – mrog Nov 07 '19 at 20:35
  • I find the URLs of all images that the page loads. If you just need all image elements in the page, use `document.images`. For the image data, I guess you can use `canvas`. – pery mimon Nov 10 '19 at 15:20
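
The `Network.getResponseBody` route mentioned in the comments can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: `client` is a hypothetical CDP session object assumed to expose `send(method, params)` (returning a promise) and `on(event, handler)`, and `captureImages` and `decodeResponseBody` are illustrative names. The CDP method and event names themselves come from the protocol's Network domain.

```javascript
// Convert a Network.getResponseBody result into raw bytes.
// CDP returns binary bodies as base64 text and flags them with base64Encoded.
function decodeResponseBody({body, base64Encoded}) {
  return Buffer.from(body, base64Encoded ? 'base64' : 'utf8');
}

// `client` is a hypothetical CDP session (assumption): send() issues a
// protocol command, on() subscribes to a protocol event.
async function captureImages(client, onImage) {
  const imageUrls = new Map(); // requestId -> URL of an image response

  // Remember which requests produced image resources.
  client.on('Network.responseReceived', ({requestId, type, response}) => {
    if (type === 'Image') imageUrls.set(requestId, response.url);
  });

  // Once a remembered request has finished loading, ask Chrome for the
  // body it already holds instead of downloading the image again.
  client.on('Network.loadingFinished', async ({requestId}) => {
    const url = imageUrls.get(requestId);
    if (url === undefined) return; // not an image
    imageUrls.delete(requestId);
    const result = await client.send('Network.getResponseBody', {requestId});
    onImage(url, decodeResponseBody(result));
  });

  // Network events are only emitted after the domain is enabled.
  await client.send('Network.enable');
}
```

`captureImages` would need to be called before navigating, so that no responses are missed. With Puppeteer, for example, a `CDPSession` object has `send` and `on` methods of this shape.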

2 Answers

I think you could use Puppeteer for this:

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

Here's a script that accesses this very question and saves all images to your disk as soon as they are loaded:

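const {writeFileSync} = require('fs');
const puppeteer = require('puppeteer');

let counter = 0;
const pending = []; // in-flight body reads, awaited before the browser closes

// Wrap a callback so that it only fires for successful responses
// whose Content-Type header starts with "image/".
const whenImage = fn => async res => {
  const contentType = res.headers()['content-type'];
  if (res.ok() && contentType && contentType.startsWith('image/')) {
    fn(await res.buffer());
  }
};

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  const saveImage = whenImage(content => writeFileSync(`img_${counter++}`, content));
  // Keep each handler's promise; a failed body read is skipped rather than left unhandled.
  page.on('response', res => pending.push(saveImage(res).catch(() => {})));
  await page.goto('https://stackoverflow.com/q/55366906/1244884');
  await Promise.all(pending); // let every image finish saving first
  await browser.close();
});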
customcommander
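
The script in this answer writes files named `img_0`, `img_1`, and so on, with no extension. One possible refinement is to derive an extension from the `content-type` header the script already reads. The sketch below is only an illustration: `imageFileName` and the `EXTENSIONS` table are hypothetical names, and the table covers just a few common types.

```javascript
// Hypothetical lookup table (assumption): a handful of common image MIME
// types. Extend it to cover whatever the pages under test actually serve.
const EXTENSIONS = {
  'image/png': 'png',
  'image/jpeg': 'jpg',
  'image/gif': 'gif',
  'image/webp': 'webp',
  'image/svg+xml': 'svg',
};

// Build a file name such as "img_3.png" from a counter and a Content-Type
// header value. Parameters like "; charset=utf-8" are ignored, and unknown
// or missing types fall back to the extension-less name.
function imageFileName(index, contentType) {
  const mime = (contentType || '').split(';')[0].trim().toLowerCase();
  const ext = EXTENSIONS[mime];
  return ext ? `img_${index}.${ext}` : `img_${index}`;
}
```

In the script, this would mean passing the content type along with the buffer and calling `imageFileName(counter++, contentType)` where the file name is built.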

I usually convert the image to Base64 using the following JavaScript function.

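function toBase64Uri(img){
    // Draw the already-loaded <img> onto an off-screen canvas at its intrinsic size.
    const canvas = document.createElement("canvas");

    canvas.height = img.naturalHeight;
    canvas.width = img.naturalWidth;

    // A cross-origin image drawn without CORS approval taints the canvas,
    // and toDataURL() then throws a SecurityError.
    canvas.getContext("2d").drawImage(img, 0, 0);

    // toDataURL() re-encodes the pixels (PNG by default), so the result
    // is not the original file's bytes.
    return canvas.toDataURL();
}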

Optionally, you can convert the canvas element to an image directly by calling canvas.toDataURL (MSDN reference).

Kartik Soneji
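
If the data URL produced in the page is handed back to the test client, it still has to be turned into bytes there. A minimal sketch in Node.js, assuming the string is a base64 data URL as returned by `canvas.toDataURL()`; `dataUrlToBuffer` is an illustrative name, and how the string travels from the page to the client is left open.

```javascript
// Split a base64 data URL such as "data:image/png;base64,iVBOR..." into its
// MIME type and decoded bytes. Only the base64 form is handled here, which
// is the form canvas.toDataURL() produces.
function dataUrlToBuffer(dataUrl) {
  const comma = dataUrl.indexOf(',');
  if (!dataUrl.startsWith('data:') || comma === -1) {
    throw new TypeError('Not a data URL');
  }
  const meta = dataUrl.slice('data:'.length, comma); // e.g. "image/png;base64"
  if (!meta.endsWith(';base64')) {
    throw new TypeError('Only base64 data URLs are supported in this sketch');
  }
  const mimeType = meta.slice(0, -';base64'.length);
  const data = Buffer.from(dataUrl.slice(comma + 1), 'base64');
  return {mimeType, data};
}
```

The decoded buffer holds the canvas's re-encoded image rather than the file the server sent, so this route fits cases where pixel content matters more than byte-for-byte fidelity.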
  • The question is asking about how to do this through the Chrome DevTools protocol, not on a web page. – duskwuff Mar 26 '19 at 23:37
  • @duskwuff Yes, but OP also asked if there is a way to access the images from a script that gets injected into the browser. – Kartik Soneji Mar 27 '19 at 10:52
  • I agree with @KartikSoneji. It was my intent to use this approach if it couldn't be done using CDP. I'm upvoting this answer, even though I'm accepting the other one. – mrog Mar 27 '19 at 16:49