
I'm a server-side dev learning the ropes of client-side manipulation, starting with pure JS.

Currently I'm using pure JS to resize images uploaded via the browser.

I'm running into a situation where downsizing a 1018 x 1529 .jpg file to a 400 x 601 .jpeg is producing a file with a bigger size (in bytes). It goes from 70013 bytes to 74823 bytes.

My expectation is that there ought to be a size reduction, not inflation. What is going on, and is there any way to work around this kind of situation?

Note: one point that especially perplexes me is that each image's compression starts without any prior knowledge of the image's previous compressions. Thus, any quality level below 100 should further degrade the image, which should in turn always decrease the file size. Yet strangely, that doesn't happen.


If required, my relevant JS code is:

var max_img_width = 400;
var wranges = [max_img_width, Math.round(0.8*max_img_width), Math.round(0.6*max_img_width),Math.round(0.4*max_img_width),Math.round(0.2*max_img_width)];

function prep_image(img_src, text, img_name, target_action, callback) {
    var img = document.createElement('img');
    var fr = new FileReader();
    fr.onload = function(){
      var dataURL = fr.result;
      img.onload = function() {
          // read the dimensions once the image has been decoded
          var img_width = this.width;
          var img_height = this.height;
          var img_to_send = resize_and_compress(this, img_width, img_height, "image/jpeg");
          callback(text, img_name, target_action, img_to_send);
        };
      img.src = dataURL;
    };
    fr.readAsDataURL(img_src);
}


function resize_and_compress(source_img, img_width, img_height, mime_type){
    var new_width;
    // snap the width down to the next smaller bucket defined in wranges
    switch (true) {
      case img_width < wranges[4]:
         new_width = wranges[4];
         break;
      case img_width < wranges[3]:
         new_width = wranges[4];
         break;
      case img_width < wranges[2]:
         new_width = wranges[3];
         break;
      case img_width < wranges[1]:
         new_width = wranges[2];
         break;
      case img_width < wranges[0]:
         new_width = wranges[1];
         break;
      default:
         new_width = wranges[0];
         break;
    }
    var wpercent = (new_width/img_width);
    var new_height = Math.round(img_height*wpercent);
    var canvas = document.createElement('canvas');//supported
    canvas.width = new_width;
    canvas.height = new_height;
    var ctx = canvas.getContext("2d");
    ctx.drawImage(source_img, 0, 0, new_width, new_height);
    return dataURItoBlob(canvas.toDataURL(mime_type),mime_type);
}

// converting image data uri to a blob object
function dataURItoBlob(dataURI,mime_type) {
  var byteString = atob(dataURI.split(',')[1]);
  var ab = new ArrayBuffer(byteString.length);
  var ia = new Uint8Array(ab);//supported
  for (var i = 0; i < byteString.length; i++) { ia[i] = byteString.charCodeAt(i); }
  return new Blob([ab], { type: mime_type });
}
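For context, here's a minimal sketch of how this might be wired up to a file input. The input id, the text/name arguments and the callback body below are placeholders for illustration, not part of my actual markup:

// hypothetical wiring: 'image_input' and the argument values are placeholders
document.getElementById('image_input').addEventListener('change', function () {
  var file = this.files[0];
  if (!file) return;
  prep_image(file, 'some caption', file.name, 'upload', function (text, name, action, blob) {
    // `blob` is the resized/re-encoded image handed back through the callback
    console.log(name, 'now weighs', blob.size, 'bytes');
  });
});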

If warranted, here's the test image I've used:

[test image]

Here's the image's original location.

Note that for several other images I tried, the code did behave as expected. It doesn't always screw up the results, but now I can't be sure that it'll always work. Let's stick to pure JS solutions for the scope of this question.

Hassan Baig
  • The linked Q/A will show you how to work around it. I'll try to add a more comprehensive answer to this dupe if I've got time, but basically, when you draw your jpeg on the canvas, you also draw the JPEG artifacts – Kaiido Feb 05 '18 at 23:30
  • @Kaiido: The only answer attempted in that dupe is asking the person to use the `encoderOptions` parameter to set quality. In my case, I leave it at the default setting anyway. – Hassan Baig Feb 05 '18 at 23:30
  • And that's your problem. Your question is still the same, understanding how JPEG lossy compression works + understanding why drawing a lossy JPEG on a canvas produces more info + understanding why bigger quality option will produce bigger JPEG file size. So I think one can answer both questions in a single post => dupes. – Kaiido Feb 05 '18 at 23:32
  • @Kaiido: then it comes down to - is it possible to detect an input image's `encoderOptions` setting (before processing it)? That's one way to achieve an apples to apples comparison (and resizing). – Hassan Baig Feb 05 '18 at 23:34
  • Not really... Even if there were a way to detect what quality level was used, when drawn to the canvas, you would start again from raw pixel data, except that all the pixels that have been discarded by the JPEG algorithm will have been replaced by artifacts. There is no way to get back to the original image the jpeg version was created from. So you would have to create more artifacts from these artifacts. But if you can wait a few hours I will try to provide a more comprehensive answer to the target. – Kaiido Feb 05 '18 at 23:49
  • And actually, maybe your question is a better fit for such a comprehensive answer, so I'll rather close the other one once you have such an answer. – Kaiido Feb 05 '18 at 23:54

2 Answers


Why canvas is not the best option to shrink an image's file size.

I won't go into too much detail or in-depth explanations, but I will try to explain the basics of what you encountered.

Here are a few concepts you need to understand (at least partially).

  • What is a lossy image format (like JPEG)
  • What happens when you draw an image to a canvas
  • What happens when you export a canvas image to an image format

Lossy Image Format.

Image formats can be divided into three categories:

  • raw image formats
  • lossless image formats (tiff, png, gif, bmp, webp ...)
  • lossy image formats (jpeg, ...)

Lossless image formats generally just compress the data, for instance in a table mapping pixel colors to the pixel positions where each color is used.

On the other hand, lossy image formats will discard information and produce approximations of the data (artifacts) from the raw image, in order to create a perceptively similar image rendering using less data.

Approximations (artifacts) work because the decompression algorithm knows that it will have to spread the color information over a given area, and thus it doesn't have to keep every pixel's information.

But once the algorithm has processed the raw image and produced the new one, there is no way to recover the lost data.


Drawing an image to the canvas.

When you draw an image on a canvas, the browser converts the image information to a raw image format.
It won't store any information about which image format was passed to it, and in the case of a lossy image, every pixel contained in the artifacts becomes a first-class citizen, just like every other pixel.
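As a minimal sketch of this point (assuming `img` is an already loaded, same-origin <img> element, whatever format it originally came in):

var canvas = document.createElement('canvas');
canvas.width = img.width;
canvas.height = img.height;
var ctx = canvas.getContext('2d');
ctx.drawImage(img, 0, 0);
// 4 bytes (R, G, B, A) per pixel, with no trace of the original encoding
var pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
console.log(pixels.length === canvas.width * canvas.height * 4); // true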


Exporting a canvas image

The canvas 2D API has three methods to export its raw data (a short sketch follows the list):

  • getImageData, which returns the raw pixels' RGBA values
  • toDataURL, which synchronously applies the compression algorithm corresponding to the MIME type passed as an argument
  • toBlob, similar to toDataURL, but asynchronous
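In code, those three look roughly like this (a sketch; the MIME type and quality values are arbitrary):

var raw = ctx.getImageData(0, 0, canvas.width, canvas.height); // raw RGBA pixels
var uri = canvas.toDataURL('image/jpeg', 0.8); // synchronous, returns a data URI string
canvas.toBlob(function (blob) { // asynchronous, hands you a Blob
  console.log(blob.size);
}, 'image/jpeg', 0.8);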

The case we are interested in is toDataURL and toBlob with the "image/jpeg" MIME type.
Remember that when calling these methods, the browser only sees the current raw pixel data it has on the canvas. So it will apply the JPEG algorithm once again, removing some data and producing new approximations (artifacts) from this raw image.

So, yes, there is a 0-1 quality parameter available for lossy compression in these methods, and one could think of trying to find out what loss level was used to generate the original image. But even then, since we actually produced new image data in the drawing-to-canvas step, the algorithm might not be able to produce a good spreading scheme for these artifacts.
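As a rough illustration of both points, here is a sketch that repeatedly re-encodes the same image as JPEG and logs each generation's size; `img` is assumed to be an already loaded <img> element and the numbers are arbitrary:

function reencode(img, generations, quality) {
  var canvas = document.createElement('canvas');
  canvas.width = img.naturalWidth || img.width;
  canvas.height = img.naturalHeight || img.height;
  var ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0); // the lossy image becomes raw pixels again, artifacts included
  canvas.toBlob(function (blob) {
    console.log('generation', generations, '->', blob.size, 'bytes');
    if (generations <= 1) return;
    var next = new Image();
    next.onload = function () {
      URL.revokeObjectURL(next.src);
      reencode(next, generations - 1, quality); // feed the lossy result back in
    };
    next.src = URL.createObjectURL(blob);
  }, 'image/jpeg', quality); // quality is the optional 0-1 third argument
}
// e.g. reencode(someLoadedImg, 5, 0.92);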

Another thing to take into consideration, mostly for toDataURL, is that browsers have to be as fast as possible when doing these operations, so they will generally prefer speed over compression quality.


Alright, the canvas is not good for it. What then?

Not so easy for jpeg images... jpegtran claims it can do lossless scaling of your jpeg images, so I guess it should be possible to make a JS port too, but I don't know of any...



Special note about lossless formats

Note that your resizing algorithm can also produce bigger PNG files; here is an example case, but I'll let the reader guess why this happens:

// `c` and `c2` are the <canvas> elements below, accessed via their ids (global named element access)
var ctx = c.getContext('2d');
c.width = 501;
for(var i = 0; i<500; i+=10) {
  ctx.moveTo(i+.5, 0);
  ctx.lineTo(i+.5, 150);
}
ctx.stroke();

c.toBlob(b=>console.log('original', b.size));

c2.width = 500;
c2.height = (500 / 501) * c.height;
c2.getContext('2d').drawImage(c, 0, 0, c2.width, c2.height);
c2.toBlob(b=>console.log('resized', b.size));
<canvas id="c"></canvas>
<canvas id="c2"></canvas>
Kaiido
  • The TL;DR is *"Don't reencode into JPG (lossy compression) something that is already a JPG, because it will add more artefacts"*. I don't really agree with this answer: sometimes **there is no other way** to make a JPG smaller than reencoding. So yes, when you have a 2MB JPG, reexporting it to JPG with quality = 60% seems totally ok to get a 400KB JPG :) – Basj Jun 27 '18 at 12:20
  • @Basj I don't think there is such a TL;DR in this answer. The TL;DR, if there should be one, would rather be: don't do it on a canvas, since it will prefer speed over quality of compression. – Kaiido Jun 27 '18 at 12:26
  • How do you know that well-known browsers (e.g. Chrome or FF) have a bad implementation of the JPEG compression algorithm inside `toDataURL`, leading to more artefacts than what a "normal" editor would do? (let's say `toDataURL` JPG compression vs. MS Paint JPG compression) I'm ok to believe this, but it would be cool to see a benchmark or the source code of Chromium showing where the JPEG algorithm is badly implemented. – Basj Jun 27 '18 at 12:33
  • @Basj please stop putting words in my mouth that I didn't say. I never said they have bad implementations, nor that it will produce more artefacts than MS Paint; I said they'll trade off the quality of the compression for faster results. Remember that toDataURL is a synchronous method. They can't do a multipass that would block the UI for 5 seconds. Other softs, like Photoshop, Gimp, and possibly ImageMagick or even MS Paint, won't care as much, and will generally offer an option for "better compression, slower". There is no such option with the canvas API. – Kaiido Jun 27 '18 at 13:52
  • 1/2 I didn't put words in your mouth @Kaiido, I just tried to understand what you mean by *"it will prefer speed over quality of compression"*. Preferring "speed over quality" implicitly means the quality is lower than expected with other software. Once again, I trust you about `Other softs [...] won't care as much, and will generally offer an option for "better compression, slower"`, but I would like to see a benchmark or source that confirms that `var fullQuality = canvas.toDataURL('image/jpeg', 1.0);` is worse than another software's compression. – Basj Jun 27 '18 at 20:27
  • 2/2 I will happily trust this claim (and then look for another method than `canvas`), but I think that the claim `Alright, the canvas is not good for it.` needs to be substantiated with concrete elements (source code, benchmark, or anything else) to be a valid argument. – Basj Jun 27 '18 at 20:31

This is a recommendation and not really a fix (or a solution).

If you've run into this problem, make sure you compare the file sizes of the two images once you've completed the resize operation. If the new file is larger, then simply fall back to the source image.
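A minimal sketch of that check (assuming `original` is the source File/Blob and `resized` is the Blob returned by the resize step):

function pick_smaller(original, resized) {
  // keep whichever version actually weighs less
  return (resized.size < original.size) ? resized : original;
}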

Hassan Baig
  • Here's an interesting (yet unconfirmed) rule of thumb to go by when assessing images for resizing: https://stackoverflow.com/a/26509546/4936905 – Hassan Baig Feb 06 '18 at 00:21