JavaScript Canvas toDataURL security considerations

Question

I'm curious if anyone could fill me in on how safe it is to use toDataURL on a user-provided image. The basic idea is that User A would upload an image in their browser and it would be converted to a data URL. Then, ignoring the steps in between, User B (as well as other users) would eventually retrieve that URL, which would be converted back into an image and displayed in User B's browser.

So my question revolves around whether someone could abuse the system to inject code into User B's browser, or otherwise cause havoc. In general, what security considerations are there that must be taken when using toDataURL and then later converting it back?

I'm aware that cross-origin images taint a canvas, which disallows any methods that read the data, but I don't know how much of a blanket solution this is. I've read that some browsers don't have this restriction, while others (and even different versions of the same browser) implement it differently depending on the content of the cross-origin image.

What I've found in my research so far:

1. This question, where the answer pointed to a great article that looked at it from the perspective of storing the uploaded image on a server.
2. This question, where the answer points out an interesting way to hide a script in an image that I'd never seen before, but I'm not sure what vulnerability it creates if I'm not trying to extract a script from that image and run it.
3. This link, which details a great reason why browsers choose to restrict access to image data for cross-origin images. I always assumed it was just about protecting against malicious images, but I now realize it protects against much more.

None of the above have sufficiently approached it from the perspective of one user attacking another user through uploading an image (that doesn't stay as uploaded but instead gets converted to data url) that another user later downloads and views (with img src set to data url, not the malicious user's original upload). 2 is close to answering my question, but as I understand it, the detailed methods wouldn't work without the malicious user also having injected some script into the viewing user's browser.

To go along with this question is an example of what I would like to do including the file uploading/conversion to data url along with a sample data url to try out the importing (this sample url is safe to import and small so it imports quickly):

window.onload = function() {
    document.getElementById("convert").onclick = convert;
    document.getElementById("import").onclick = importF;

    let imageLoader = document.getElementById("imageLoader");
    let canvas = document.getElementById("imageCanvas");
    let ctx = canvas.getContext("2d");

    imageLoader.addEventListener('change', e => {
      let reader = new FileReader();

      reader.onload = (ee) => {
          loadImage("imageCanvas", ee.target.result);
      }

      reader.readAsDataURL(e.target.files[0]);  
    }, false);
};

function loadImage(id, src) {
  let canvas = document.getElementById(id);
  let ctx = canvas.getContext("2d");
  let img = new Image();
  
  img.onload = () => {
      canvas.width = img.width;
      canvas.height = img.height;
      ctx.drawImage(img, 0, 0);
  }
  
  img.src = src;
}

function convert() {
  let canvas = document.getElementById("imageCanvas");
  console.log(canvas.toDataURL());
}

function importF() {
  let imageImport = document.getElementById("imageImport");
  let url = imageImport.value;
  loadImage("imageCanvas", url);
}

<label>Upload Image:</label>
<input type="file" id="imageLoader" name="imageLoader"/>
<br/>

<label>Import Image:</label>
<input type="text" id="imageImport" name="imageImport"/>
<br/>

<label>Sample URL:</label>
<code style="user-select: all;"> data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAOCAYAAAAmL5yKAAAApUlEQVQ4T2NkQALyKu5GjP8Z01kZ/1n/+s+kjSyHi80Ik1BQdvf4z8C4nYPx/z819t9M+py/GRj+MzAwgFTgocEGyCl75DEyMEz04f3OEC34lUGY+R8xloPVMMopeTgzMjLsMeb8xdAu8YFojTCFjPJKblNlWf+lTpV5z8rBCHIraYBRQ9Xtoi3XL70S0U+k6YSqZpRX9vgfK/CVIVbw66gBIzcMAHB4Ryt6jeYXAAAAAElFTkSuQmCC </code>
<br/>

<button id="import"> Import from URL </button>
<button id="convert"> Convert to URL </button>
<br/>

<canvas id="imageCanvas"></canvas>

I'd say the browser is broken and needs to be fixed should this really be a problem, *as long* as you use the data url in an image tag only. Because otherwise obviously I could be crafting images to create malicious websites. For SVGs (which may contain scripts) as data urls in an image tag, they will be executed in a "sandbox". — Sebastian, Mar 16 '20 at 09:04
Good to know about the sandbox, makes me feel more secure. I would agree that any issue is probably more on the browser than on me, but it helps a lot to hear others feel the same. — Matthew Ludwig, Mar 16 '20 at 09:08
Make sure it's a valid URL that gets uploaded before you inline it into your page, of course! Otherwise unescaped "URLs" like this will be a problem: `data:image/png;base64,iVBORw">` — Sebastian, Mar 16 '20 at 09:12
Yup no worries there, as you can see from the snippet provided in my question I have no plans on inlining it into my page. Thanks for all the useful information :) — Matthew Ludwig, Mar 16 '20 at 09:16
@Sebastian svg in an img won't execute any script, not even sandboxed. — Kaiido, Mar 16 '20 at 23:15

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

There seems to be some confusion here, and given how misleading your links are I can understand.

Tainted canvas

"Tainting the canvas" is a security operation which blocks .toDataURL() and any other exporting method like .toBlob(), .captureStream() or 2D context's .getImageData().
There are only a few cases where this operation is done:

- Cross-origin resources: This is the most common case on the web. Site A drew a resource such as an image from Site B on a canvas. If Site B didn't tell the browser that it allows Site A to read this content by sending an appropriate Access-Control-Allow-Origin header, then the browser has to "taint" the canvas. This only protects the resource. There is no real security added for Site A in that case.
- Information leakage: This is more of an exception, but it still happens. Browsers may decide on their own that some actions could leak private information about their user. The most common case is "tainting" the canvas when an SVG image containing a <foreignObject> is painted on it. Since this element can render HTML, it can also leak, for instance, which links have been visited. Browsers should take care of anonymizing these resources, but nevertheless Safari still taints any such SVG image, Chrome buggily still taints the ones served from a blob: URI, IE tainted any SVG image (not only the ones with <foreignObject>), and all of them at some point tainted the canvas when some external filters were used.
- Information leakage II: There is also something that no browser can fight against when a canvas-generated bitmap is read. Every combination of hardware and software produces slightly different results when asked to perform the same drawing operations. This can be used to fingerprint the current browser. Some browser extensions will thus block these methods too, or make them return dummy results.
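To make the effect of tainting concrete, here is a minimal sketch of how a page could detect it. The helper name `tryExport` and the shape of its return value are made up for this illustration; the only browser behavior it relies on is that `toDataURL()` throws an exception named `SecurityError` when the canvas is tainted.

```javascript
// Hypothetical helper: attempts to export a canvas and reports
// whether the browser refused because the canvas is tainted.
function tryExport(canvas) {
  try {
    return { ok: true, dataURL: canvas.toDataURL() };
  } catch (err) {
    // Browsers throw a DOMException named "SecurityError" on a tainted canvas.
    if (err && err.name === 'SecurityError') {
      return { ok: false, reason: 'tainted' };
    }
    // Anything else is an unrelated failure, so let it propagate.
    throw err;
  }
}
```

Note that this only tells you the export was blocked. As said above, it protects the drawn resource, not the page doing the drawing.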

Now, none of this really protects from malicious images.

Images embedding malicious code

The kinds of images that can embed malicious code generally exploit vulnerabilities in image parsers and renderers. I don't think any up-to-date parser or renderer is still vulnerable to such attacks, but even if one were, and a web browser used it, by the time the image has been drawn to the canvas it's already too late. Tainting the canvas would not protect anything.

One thing you may have heard about is stegosploit. It consists of hiding malicious code in an image, but there the HTML canvas was used to decode that malicious code. So if you don't have the script that extracts and executes the embedded malicious script, it doesn't represent much of a risk, and in fact, if you re-export the image, there is a good chance that this embedded data gets lost.

Risks inherent with uploading content to a server

There are many risks when uploading anything to your server. I can't stress this enough: read the OWASP recommendations carefully.

Particular risks when uploading a `data:` URL

data: URLs are a good vector for XSS attacks. Indeed, it is very likely that you will build HTML code directly using that data: URL. If you didn't apply the correct sanitization steps, you may very well load an attacker's script instead of an image:

const dataURIFromServer = `data:image/png,"' onerror="alert('nasty script ran')"`;

const makeImgHTML = ( uri ) => `<img src="${uri}">`;

document.getElementById('container').innerHTML = makeImgHTML(dataURIFromServer);

<div id="container"></div>
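One possible way to defend against the injection above is sketched here. The names `SAFE_IMAGE_DATA_URL`, `isSafeImageDataURL` and `appendImage`, as well as the allow-list of image types, are assumptions made for this example. The sketch checks that the received string has the shape of a base64 raster-image data: URL, and it sets the URL through the `src` property instead of building HTML from it.

```javascript
// Assumed allow-list: base64-encoded PNG, JPEG, GIF or WebP only.
// SVG is left out on purpose since it is a much richer format.
const SAFE_IMAGE_DATA_URL =
  /^data:image\/(?:png|jpeg|gif|webp);base64,[A-Za-z0-9+/]+={0,2}$/;

// True only if `value` is a string made of the expected prefix
// followed by base64 characters, so it cannot contain quotes or markup.
function isSafeImageDataURL(value) {
  return typeof value === 'string' && SAFE_IMAGE_DATA_URL.test(value);
}

// Creates the <img> through the DOM API. Assigning to `src` never
// goes through the HTML parser, unlike innerHTML.
// Returns the new element, or null if the URL was rejected.
function appendImage(container, uri, doc = document) {
  if (!isSafeImageDataURL(uri)) return null;
  const img = doc.createElement('img');
  img.src = uri;
  container.appendChild(img);
  return img;
}
```

With this, `appendImage(document.getElementById('container'), dataURIFromServer)` would return null for the string shown above. The check only validates the shape of the string; it does not prove that the decoded bytes are a valid image.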

A final word on `data:` URLs

data: URLs are a means to store data in a URL so that it can be passed directly without the need for a server.
Storing a data: URL on a server is counter-productive.
To represent binary data, this data needs to be encoded in base64 so that all unsafe characters can still be represented in most encodings. This operation grows the data by about 33% over its original size (4 characters for every 3 bytes), and you will have to store the result as a String, which is not convenient for most databases.
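The size cost can be worked out directly, since base64 emits 4 characters for every group of 3 input bytes and pads the last group. The function names below are invented for this sketch.

```javascript
// Number of base64 characters needed for `byteLength` bytes,
// including the "=" padding of the last group.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Total length of the resulting data: URL for a given MIME type,
// e.g. dataURLLength(file.size, 'image/png').
function dataURLLength(byteLength, mimeType) {
  const prefix = `data:${mimeType};base64,`;
  return prefix.length + base64Length(byteLength);
}
```

For a 3,000,000 byte image, `base64Length` gives 4,000,000 characters, so the overhead is one third of the original size before the string is even stored anywhere.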

Really, data: URLs are from another era. There are really few cases where you want to use them. Most of what you would do with a data: URL, you should do with a Blob and a blob: URL instead. For instance, upload your image as a Blob directly to your server. Use the canvas .toBlob() method if you need to export its content. Use img.src = URL.createObjectURL(file) if you want to present an image picked by your user.
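Here is a rough sketch of that Blob-based flow. `exportCanvas` and `showBlob` are hypothetical helper names and the Node-style callback signature is just one possible design. The browser calls used are `canvas.toBlob()`, `URL.createObjectURL()` and `URL.revokeObjectURL()`.

```javascript
// Export the canvas content as a binary Blob instead of a base64 string.
// `onDone(error, blob)` is an assumed callback convention for this sketch.
function exportCanvas(canvas, onDone, type = 'image/png') {
  canvas.toBlob((blob) => {
    // toBlob passes null when the image could not be created.
    if (blob) onDone(null, blob);
    else onDone(new Error('canvas export failed'), null);
  }, type);
}

// Display a Blob (or a File picked by the user) in an <img>.
// The blob: URL is released once the image has loaded or failed,
// so the browser can free the underlying data.
function showBlob(img, blob) {
  const url = URL.createObjectURL(blob);
  img.onload = img.onerror = () => URL.revokeObjectURL(url);
  img.src = url;
  return url;
}
```

The Blob handed to `onDone` can be sent as-is, for example as the body of a `fetch()` upload, and the receiving side can pass the Blob it gets to `showBlob`. A blob: URL itself is only valid in the document that created it, so it is the Blob that travels, not the URL.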

TL;DR

- In your scenario toDataURL() in itself will not create any risk, nor will it prevent any.
- Use the well-known techniques to sanitize your users' uploads (don't trust them ever and remember they may not even be using your UI to talk to your server).
- Avoid data: URLs. They are inefficient.

Thanks for all the info, especially that OWASP link it was a very interesting read and the info on blob over data. I'll look into using the blob, but it's very likely that I'll be passing directly from user to user without a server so I'll have to see which of the two work best for my needs. Thanks again! — Matthew Ludwig, Mar 17 '20 at 14:09
Data URLS are not necessarily inefficient: Base64 encoded they compress quite well and you save a complete HTTP request with request and response headers. For small images they are actually more efficient than doing that extra request and for SVGs in data urls they can be encoded almost with the same size and you still get the sandboxing effect: https://codepen.io/tigt/post/optimizing-svgs-in-data-uris — Sebastian, Mar 18 '20 at 09:39
@Sebastian what HTTP request are you talking about? We can only compare dataURLs to blob URLs. There is no HTTP request in any of these. And yes, they are very inefficient. You have to fetch 30% the size of the file if fetched from a server compressed or not because your binary image is also compressed anyway, then you have to store that big string in a new DOMString everytime it's set as the `src` of an Element (DOMStrings are encoded in UTF-16, so this doubles the actual size). — Kaiido, Mar 18 '20 at 09:54
@Kaiido - this is about two users, two browsers, and an image being uploaded from one session and downloaded in another. You can't use blob URLs here, because they only work within one JavaScript session. So there is either data urls or "real" URLs. An inlined (gzip-compressed fetched) data url (base64 encoded) can be smaller than a second request for a binary resource. data URLs are not inefficient per se. Only in some cases, one of them being local sessions. — Sebastian, Mar 18 '20 at 15:08
@Sebastian So you are talking about OP's case? Then to upload that data to another user, no matter how it is done, you will need two HTTP requests, and uploading + downloading a binary file will always be more efficient than uploading and downloading its base64 representation. For instance I just tested and a 2000 x 2000px png image is ~21MB through the wires as base64 vs ~15MB for the binary. But that's not all, now that you fetched that big string, you still need to store it in UTF-16 in the DOM. 40MB. And now the browser has to decode it back to binary to read it. => 55MB vs 15MB in memory. — Kaiido, Mar 18 '20 at 23:21