
Is there a default charset for data URIs? I read the spec but I don't see one.

For instance, if I have a data URI for a source map which I expect to be reliably interpreted across browsers, is it OK to omit the charset?

//@ sourceMappingURL=data:application/json;base64,eyJ2ZXJza...

vs

//@ sourceMappingURL=data:application/json;charset=utf-8;base64,eyJ2ZXJza...

I see in this GitHub issue that people have had problems using Chinese characters in source-mapped files without an explicit charset=utf-8. So if there is a default (or, at least, if we could expect browsers to have chosen one), it doesn't seem like utf-8 is the one...

Jackson
  • This doesn't relate to the data URI itself but the interpretation of the data. There is a standard for JSON, which is UTF-8 only, but that is a relatively recent change from the standard that it must be one of several Unicode encodings that are easily distinguished in the context of valid JSON. So, charset on an application/json MIME type is unnecessary if you assume compliance with past or present JSON standards. If the JSON is non-standard, I suggest converting it as close to the source as possible (and filing a bug report if applicable). – Tom Blodget Jul 13 '19 at 15:00

1 Answer

According to RFC 2397 § 2, a data URI that omits its media type defaults to text/plain;charset=US-ASCII, so US-ASCII is the only default charset the spec defines. For the URI itself that default is harmless, because every Base64-encoded URI uses only ASCII characters. Moreover, “all US‑ASCII strings become valid UTF‑8”, which means there’s “decent backwards compatibility in many cases”.[1]
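To make the ASCII/UTF-8 relationship concrete, here is a small Python sketch. The JSON strings and the `decodes_as` helper are made up for illustration and are not taken from any real source map. It shows that pure-ASCII text produces the same bytes under both encodings, while UTF-8 bytes for non-ASCII text cannot be read back as US-ASCII.

```python
def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if `data` is valid in the given charset (hypothetical helper)."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Pure-ASCII JSON: the byte sequence is identical in US-ASCII and UTF-8.
ascii_text = '{"version":3}'
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# JSON containing Chinese characters, serialized as UTF-8.
non_ascii_text = '{"sourcesContent":["// 你好"]}'
utf8_bytes = non_ascii_text.encode("utf-8")

ascii_ok = decodes_as(utf8_bytes, "ascii")  # a US-ASCII reader rejects these bytes
utf8_ok = decodes_as(utf8_bytes, "utf-8")   # a UTF-8 reader accepts them
```

So the compatibility only runs one way: ASCII content survives either interpretation, but UTF-8 content with non-ASCII characters depends on the reader picking the right charset.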

Nevertheless, UTF-8 implementation remains uneven in 2019. Because of that, and because explicitly adding a charset costs little in an already user-unfriendly data URI, it’s probably a good idea to include charset=utf-8 in your source map URIs so that they are interpreted reliably across browsers.
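As a rough illustration of that advice, the following Python sketch builds a source map data URI with an explicit charset. The function name `source_map_data_uri` and the example map are hypothetical. Real build tools generate this URI themselves, so treat it as a sketch of the format and not of any tool's implementation.

```python
import base64
import json


def source_map_data_uri(source_map: dict) -> str:
    """Serialize a source map as a Base64 data URI that declares UTF-8 (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII characters literal, so the payload really is UTF-8.
    payload = json.dumps(source_map, ensure_ascii=False).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:application/json;charset=utf-8;base64," + encoded


# Made-up source map with a non-ASCII file name.
example_map = {"version": 3, "sources": ["你好.js"], "names": [], "mappings": ""}
uri = source_map_data_uri(example_map)
```

Another option is to leave `json.dumps` at its default `ensure_ascii=True`, which escapes non-ASCII characters as `\uXXXX`. The payload is then pure ASCII, so it decodes the same way whichever charset the reader assumes.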


  1. Arjun Chaudhary’s answer to “Is there a drastic difference between UTF-8 and UTF-16”.
Lucas