3

My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.

Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.

jl6

3 Answers

1

Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.

This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:

  • A - Z
  • a - z
  • 0 - 9
  • - . _ ~ ! ' ( ) *

So the output consists of the characters above plus percent-escapes, each of which is a % followed by two hexadecimal digits.

The set of allowed characters in the fragment is:

  • A - Z
  • a - z
  • 0 - 9
  • ? / : @ - . _ ~ ! $ & ' ( ) * + , ; =
  • percent-encoded characters (a % followed by hexadecimal digits)

This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.
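As a minimal sketch of that round trip (the helper names `stateToHash` and `hashToState` and the sample state are made up for illustration, not part of any library), the state can be serialized with JSON.stringify and then escaped for the fragment. The helpers take the hash as a plain string so they can be tried outside a browser:

```javascript
// Hypothetical helpers; `state` is assumed to be any JSON-serializable object.
function stateToHash(state) {
    return '#' + encodeURIComponent(JSON.stringify(state));
}

function hashToState(hash) {
    // Accept the value with or without the leading '#'.
    const raw = hash.startsWith('#') ? hash.slice(1) : hash;
    if (raw === '') return null;
    try {
        return JSON.parse(decodeURIComponent(raw));
    } catch (e) {
        // Malformed escapes or invalid JSON, e.g. a hand-edited bookmark.
        return null;
    }
}

// Sample state, invented for this example.
const exampleState = { tab: 'search', query: 'a "quoted" term', page: 2 };
const exampleHash = stateToHash(exampleState);
console.log(exampleHash);
console.log(hashToState(exampleHash));
```

In a page you would presumably assign the result to window.location.hash and call hashToState(window.location.hash) on load. Returning null for an unreadable hash is just one possible choice; falling back to a default state would work too.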

jameshfisher
0

I think you're on the right track. Let's write down the requirements:

  1. The encoded string must be usable as a hash, i.e. it should contain only characters that are safe there (letters and numbers being the simplest choice).
  2. The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
  3. It shouldn't be too long, to remain usable.
  4. There should be an implementation in JavaScript, so it can be generated in the browser.

Base64 would be a great solution for that. The only problem: base64 output also contains characters like +, / and =, so you gain nothing compared to simply attaching a JSON string (which would also have to be URL-encoded).

BUT: Luckily, there's a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.

However, I was not able to find a JS implementation; maybe you'll have to write one yourself, or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.

EDIT: On second thought, I don't think you need to write your own implementation. Use a normal base64 implementation, and simply replace the “forbidden” characters with something you find appropriate for your URLs.

lxg
0

Base64 is an excellent way to store binary data in text. It uses just 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters, /, =, and +, which would need to be encoded to be stored in some parts of a URL. If you simply URL-encoded binary data, it could take up to 300% (3x) the original size, because every escaped byte becomes three characters.
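For a JSON string the difference is usually smaller than the worst case, since letters and digits are not escaped. A quick way to check with your own data is to measure both encodings; this is only a sketch, and the helper name `encodedLengths` and the sample state are invented for the example:

```javascript
// Hypothetical helper: compare the lengths of the two encodings for one state object.
function encodedLengths(state) {
    const json = JSON.stringify(state);
    return {
        json: json.length,
        percentEncoded: encodeURIComponent(json).length,
        // btoa only accepts characters in the range \x00-\xff,
        // so this assumes the JSON is plain ASCII/Latin-1.
        base64: btoa(json).length,
    };
}

// Sample state, made up for illustration.
console.log(encodedLengths({ tab: 'search', filters: ['open', 'mine'], page: 2 }));
```

Which one comes out shorter depends on how much punctuation and non-ASCII text the state contains.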

If you're only storing the data in the fragment of the URL, base64-encoded text doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit (the path or query string), then it matters.

As lxg mentioned, there is a base64url variant for that. It is a modified version of base64 that replaces the characters that are unsafe in a URL. Here is how to encode it:

function tobase64url(s) {
    // Swap "+" and "/" for the URL-safe "-" and "_", and drop the "=" padding.
    return btoa(s).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Returns "AP__8fHx___-" instead of "AP//8fHx///+"

And to decode a base64 string from the URL:

function frombase64url(s) {
    // Restore the standard base64 alphabet before decoding.
    return atob(s.replace(/-/g,'+').replace(/_/g, '/'));
}
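
One caveat: btoa throws if the string contains characters outside the \x00-\xff range, which can happen with a JSON string holding arbitrary user text. A possible variant, sketched here under the assumption that TextEncoder and TextDecoder are available (the function names are hypothetical), converts to UTF-8 bytes first and also restores the stripped padding before decoding:

```javascript
// Sketch of a UTF-8-safe base64url round trip. Function names are made up.
function utf8ToBase64url(text) {
    const bytes = new TextEncoder().encode(text);
    let binary = '';
    // Build a "binary string" with one character per byte, as btoa expects.
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function base64urlToUtf8(encoded) {
    const base64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
    // Add back "=" until the length is a multiple of 4.
    const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
    const binary = atob(padded);
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    return new TextDecoder().decode(bytes);
}

// Sample state, invented for this example.
const json = JSON.stringify({ city: 'Zürich', note: '日本語' });
console.log(base64urlToUtf8(utf8ToBase64url(json)) === json);
```

Restoring the padding is optional where the decoder tolerates its absence, but it costs little and avoids depending on that behaviour.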
Pluto
  • May I ask why you strip the `=` padding rather than replace it? Do you know that the `atob` function is able to correctly decode the string even without the padding characters? – jl6 Sep 14 '14 at 20:50
  • @jl6 Padding in base64 just represents the absence of data, because every 3 bytes of input are encoded as 4 characters of base64. In Firefox the `atob` function can decode the string without the padding, though I didn't check other browsers. But there's only one line in a URL, so the padding will only occur at the end and any browser should be able to figure it out. If you want to make sure it's added back in before decoding, then you'd add `4*Math.ceil(s.length/4)-s.length` `=` padding characters to the end. Anyway, the reason for leaving it out is that the equals sign is used in query strings in URLs. – Pluto Sep 15 '14 at 20:22