I was watching a system design tutorial for a tiny URL service and reading up on base62 encoding to avoid collisions. They say to use a counter and encode it with base62. That makes sense, but when I look at some online base62 encoders, some of them generate more than 7 characters, which is a problem if the tiny URL is limited to, say, 7 characters.
Are there multiple types of base62 encoding? E.g. these two websites give 2 different results for the same input of 1000000.
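If the counter is treated as a plain number (not as text), the length of its base62 form grows only logarithmically. The short sketch below illustrates this; `capacity` and `base62_length` are hypothetical helper names, not part of any library.

```python
def capacity(k: int) -> int:
    """Number of distinct counter values that fit in k base62 characters."""
    return 62 ** k


def base62_length(n: int) -> int:
    """Number of base62 digits needed to write the non-negative integer n."""
    length = 1
    while n >= 62:
        n //= 62
        length += 1
    return length


for k in (4, 6, 7, 8):
    print(f"{k} chars: {capacity(k):,} values")

print(base62_length(1_000_000))  # 4
```

With 7 characters a counter can reach 62**7 - 1, i.e. about 3.5 trillion values, before an 8th character is needed.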

- Could you explain how `base62 encoding` avoids collision? – Vlad Feinstein Nov 03 '21 at 22:30
- Basically, they mentioned using a counter and then encoding that counter, so there won't be collisions, which makes sense. But when I search online for a base62 encoder, some encoders give a result of 10+ characters. So if the tiny URL is only 8 characters, we could only take the first 8 characters, which would cause collisions again. But some base62 encoders only generate 4 or 5 characters, which is why I'm trying to understand: is the base62 encoding implementation basically up to the developer? – Harts Nov 03 '21 at 22:35
- You have binary data, you convert it to a number, you convert the number into (e.g.) base 62, and then you choose 62 symbols to display the digits of that base-62 number. So the length depends only on the initial string (base62 is not a hashing algorithm). Aligning on a common multiple of the bit widths (e.g. with 8 bits if you are using a byte encoding) helps you split the data into manageable chunks, and you can see why 62 is not a good choice there; it is only used on media with a limited symbol set. As with base64, different protocols may use different symbols and implementation details. – Giacomo Catenazzi Nov 04 '21 at 10:00
- @GiacomoCatenazzi since it's not a hashing algorithm, the input can be reversed, which in turn means one could guess additional URLs the shortener service has created by simply requesting one and finding the latest counter value the server used? – Pasha Skender Mar 08 '22 at 22:27
- Probably better to think of base62 as just another number system, like base16: 0, ...9, A, ...Z, a, ...z, 10, ...19, 1A, ...1Z, 1a, ...1z. Encoding arbitrary binary strings into it is kinda annoying because there's no nice alignment (unlike base64, where 1 character = 6 bits), but if you treat the data as a number, divide by 62, use the remainder to pick a character, and repeat until the number is 0, you'll get a result (the characters come out least significant first). – Nick T Jul 29 '22 at 01:14
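The divide-by-62 procedure from the last comment can be sketched in Python as follows. It assumes the 0-9, A-Z, a-z symbol order listed in that comment; other implementations may order the symbols differently. Decoding simply reverses the process, which also shows why a plain counter-based code is reversible, as asked in the comment above it.

```python
# Assumed alphabet order, as listed in the comment: 0-9, A-Z, a-z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Repeatedly divide by 62; each remainder selects one character."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Inverse of encode: accumulate digits most significant first."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(1_000_000))  # 4C92
print(decode("4C92"))     # 1000000
```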
2 Answers
Base62 and Base64 encodings are used to represent binary data as text.
I am not sure what practical use `base62` has. `base64`, on the other hand, can represent 6 bits as one character. Your sample value `1,000,000` (hex `0xF4240`) uses 20 bits, so it fits into 4 `base64` characters.
Your first example uses the plain text `1000000`, which is 7 characters of 8 bits each, or a total of 56 bits; that would require 10 `base64` characters (ignoring padding).
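Both counts can be checked with Python's standard `base64` module; this is only a quick sketch. Note that standard base64 pads the 7-byte text to 12 characters with `=`, so the 10-character figure assumes the padding is stripped.

```python
import base64

n = 1_000_000
print(hex(n), n.bit_length())  # 0xf4240 20

# 1,000,000 as a raw number: 20 bits fit in 3 bytes (24 bits) -> 4 base64 chars.
raw = n.to_bytes(3, "big")
print(base64.b64encode(raw))  # b'D0JA'

# The same value as the text "1000000": 7 bytes x 8 bits = 56 bits.
text = str(n).encode("ascii")
encoded = base64.b64encode(text)
print(encoded)                     # b'MTAwMDAwMA=='
print(len(encoded.rstrip(b"=")))  # 10
```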
You will get similar numbers for `base62`, but the encoding must be non-trivial, as you can't simply chop your data into 6-bit pieces.
The Wiki link above mentions multiple variants, so the encoder and decoder do have to agree on which one to use. But this is NOT the cause of the difference you saw in your two examples.
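As a hedged illustration of why encoder and decoder must agree: the sketch below encodes the same number with two hypothetical alphabet orderings that contain the same 62 symbols. The orderings are chosen for illustration only and are not taken from any particular specification.

```python
import string

# Two hypothetical alphabet orderings containing the same 62 symbols.
DIGITS_UPPER_LOWER = string.digits + string.ascii_uppercase + string.ascii_lowercase
DIGITS_LOWER_UPPER = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int, alphabet: str) -> str:
    """Repeated division by 62; remainders index into the given alphabet."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode(s: str, alphabet: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + alphabet.index(ch)
    return n


a = encode(1_000_000, DIGITS_UPPER_LOWER)
b = encode(1_000_000, DIGITS_LOWER_UPPER)
print(a, b)  # 4C92 4c92
# Decoding with the other variant's alphabet silently yields a different number.
print(decode(a, DIGITS_LOWER_UPPER) == 1_000_000)  # False
```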

- Practical use of Base62: https://web.archive.org/web/20211006080405/https://github.blog/2021-04-05-behind-githubs-new-authentication-token-formats/ TL;DR double-click selectable & url-safe – coolaj86 Dec 16 '21 at 19:02
Yes, there are multiple algorithms for base62. You need to use the same algorithm implementation to decode what you encoded, or it won't decode properly.
The two algorithms used for Base62 are:
(1) A bigint-based algorithm, which is slower, with O(n^2) time complexity, e.g. https://github.com/jxskiss/base62/issues/2
(2) Variadic length encoding, which is much faster, O(n). Example implementations in Go (https://github.com/jxskiss/base62) and in Java (https://github.com/glowfall/base62)
If you use one of the above, you have to keep using it to decode successfully; otherwise you will get incorrect results.
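A minimal sketch of approach (1) in Python, whose `int` is arbitrary precision, so no bigint library is needed. This is not the code from the linked repositories; the 0-9, A-Z, a-z alphabet order and the convention of writing one `0` per leading zero byte (borrowed from Base58) are assumptions made for illustration.

```python
import string

# Assumed alphabet order: 0-9, A-Z, a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode_bytes(data: bytes) -> str:
    """Bigint-style Base62: read the bytes as one big-endian integer."""
    # Leading zero bytes vanish when the bytes become an integer, so
    # (assumed convention, borrowed from Base58) emit one '0' per zero byte.
    pad = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")
    digits = []
    # Each divmod walks the whole (shrinking) big integer once per output
    # digit, which is why this approach is quadratic in the input length.
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return ALPHABET[0] * pad + "".join(reversed(digits))


def decode_bytes(text: str) -> bytes:
    """Inverse of encode_bytes under the same assumed conventions."""
    stripped = text.lstrip(ALPHABET[0])
    pad = len(text) - len(stripped)
    n = 0
    for ch in stripped:
        n = n * 62 + ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


encoded = encode_bytes(b"1000000")
print(encoded, len(encoded))
print(decode_bytes(encoded))  # b'1000000'
```

For the text `1000000` this produces 10 characters, the same count as the unpadded base64 estimate for 56 bits.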
