28

I need to generate a GUID and save it via a string representation. The string representation should be as short as possible as it will be used as part of an already-long URL string.

Right now, instead of using the normal abcd-efgh-... representation, I use the raw bytes generated and base64-encode them instead, which results in a somewhat shorter string.

But is it possible to make it even shorter?

I'm OK with losing some degree of uniqueness and keeping a counter, but scanning all existing keys is not an option. Suggestions?

chakrit
  • 61,017
  • 25
  • 133
  • 162

7 Answers7

16

I used an Ascii85 encoding for writing a Guid to a database column in 20 ASCII characters. I've posted the C# code in case it is useful. The specific character set may be different for a URL encoding, but you can pick whichever characters suit your application. It's available here: What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

Community
  • 1
  • 1
sheikhjabootie
  • 7,308
  • 2
  • 35
  • 41
  • Not every char in ASCII85 encoding is safe to use in url and would require encoding to be safe&short. This is not the optimal solution. – Imre Pühvel Apr 22 '21 at 11:39
8

Sure, just use a base larger than 64. You'll have to encode them using a custom alphabet, but you should be able to find a few more "url-safe" printable ASCII characters.

Base64 encodes 6 bits using 8, so a 16 byte GUID value becomes 22 bytes encoded. You may be able to reduce that by a character or two, but not much more.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
3

I found this discussion interesting: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

Basically you take the 36 characters and turn them into 16 bytes of binary but first sort the three temporal pieces using a stored procedure:

set @uuid:= uuid();
select @uuid;
+--------------------------------------+
| @uuid                                |
+--------------------------------------+
| 59f3ac1e-06fe-11e6-ac3c-9b18a7fcf9ed |
+--------------------------------------+

CREATE DEFINER=`root`@`localhost`
    FUNCTION `ordered_uuid`(uuid BINARY(36))
    RETURNS binary(16) DETERMINISTIC
    RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));

select hex(ordered_uuid(@uuid));
+----------------------------------+
| hex(ordered_uuid(@uuid))         |
+----------------------------------+
| 11e606fe59f3ac1eac3c9b18a7fcf9ed |
+----------------------------------+
Andrei Sura
  • 2,465
  • 1
  • 20
  • 15
2

I'm not sure if this is feasible, but you could put all the generated GUIDs in a table and use in the URL only the index of the GUID in the table.

You could also reduce the length of the guid - for example use 2 bytes to indicate the number of days since 2010 for example and 4 bytes for the number of miliseconds since the start of the current day. You will have collisions only for 2 GUIDs generated in the same milisecond. You could also add 2 more random bytes which will make this even better.

Meh
  • 7,016
  • 10
  • 53
  • 76
  • 4
    Using an index defeats one of the purposes of putting a GUID in a URL, though -- security. There's a lot of data in a GUID, so someone can't just increment the number by one and try it out... but they could do that with an index. – Rob Whelan May 25 '13 at 15:40
2

(long time, but just came into the same need today)

UUIDs are 128bits long, represented by 32 hex plus 4 hyphens. If we use a dictionary of 64 (2^6) printable ascii`s, it is just a matter of converting from 32 groups of 4 bits (length of a hex) to 22 groups of 6 bits.

Here is a UUID shortner. Instead 36 chars you get 22, without losing the original bits.

https://gist.github.com/tomlobato/e932818fa7eb989e645f2e64645cf7a5

class UUIDShortner
    IGNORE = '-'
    BASE6_SLAB = ' ' * 22

    # 64 (6 bits) items dictionary
    DICT = 'a'.upto('z').to_a +
        'A'.upto('Z').to_a +
        '0'.upto('9').to_a +
        ['_', '-'] 

    def self.uuid_to_base6 uuid
        uuid_bits = 0

        uuid.each_char do |c|
            next if c == IGNORE
            uuid_bits = (uuid_bits << 4) | c.hex
        end

        base6 = BASE6_SLAB.dup

        base6.size.times { |i|
            base6[i] = DICT[uuid_bits & 0b111111]
            uuid_bits >>= 6
        }

        base6
    end
end

# Examples:

require 'securerandom'
uuid = ARGV[0] || SecureRandom.uuid
short = UUIDShortner.uuid_to_base6 uuid
puts "#{uuid}\n#{short}"

# ruby uuid_to_base6.rb
# c7e6a9e5-1fc6-4d5a-b889-4734e42b9ecc
# m75kKtZrjIRwnz8hLNQ5hd
Tom Lobato
  • 136
  • 2
  • 4
1

You could approach this from the other direction. Produce the shortest possible string representation and map it into a Guid.

Generate the key using a defined alphabet as below:

In psuedocode:

string RandomString(char[] alphabet, int length)
{
  StringBuilder result = new StringBuilder();
  for (int i = 0; i < length; i++)
    result.Append(alphabet[RandomInt(0, alphabet.Length)]);

  return result;
}

If you keep the string length < 16, you can simply hex encode the result and pass it to the Guid constructor to parse.

Rob
  • 5,525
  • 1
  • 24
  • 26
  • 2
    I know that this is an old topic, but I thought I should point out that building a string and putting it into a GUID is a very bad idea. If you were to take a pseudo-random string like this and use if for various purposes, then that's fine. However, passing it off as a GUID will probably cause problems. – Devin Goble Mar 15 '12 at 19:18
0

not for exact same problem, but very very close - I have used CRC64, Base64 that and you get 11 bytes, CRC64 has been tested (not proven) to NOT produce duplicates on a wide range of strings.

And since it is 64 bit long by definition - you get the key that is half the size.

To directly answer the original question - you can CRC64 encode any representation of your GUIDs.

Or just run CRC64 on the business key and you will have a 64 bit unique thing that you can then base64.

TJunkie
  • 73
  • 1
  • 8
  • 3
    The problem with the CRC64 is that it's not reversible. You cannot produce UUID back from the CRC64 like you can do wit Base64. – Marko Aug 25 '12 at 23:18
  • @Marko it is useful if you also store the resulting CRC64. but the utility of that is probably debatable. – chakrit Nov 25 '15 at 02:05