94

I have been experimenting with using UUIDs as database keys. I want to use as few bytes as possible while still keeping the UUID representation human readable.

I think that I have gotten it down to 22 bytes by using base64 and removing the trailing "==" padding, which seems unnecessary to store for my purposes. Are there any flaws with this approach?

Basically my test code does a bunch of conversions to get the UUID down to a 22 byte String, then converts it back into a UUID.

import java.io.IOException;
import java.util.UUID;

public class UUIDTest {

    public static void main(String[] args){
        UUID uuid = UUID.randomUUID();
        System.out.println("UUID String: " + uuid.toString());
        System.out.println("Number of Bytes: " + uuid.toString().getBytes().length);
        System.out.println();

        byte[] uuidArr = asByteArray(uuid);
        System.out.print("UUID Byte Array: ");
        for(byte b: uuidArr){
            System.out.print(b +" ");
        }
        System.out.println();
        System.out.println("Number of Bytes: " + uuidArr.length);
        System.out.println();


        try {
            // Convert a byte array to base64 string
            String s = new sun.misc.BASE64Encoder().encode(uuidArr);
            System.out.println("UUID Base64 String: " +s);
            System.out.println("Number of Bytes: " + s.getBytes().length);
            System.out.println();


            String trimmed = s.split("=")[0];
            System.out.println("UUID Base64 String Trimmed: " +trimmed);
            System.out.println("Number of Bytes: " + trimmed.getBytes().length);
            System.out.println();

            // Convert base64 string to a byte array
            byte[] backArr = new sun.misc.BASE64Decoder().decodeBuffer(trimmed);
            System.out.print("Back to UUID Byte Array: ");
            for(byte b: backArr){
                System.out.print(b +" ");
            }
            System.out.println();
            System.out.println("Number of Bytes: " + backArr.length);

            byte[] fixedArr = new byte[16];
            for(int i= 0; i<16; i++){
                fixedArr[i] = backArr[i];
            }
            System.out.println();
            System.out.print("Fixed UUID Byte Array: ");
            for(byte b: fixedArr){
                System.out.print(b +" ");
            }
            System.out.println();
            System.out.println("Number of Bytes: " + fixedArr.length);

            System.out.println();
            UUID newUUID = toUUID(fixedArr);
            System.out.println("UUID String: " + newUUID.toString());
            System.out.println("Number of Bytes: " + newUUID.toString().getBytes().length);
            System.out.println();

            System.out.println("Equal to Start UUID? "+newUUID.equals(uuid));
            if(!newUUID.equals(uuid)){
                System.exit(0);
            }


        } catch (IOException e) {
        }

    }


    public static byte[] asByteArray(UUID uuid) {

        long msb = uuid.getMostSignificantBits();
        long lsb = uuid.getLeastSignificantBits();
        byte[] buffer = new byte[16];

        for (int i = 0; i < 8; i++) {
            buffer[i] = (byte) (msb >>> 8 * (7 - i));
        }
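        for (int i = 8; i < 16; i++) {
            // 8 * (7 - i) is negative here. It still works only because Java uses
            // just the low six bits of a long shift distance (-8 acts as 56, ...,
            // -64 acts as 0). Writing 8 * (15 - i) would state the intent directly.
            buffer[i] = (byte) (lsb >>> 8 * (7 - i));
        }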

        return buffer;

    }

    public static UUID toUUID(byte[] byteArray) {

        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++)
            msb = (msb << 8) | (byteArray[i] & 0xff);
        for (int i = 8; i < 16; i++)
            lsb = (lsb << 8) | (byteArray[i] & 0xff);
        UUID result = new UUID(msb, lsb);

        return result;
    }

}

output:

UUID String: cdaed56d-8712-414d-b346-01905d0026fe
Number of Bytes: 36

UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2 
Number of Bytes: 16

UUID Base64 String: za7VbYcSQU2zRgGQXQAm/g==
Number of Bytes: 24

UUID Base64 String Trimmed: za7VbYcSQU2zRgGQXQAm/g
Number of Bytes: 22

Back to UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2 0 38 
Number of Bytes: 18

Fixed UUID Byte Array: -51 -82 -43 109 -121 18 65 77 -77 70 1 -112 93 0 38 -2 
Number of Bytes: 16

UUID String: cdaed56d-8712-414d-b346-01905d0026fe
Number of Bytes: 36

Equal to Start UUID? true
mainstringargs
  • One way to look at it is that a UUID is 128 random bits, so 6 bits per base64-item, is 128/6=21.3, so you're right that you need 22 base64 positions to store the same data. – Stijn Sanders Apr 21 '09 at 14:22
  • 1
    Your previous question seems essentially the same: http://stackoverflow.com/questions/772325/what-is-the-smallest-way-to-store-a-uuid-that-is-human-readable – erickson Apr 21 '09 at 14:54
  • 1
    I'm not sure your code is correct: in the second for loop of asByteArray you subtract i from 7, but i iterates from 8 to 15, which means it will shift by a negative number. IIRC >>> wraps around, but it still doesn't seem correct. – Jon Tirsen Jan 31 '12 at 12:15
  • 1
    I think it's easier to just use ByteBuffer to convert the two longs to a byte array like in this question: http://stackoverflow.com/questions/6881659/how-to-convert-two-longs-to-a-byte-array-how-to-convert-uuid-to-byte-array – Jon Tirsen Jan 31 '12 at 12:16
  • What is the point of "human readable"? Take a look at what the uuid_short() mysql/mariadb function does. – theking2 Sep 26 '22 at 20:36
  • URLs and sharing them – mainstringargs Sep 26 '22 at 21:06

11 Answers

69

I was also trying to do something similar. I am working with a Java application which uses UUIDs of the form 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8 (which are generated with the standard UUID lib in Java). In my case I needed to be able to get this UUID down to 30 characters or less. I used Base64 and these are my convenience functions. Hopefully they will be helpful for someone as the solution was not obvious to me right away.

Usage:

String uuid_str = "6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8";
String uuid_as_64 = uuidToBase64(uuid_str);
System.out.println("as base64: "+uuid_as_64);
System.out.println("as uuid: "+uuidFromBase64(uuid_as_64));

Output:

as base64: b8tRS7h4TJ2Vt43Dp85v2A
as uuid: 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8

Functions:

import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.commons.codec.binary.Base64;

private static String uuidToBase64(String str) {
    Base64 base64 = new Base64();
    UUID uuid = UUID.fromString(str);
    ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
    bb.putLong(uuid.getMostSignificantBits());
    bb.putLong(uuid.getLeastSignificantBits());
    return base64.encodeBase64URLSafeString(bb.array());
}
private static String uuidFromBase64(String str) {
    Base64 base64 = new Base64(); 
    byte[] bytes = base64.decodeBase64(str);
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    UUID uuid = new UUID(bb.getLong(), bb.getLong());
    return uuid.toString();
}
Stu Thompson
swill
  • 11
  • 1
    Sorry I had not noticed this comment. Yes I am using Apache commons-codec. `import org.apache.commons.codec.binary.Base64;` – swill Mar 24 '15 at 19:14
  • 1
    A 39% reduction in size. Nice. – Stu Thompson Oct 14 '16 at 21:59
  • 10
    You can use built in since Java 8. `Base64.getUrlEncoder().encodeToString(bb.array())` and `Base64.getUrlDecoder().decode(id)` – Wpigott Jan 16 '19 at 14:26
  • 1
    You may choose not to instantiate the Base64 class, the methods encodeBase64URLSafeString(b[]) and decodeBase64(str) are static, aren't they? – Kumar Mani Jan 16 '20 at 07:14
35

You can safely drop the padding "==" in this application. If you were to decode the base-64 text back to bytes, some libraries would expect it to be there, but since you are just using the resulting string as a key, it's not a problem.
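As a minimal sketch of the padding point, assuming Java 8 or later: the built-in java.util.Base64 decoder treats the trailing "==" as optional, so the 22-character string from the question decodes straight back to 16 bytes. The class and method names here are only illustrative.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

class UnpaddedDecode {

    // Decodes an unpadded, standard-alphabet Base64 string into a UUID.
    static UUID fromUnpaddedBase64(String text) {
        // The JDK decoder accepts input with or without trailing '=' padding.
        byte[] bytes = Base64.getDecoder().decode(text);
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        // Trimmed value taken from the question's output.
        System.out.println(fromUnpaddedBase64("za7VbYcSQU2zRgGQXQAm/g"));
        // prints cdaed56d-8712-414d-b346-01905d0026fe
    }
}
```

Other decoders may be stricter, in which case re-appending "==" before decoding is the usual workaround.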

I'd use Base-64 because its encoding characters can be URL-safe, and it looks less like gibberish. But there's also Base-85. It uses more symbols and codes 4 bytes as 5 characters, so you could get your text down to 20 characters.
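The character counts can be checked with a little arithmetic. The sketch below is only an illustration (the class and method names are made up), and the Base-85 figure assumes the input length is a multiple of 4 bytes, which holds for a 16-byte UUID.

```java
class EncodedLengths {

    // Unpadded Base-64: each character carries 6 bits, so round the bit count up.
    static int base64Unpadded(int byteCount) {
        return (byteCount * 8 + 5) / 6;
    }

    // Base-85: every group of 4 bytes becomes 5 characters.
    // Assumption: byteCount is a multiple of 4 (no partial final group).
    static int base85(int byteCount) {
        if (byteCount % 4 != 0) {
            throw new IllegalArgumentException("This sketch only handles whole 4-byte groups");
        }
        return byteCount / 4 * 5;
    }

    public static void main(String[] args) {
        System.out.println("Base-64: " + base64Unpadded(16)); // Base-64: 22
        System.out.println("Base-85: " + base85(16));         // Base-85: 20
    }
}
```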

erickson
  • 20
    Base85 only saves 2 characters. Plus, Base85 is not safe to use in URLs, and one major use of UUIDs is entity identifiers in databases, which then end up in URLs. – Dennis Jan 02 '13 at 01:41
  • @erickson can you please share some code snippet to convert to Base85. I tried but couldn't get any reliable Base85 java library – Manish Aug 28 '20 at 04:43
  • @Manish There are several variants of base-85, but each takes more than a “snippet” of code to implement; that kind of answer really doesn’t fit on this site. What kind of problems did you find in the libraries you have tried? I really would recommend base-64, as it has support in core Java and only costs about 7% more space for encoded values. – erickson Sep 02 '20 at 12:12
  • @erickson but base64 doesn't solve my purpose to reduce the uuid to 20 character length. – Manish Sep 02 '20 at 16:36
  • @Manish I see. Do your requirements forbid any special characters like quotes, percent sign (`%`) or backslash (`\\`)? Do you have to encode and decode the identifier? (That is, do you want to be able to convert back to a conventional UUID, or just shorten them?) – erickson Sep 02 '20 at 16:47
13

Here's my code, it uses org.apache.commons.codec.binary.Base64 to produce url-safe unique strings that are 22 characters in length (and that have the same uniqueness as UUID).

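import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.util.UUID;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.lang3.StringUtils; // or org.apache.commons.lang.StringUtils on commons-lang 2.x

// The members below belong to a class named KeyGenerator.
// Passing true selects the URL-safe alphabet ('-' and '_' instead of '+' and '/').
private static final Base64 BASE64 = new Base64(true);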
public static String generateKey(){
    UUID uuid = UUID.randomUUID();
    byte[] uuidArray = KeyGenerator.toByteArray(uuid);
    byte[] encodedArray = BASE64.encode(uuidArray);
    String returnValue = new String(encodedArray);
    returnValue = StringUtils.removeEnd(returnValue, "\r\n");
    return returnValue;
}
public static UUID convertKey(String key){
    UUID returnValue = null;
    if(StringUtils.isNotBlank(key)){
        // Convert base64 string to a byte array
        byte[] decodedArray = BASE64.decode(key);
        returnValue = KeyGenerator.fromByteArray(decodedArray);
    }
    return returnValue;
}
private static byte[] toByteArray(UUID uuid) {
    byte[] byteArray = new byte[(Long.SIZE / Byte.SIZE) * 2];
    ByteBuffer buffer = ByteBuffer.wrap(byteArray);
    LongBuffer longBuffer = buffer.asLongBuffer();
    longBuffer.put(new long[] { uuid.getMostSignificantBits(), uuid.getLeastSignificantBits() });
    return byteArray;
}
private static UUID fromByteArray(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    LongBuffer longBuffer = buffer.asLongBuffer();
    return new UUID(longBuffer.get(0), longBuffer.get(1));
}
stikkos
  • Why do you say that this code produces URL-safe UUIDs? As I understand it, a URL-safe UUID mustn't contain "+" and "/", but I don't see those symbols being replaced in your code. Could you explain? – Pavel_K Jun 14 '21 at 07:33
  • The commons-codec Base64 class has a urlSafe constructor parameter, which I set to true (if true, the encoder emits - and _ instead of the usual + and / characters). (https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#Base64-boolean-) – stikkos Jun 18 '21 at 22:26
  • Thank you very much for your explanation. – Pavel_K Jun 19 '21 at 06:00
8

I have an application where I'm doing almost exactly this. 22 char encoded UUID. It works fine. However, the main reason I'm doing it this way is that the IDs are exposed in the web app's URIs, and 36 characters is really quite big for something that appears in a URI. 22 characters is still kinda long, but we make do.

Here's the Ruby code for this:

  # Make an array of 64 URL-safe characters
  CHARS64 = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a + ["-", "_"]
  # Return a 22 byte URL-safe string, encoded six bits at a time using 64 characters
  def to_s22
    integer = self.to_i # UUID as a raw integer
    rval = ""
    22.times do
      c = (integer & 0x3F)
      rval += CHARS64[c]
      integer = integer >> 6
    end
    return rval.reverse
  end

It's not exactly the same as base64 encoding because base64 uses characters that would have to be escaped if they appeared in a URI path component. The Java implementation is likely to be quite different since you're more likely to have an array of raw bytes instead of a really big integer.
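For comparison, here is a rough Java port of the Ruby method above. It is a sketch rather than the code used in the application described: it assumes the UUID's 16 bytes are read as one unsigned big-endian integer, and the names Uuid22, toS22 and fromS22 are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

class Uuid22 {

    // Same 64 URL-safe characters, in the same order, as the Ruby CHARS64 array.
    private static final String CHARS64 =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_";
    private static final BigInteger MASK = BigInteger.valueOf(0x3F);

    // Encodes the UUID six bits at a time, lowest bits first, then reverses.
    static String toS22(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        BigInteger integer = new BigInteger(1, bytes); // unsigned 128-bit value
        StringBuilder rval = new StringBuilder(22);
        for (int i = 0; i < 22; i++) {
            rval.append(CHARS64.charAt(integer.and(MASK).intValue()));
            integer = integer.shiftRight(6);
        }
        return rval.reverse().toString();
    }

    // Reverses toS22: accumulates six bits per character, most significant first.
    static UUID fromS22(String text) {
        if (text.length() != 22) {
            throw new IllegalArgumentException("Expected 22 characters");
        }
        BigInteger integer = BigInteger.ZERO;
        for (char c : text.toCharArray()) {
            int digit = CHARS64.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid character: " + c);
            }
            integer = integer.shiftLeft(6).or(BigInteger.valueOf(digit));
        }
        // longValue() keeps only the low 64 bits, which is what each half needs.
        return new UUID(integer.shiftRight(64).longValue(), integer.longValue());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        String encoded = toS22(uuid);
        System.out.println(encoded + " -> " + fromS22(encoded));
    }
}
```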

Bob Aman
5

Here is an example with java.util.Base64 introduced in JDK8:

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.Base64.Encoder;
import java.util.UUID;

public class Uuid64 {

  private static final Encoder BASE64_URL_ENCODER = Base64.getUrlEncoder().withoutPadding();

  public static void main(String[] args) {
    // String uuidStr = UUID.randomUUID().toString();
    String uuidStr = "eb55c9cc-1fc1-43da-9adb-d9c66bb259ad";
    String uuid64 = uuidHexToUuid64(uuidStr);
    System.out.println(uuid64); //=> 61XJzB_BQ9qa29nGa7JZrQ
    System.out.println(uuid64.length()); //=> 22
    String uuidHex = uuid64ToUuidHex(uuid64);
    System.out.println(uuidHex); //=> eb55c9cc-1fc1-43da-9adb-d9c66bb259ad
  }

  public static String uuidHexToUuid64(String uuidStr) {
    UUID uuid = UUID.fromString(uuidStr);
    byte[] bytes = uuidToBytes(uuid);
    return BASE64_URL_ENCODER.encodeToString(bytes);
  }

  public static String uuid64ToUuidHex(String uuid64) {
    byte[] decoded = Base64.getUrlDecoder().decode(uuid64);
    UUID uuid = uuidFromBytes(decoded);
    return uuid.toString();
  }

  public static byte[] uuidToBytes(UUID uuid) {
    ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
    bb.putLong(uuid.getMostSignificantBits());
    bb.putLong(uuid.getLeastSignificantBits());
    return bb.array();
  }

  public static UUID uuidFromBytes(byte[] decoded) {
    ByteBuffer bb = ByteBuffer.wrap(decoded);
    long mostSigBits = bb.getLong();
    long leastSigBits = bb.getLong();
    return new UUID(mostSigBits, leastSigBits);
  }
}

The UUID encoded in Base64 is URL safe and without padding.

Sergey Ponomarev
3

This is not exactly what you asked for (it isn't Base64), but worth looking at, because of added flexibility: there is a Clojure library that implements a compact 26-char URL-safe representation of UUIDs (https://github.com/tonsky/compact-uuids).

Some highlights:

  • Produces strings that are 30% smaller (26 chars vs traditional 36 chars)
  • Supports full UUID range (128 bits)
  • Encoding-safe (uses only readable characters from ASCII)
  • URL/file-name safe
  • Lowercase/uppercase safe
  • Avoids ambiguous characters (i/I/l/L/1/O/o/0)
  • Alphabetical sort on encoded 26-char strings matches default UUID sort order

These are rather nice properties. I've been using this encoding in my applications both for database keys and for user-visible identifiers, and it works very well.
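To show how a 26-character encoding with these properties can work, here is a minimal Java sketch. It is not the library's implementation and I have not checked it against the library's output; it assumes Crockford's Base32 alphabet (digits plus uppercase letters without I, L, O and U), and the class name Uuid26 is made up.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Locale;
import java.util.UUID;

class Uuid26 {

    // Assumed alphabet: Crockford's Base32, which is already in ascending ASCII order.
    private static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Fixed-width, most-significant-first encoding: 26 characters of 5 bits each.
    // Because the width is fixed and the alphabet is ASCII-ordered, comparing two
    // encoded strings gives the same result as comparing the unsigned 128-bit values.
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        BigInteger value = new BigInteger(1, bytes); // unsigned 128-bit value
        char[] out = new char[26];
        for (int i = 25; i >= 0; i--) {
            out[i] = ALPHABET.charAt(value.intValue() & 31); // low 5 bits
            value = value.shiftRight(5);
        }
        return new String(out);
    }

    // Accepts lower or upper case, since the alphabet has no case-sensitive symbols.
    static UUID decode(String text) {
        if (text.length() != 26) {
            throw new IllegalArgumentException("Expected 26 characters");
        }
        BigInteger value = BigInteger.ZERO;
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid character: " + c);
            }
            value = value.shiftLeft(5).or(BigInteger.valueOf(digit));
        }
        return new UUID(value.shiftRight(64).longValue(), value.longValue());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        String encoded = encode(uuid);
        System.out.println(encoded + " -> " + decode(encoded.toLowerCase(Locale.ROOT)));
    }
}
```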

Jan Rychter
  • Why do you use it for database keys, if the most effective format is 16 binary bytes? – kravemir Nov 01 '20 at 07:32
  • For convenience. Using a UUID in string form is obvious: every piece of software is capable of dealing with it. Using it as a key in binary form is an optimization which would incur a significant development and maintenance cost. I decided it isn't worth the effort. – Jan Rychter Nov 02 '20 at 08:28
3

You don't say what DBMS you're using, but it seems that RAW would be the best approach if you're concerned about saving space. You just need to remember to convert for all queries, or you'll risk a huge performance drop.

But I have to ask: are bytes really that expensive where you live?

kdgregory
  • Yes, I think so... I want to save as much space as possible while still having it be human readable. – mainstringargs Apr 21 '09 at 14:40
  • OK, why do you think so? Are you storing a billion rows? You'll save 8 billion bytes, which isn't much. Actually, you'll save less, because your DBMS might reserve additional space for encoding. And if you go with VARCHAR instead of fixed-size CHAR, you're going to lose the space needed to save the actual length. – kdgregory Apr 21 '09 at 14:52
  • ... and that "savings" is only if you use a CHAR(32). If you use RAW, you'll actually be saving space. – kdgregory Apr 21 '09 at 14:52
  • 10
    Any reasonable DBMS allows you to store UUIDs in native format, which requires 16 bytes. Any reasonable db tools will convert these to standard format (e.g. "cdaed56d-8712-414d-b346-01905d0026fe") in query results. People have been doing this for a long time. There's no need to re-invent the wheel. – Robert Lewis Apr 22 '09 at 20:54
  • 1
    He could be trying to include a UUID in a QR code, which would mean that the compression is useful in order to create a more easily scannable QR code. – nym Feb 26 '13 at 22:53
2

The codecs Base64Codec and Base64UrlCodec can encode UUIDs efficiently to base-64 and base-64-url.

UUID uuid = UUID.fromString("01234567-89AB-4DEF-A123-456789ABCDEF");
// Returns a base-64 string
// input:: 01234567-89AB-4DEF-A123-456789ABCDEF
// output: ASNFZ4mrTe+hI0VniavN7w
String base64 = Base64Codec.INSTANCE.encode(uuid);
// Returns a base-64-url string
// input:: 01234567-89AB-4DEF-A123-456789ABCDEF
// output: ASNFZ4mrTe-hI0VniavN7w
String base64Url = Base64UrlCodec.INSTANCE.encode(uuid);

There are codecs for other encodings in the same package of uuid-creator.

fabiolimace
1

Surprised no one mentioned uuidToByteArray(…) from commons-lang3.

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>

And then the code will be

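import org.apache.commons.lang3.Conversion;
import java.util.UUID;

// Note: Conversion documents its default byte order as little-endian, so the
// resulting bytes are not in the same order as the ByteBuffer.putLong approach.
public static byte[] asByteArray(UUID uuid) {
    return Conversion.uuidToByteArray(uuid, new byte[16], 0, 16);
}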
Ori Popowski
1

Below is what I use for a UUID (Comb style). It includes code for converting a uuid string or uuid type to base64. I do it per 64 bits, so I don't deal with any equal signs:

JAVA

import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.commons.codec.binary.Base64;

public class UUIDUtil{
    // Builds a "comb" UUID: a random UUID whose low 48 bits are replaced
    // with the current time in milliseconds.
    public static UUID combUUID(){
        UUID srcUUID = UUID.randomUUID();
        long nowMillis = System.currentTimeMillis();

        long upper16OfLowerUUID = zeroLower48BitsOfLong( srcUUID.getLeastSignificantBits() );
        long lower48Time = zeroUpper16BitsOfLong( nowMillis );
        long lowerLongForNewUUID = upper16OfLowerUUID | lower48Time;
        return new UUID( srcUUID.getMostSignificantBits(), lowerLongForNewUUID );
    }
    public static String base64URLSafeOfUUIDObject( UUID uuid ){
        // Note: the least significant long is written first (bytes 0-7), then the
        // most significant long (bytes 8-15); a decoder must read them in that order.
        byte[] bytes = ByteBuffer.allocate(16).putLong(0, uuid.getLeastSignificantBits()).putLong(8, uuid.getMostSignificantBits()).array();
        return Base64.encodeBase64URLSafeString( bytes );
    }
    public static String base64URLSafeOfUUIDString( String uuidString ){
        UUID uuid = UUID.fromString( uuidString );
        return UUIDUtil.base64URLSafeOfUUIDObject( uuid );
    }
    private static long zeroLower48BitsOfLong( long longVar ){
        long upper16BitMask =  -281474976710656L; // 0xFFFF000000000000L
        return longVar & upper16BitMask;
    }
    private static long zeroUpper16BitsOfLong( long longVar ){
        long lower48BitMask =  281474976710656L-1L; // 0x0000FFFFFFFFFFFFL
        return longVar & lower48BitMask;
    }
}
Dennis
1

Here's my approach in kotlin:

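// BaseEncoding comes from Guava.
import com.google.common.io.BaseEncoding
import java.nio.ByteBuffer
import java.util.UUID

val uuid: UUID = UUID.randomUUID()
val uid = BaseEncoding.base64Url().encode(
    ByteBuffer.allocate(16)
        .putLong(uuid.mostSignificantBits)
        .putLong(uuid.leastSignificantBits)
        .array()
).trimEnd('=')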

Mitch1077487