2

Are UUIDs in Java interned like Strings? If not, should I be trying to recycle UUID objects to minimize RAM usage?

I use UUIDs as the data type of database primary key and foreign key columns. This means many rows repeat the same UUID as a shared foreign key value.

So when retrieving rows from the database, should I check to see if each UUID is a duplicate, and if duplicated, use the original object reference? Or is this being done on my behalf already, similar to how Strings are interned?

…  // common JDBC code
UUID id = null ;
while (rs.next()) {
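    UUID idFresh = rs.getObject( 1 , UUID.class );  // Typed getter (JDBC 4.1); assumes the driver maps the column to UUID.
    // Recycle the UUID object where possible.
    id = ( idFresh != null && idFresh.equals( id ) ) ? id : idFresh ;  // If equal to the previous value, keep the existing object reference; otherwise take the new one.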
    String name = rs.getString( 2 );
}
…
Basil Bourque
  • 1
    Is RAM usage actually a problem in your program? – bcsb1001 Aug 03 '17 at 23:21
  • Looking at the [source code on grepcode](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/UUID.java#UUID) I see nothing that attempts to look for an already existing equivalent UUID. There is also very little state per instance. – President James K. Polk Aug 03 '17 at 23:21
  • From what I know, Strings have special treatment in Java in that aspect. And whether/how something is cached would probably depend on implementation (of ORM, jdbc driver or such), I suspect some do that, but I wouldn't assume it is true for everything out there. – Luke Aug 03 '17 at 23:24
  • Why? They're all different, by definition. What possible benefit could there be from pooling them? – user207421 Aug 03 '17 at 23:37
  • 1
    Strings normally aren't interned. – chrylis -cautiouslyoptimistic- Aug 03 '17 at 23:40
  • @EJP You are right that generated UUIDs are unique and sharing will not work. If you read the data from a database or from the network, there are situations where deduplication can make sense even for UUIDs (see my answer). – Philipp Claßen Aug 04 '17 at 00:55

2 Answers

4

A quick look into the Java runtime source code shows that UUIDs are not interned.

And it would probably be a bad idea to intern them, because if you were to traverse a large database, UUID interning could cause the JVM to run out of memory simply due to never forgetting any UUID it has seen.

Also, there is not much benefit to interning UUIDs, because

  • They don't occupy much space
    (basically just the UUID’s 128-bit value stored as a pair of long)

  • UUID comparison and hash code computation are cheap.
    (One of the greatest benefits of String interning is that the hash code of a given string value is computed only once, which matters because computing it requires a pass over all of its characters.)
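
To make the cost difference concrete, here is a minimal sketch comparing the two hash computations. The class and method names are made up for illustration. The string formula is the one given in the `String.hashCode` Javadoc; the UUID formula is an assumption that mirrors what the OpenJDK source does, since the Javadoc does not specify it.

```java
import java.util.UUID;

final class HashCostSketch {

    // String hash as specified in the String.hashCode Javadoc:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], one step per character.
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Constant-time hash over the two long halves of the UUID.
    // Assumption: this mirrors the OpenJDK implementation; it is not a documented contract.
    static int uuidHash(UUID u) {
        long hilo = u.getMostSignificantBits() ^ u.getLeastSignificantBits();
        return (int) (hilo >> 32) ^ (int) hilo;
    }
}
```

The string version loops over every character, while the UUID version is a fixed handful of bit operations regardless of the value.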

Basil Bourque
Mike Nakis
1

UUIDs (and also strings) are not automatically deduplicated. In general, it would also be a bad idea, as newly created UUIDs should be unique, so sharing will not work.

When you refer to string interning, it is true that the JVM will share strings in specific cases, for instance:

String x = "ab";
String y = "a" + "b";
assert x == y; // references are identical (x and y are shared)

These are strings, however, that can be resolved at compile time. If you create a string or UUID at runtime, it will always create a new object.
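
For contrast, a small sketch of the runtime case, using only standard JDK calls. The class name, helper methods, and sample UUID text are made up for this example.

```java
import java.util.UUID;

final class RuntimeIdentityDemo {

    // Concatenates method arguments, so the result cannot be folded at compile time.
    static String concatAtRuntime(String left, String right) {
        return left + right;
    }

    // Parses the text into a UUID, much as a driver would do for each row it reads.
    static UUID parse(String text) {
        return UUID.fromString(text);
    }

    public static void main(String[] args) {
        String runtime = concatAtRuntime("a", "b");
        System.out.println(runtime.equals("ab"));      // true: same value
        System.out.println(runtime == "ab");           // false: a separate object
        System.out.println(runtime.intern() == "ab");  // true: intern() returns the pooled instance

        String text = "123e4567-e89b-12d3-a456-426614174000";
        UUID a = parse(text);
        UUID b = parse(text);
        System.out.println(a.equals(b));               // true: same 128-bit value
        System.out.println(a == b);                    // false: two separate objects
    }
}
```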

In your question, you describe a different scenario, though. Here, you are reading UUIDs from a database. Depending on the data, there could be good opportunities for sharing UUIDs, or there could be none (e.g., if the UUID is used as the primary key).

id | name  | country
1  | A     | <UUID-1>
2  | B     | <UUID-1>
3  | C     | <UUID-2>
4  | D     | <UUID-1>
5  | E     | <UUID-1>

(Note that when reading the UUIDs from the database or from the network, you cannot assume that the UUIDs will be deduplicated. In general, you will receive copies of the same value.)

So, if your data looks like above, sharing of UUIDs can make sense. But will it reduce the memory usage?

A UUID is an object with two long fields. On a 64-bit JVM, it takes 32 bytes (object header plus 16 bytes of data). If you share the UUID, you pay the 32 bytes only once, and afterwards only 8 bytes for each reference. With compressed pointers, a reference fits in 4 bytes.
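
As a rough worked example of that arithmetic, the sketch below estimates the footprint for the sample table above (5 rows, 2 distinct UUIDs). The class name and constants are assumptions for illustration: 32 bytes per UUID object and 4-byte compressed references, as in the text. It ignores the overhead of whatever structure is used to find duplicates.

```java
final class UuidMemoryEstimate {

    static final long UUID_OBJECT_BYTES = 32; // per-object figure from the text
    static final long REFERENCE_BYTES = 4;    // assumes compressed pointers

    // Every row holds a reference to its own private UUID object.
    static long withoutSharing(long rows) {
        return rows * (REFERENCE_BYTES + UUID_OBJECT_BYTES);
    }

    // Every row holds a reference, but only one object exists per distinct value.
    static long withSharing(long rows, long distinctValues) {
        return rows * REFERENCE_BYTES + distinctValues * UUID_OBJECT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(withoutSharing(5)); // 180
        System.out.println(withSharing(5, 2)); // 84
    }
}
```

The saving grows with the number of repeated values: each duplicate that is shared avoids one 32-byte object.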

Is this gain significant enough? That depends on your specific application. In general, I would not share a UUID. I have worked on an application, however, where sharing the UUID was indeed an improvement. Reducing memory usage was critical, and going from a full object to a reference made a difference.

Having said that, this type of optimization is rarely needed. As a rule of thumb, I would only do it if UUIDs are heavily shared and reducing memory at all costs is necessary. Otherwise, the CPU overhead of deduplicating them and the extra complexity in the code are often not worth it, or worse, could even slow down your application.

If you decide to deduplicate them, how will you do it? There is no built-in function like String#intern, but you can manually create a map to deduplicate. Depending on whether you want to deduplicate globally or only locally within the current function call, you can use a ConcurrentHashMap or simply a (non-synchronized) HashMap.
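
A minimal sketch of the local variant, assuming a helper class named `UuidDeduplicator` (a made-up name, not a JDK class) that lives only as long as one result set is being processed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class UuidDeduplicator {

    // Maps each distinct UUID value to the first instance seen with that value.
    private final Map<UUID, UUID> canonical = new HashMap<>();

    // Returns the first-seen instance equal to the candidate, or the candidate itself if it is new.
    UUID dedupe(UUID candidate) {
        if (candidate == null) {
            return null; // NULL columns pass through unchanged
        }
        UUID existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    // Number of distinct UUID values seen so far.
    int distinctCount() {
        return canonical.size();
    }
}
```

In the JDBC loop from the question, this would be used as `UUID id = deduplicator.dedupe( rs.getObject( 1 , UUID.class ) );`. For a shared, global deduplicator, the `HashMap` could be swapped for a `ConcurrentHashMap`, which offers the same `putIfAbsent` call. Note that the map keeps every UUID it has seen reachable, so a long-lived instance can itself become a memory problem; discarding it after the query avoids that.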


As a side note, not directly related to your question: I mentioned String#intern because it is part of the String API. However, I would strongly recommend against using it, as it is a huge performance bottleneck. Doing the deduplication yourself will be significantly faster.

Philipp Claßen