UUIDs (and also strings) are not automatically deduplicated. In general, it would also be a bad idea, as newly created UUIDs should be unique, so sharing will not work.
When you refer to string interning, it is true that the JVM will share strings in specific cases, for instance:
String x = "ab";
String y = "a" + "b";
assert x == y; // references are identical (x and y are shared)
These are strings, however, that can be resolved at compile time. If you create a string or UUID at runtime, it will always create a new object.
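To illustrate the runtime case, here is a minimal sketch. The class name RuntimeIdentityDemo and the helper sameObject are made up for this example, and the behavior shown is what I would expect on current OpenJDK builds, where UUID.fromString does not cache its results:

```java
import java.util.UUID;

class RuntimeIdentityDemo {
    // Hypothetical helper: true only when both references point to the same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String text = "123e4567-e89b-12d3-a456-426614174000";

        // Parsing the same text twice yields two equal but distinct UUID objects.
        UUID first = UUID.fromString(text);
        UUID second = UUID.fromString(text);
        System.out.println(first.equals(second));      // true
        System.out.println(sameObject(first, second)); // false

        // "part" is not a compile-time constant, so the concatenation happens at runtime.
        String part = "a";
        String y = part + "b";
        System.out.println(y.equals("ab"));      // true
        System.out.println(sameObject(y, "ab")); // false
    }
}
```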
In your question, you describe a different scenario, though. Here, you are reading UUIDs from a database. Depending on the data, there could be good opportunities for sharing UUIDs, or there could be none (e.g., if the UUID is used as the primary key).
id | name | country
1 | A | <UUID-1>
2 | B | <UUID-1>
3 | C | <UUID-2>
4 | D | <UUID-1>
5 | E | <UUID-1>
(Note that when reading the UUIDs from the database or from the network, you cannot assume that the UUIDs will be deduplicated. In general, you will receive copies of the same value.)
So, if your data looks like the table above, sharing of UUIDs can make sense. But will it reduce the memory usage?
A UUID is an object with two long variables. In a typical 64-bit JVM, this will take 32 bytes (the object header plus 16 bytes for the two fields, rounded up to 8-byte alignment). If you share the UUID, then you will only pay the 32 bytes once, and afterwards pay only 8 bytes for the reference. If you use compressed pointers, the reference will fit in 4 bytes.
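To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. The class and method names are made up for illustration, the 32 bytes per UUID object is the figure from above, and the estimate deliberately ignores the memory of any map you keep around for the deduplication itself:

```java
class UuidMemoryEstimate {
    // Assumed size of one UUID object, taken from the estimate above.
    static final long UUID_OBJECT_BYTES = 32;

    // Every row holds a reference to its own private UUID object.
    static long unsharedBytes(long rows, int refBytes) {
        return rows * (UUID_OBJECT_BYTES + refBytes);
    }

    // Every row holds a reference, but only the distinct UUID objects exist.
    static long sharedBytes(long rows, long distinct, int refBytes) {
        return distinct * UUID_OBJECT_BYTES + rows * refBytes;
    }

    public static void main(String[] args) {
        // One million rows, 100 distinct UUIDs, 4-byte compressed references.
        System.out.println(unsharedBytes(1_000_000, 4));    // 36000000
        System.out.println(sharedBytes(1_000_000, 100, 4)); // 4003200
    }
}
```

If almost every UUID is distinct (for instance, a primary key), both numbers are nearly the same and sharing gains nothing.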
Is this gain significant enough? That depends on your specific application. In general, I would not share a UUID. I have worked on an application, however, where sharing the UUID was indeed an improvement. Reducing memory usage was critical, and the reduction from a full object to a reference made a difference.
Having said that, this type of optimization is rarely needed. As a rule of thumb, I would only do it if UUIDs are heavily shared and reducing memory at all costs is necessary. Otherwise, the CPU overhead of deduplicating them and the extra complexity in the code are often not worth it, or worse, could even slow down your application.
If you decide to deduplicate them, how will you do it? There is no built-in function like String#intern, but you can manually create a map to deduplicate. Depending on whether you want to deduplicate globally or only locally within the current function call, you can use a ConcurrentHashMap or simply a (non-synchronized) HashMap.
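A minimal sketch of the local variant could look like this. The class UuidDeduplicator and its methods are hypothetical names, not part of the JDK, and the map is never cleaned up, which is fine for a short-lived local map but would need more thought for a global one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class UuidDeduplicator {
    // Maps each UUID value to the first instance seen with that value.
    private final Map<UUID, UUID> canonical = new HashMap<>();

    // Returns the shared instance for the given value, registering it if it is new.
    UUID dedupe(UUID id) {
        UUID existing = canonical.putIfAbsent(id, id);
        return existing != null ? existing : id;
    }

    // Number of distinct UUID values seen so far.
    int distinctCount() {
        return canonical.size();
    }
}
```

You would call dedupe on every UUID you read from the database and store only the returned reference. For a global, multi-threaded variant, replacing the HashMap with a ConcurrentHashMap should be enough, since putIfAbsent is atomic there.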
As a side note, not directly related to your question: I mentioned String#intern as it is part of the String API. However, I would strongly recommend against using it, as it can be a serious performance bottleneck. Doing the deduplication yourself with a map will usually be significantly faster.