Lots of people talk about the performance advantages of String.intern(), but I'm actually more interested in what the performance penalty may be.
My main concerns are:
- Search cost: The time that intern() takes to figure out if the internable string exists in the constants pool. How does that cost scale with the number of strings in that pool?
- Synchronization: obviously the constant pool is shared by the whole JVM. How does that pool behave when intern() is being called over and over from multiple threads? How much locking does it perform? How does the performance scale with contention?
I am concerned about all these things because I'm currently working on a financial application that has a problem of using too much memory because of duplicated Strings. Some strings basically look like enumerated values and can only have a limited number of potential values (such as currency names ("USD", "EUR")) exist in more than a million copies. String.intern() seems like a no-brainer in this case, but I'm worried about the synchronization overhead of calling intern() everytime I store a currency somewhere.
On top of that, some other types of strings can have millions of different values, but still have tens of thousands of copies of each (such as ISIN codes). For these, I'm concerned that interning a million string would basically slow down the intern() method so much as to bog down my application.