I am working on an application that has a lot of duplicate Strings, and my task is to eliminate them to decrease memory usage. My first thought was to use String.intern to guarantee that only one instance of each String would exist. It decreased heap usage, but it increased PermGen usage far too much; in fact, because many of the strings occur only once, the application's total memory usage actually increased.

After searching for other ideas, I found this approach: https://stackoverflow.com/a/725822/1384913.

The same thing happened as with String.intern: String usage decreased, but the memory I saved is now being used by the WeakHashMap and WeakHashMap$Entry classes.
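For reference, a WeakHashMap-based pool of this kind can be sketched roughly as follows. This is a minimal illustration, not the code from the linked answer; the class name `WeakInterner` is made up here. It also shows where the overhead comes from: every pooled String costs one map entry plus one extra WeakReference object.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper class; the name is an assumption for this sketch.
final class WeakInterner {
    // Key and value refer to the same String. The value is wrapped in a
    // WeakReference so the map never keeps its own key strongly reachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String value) {
        if (value == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(value);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical; // an equal String is already pooled
        }
        pool.put(value, new WeakReference<>(value));
        return value;
    }
}
```

Strings that are no longer referenced anywhere else become eligible for garbage collection, and WeakHashMap drops their entries on later accesses.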

Is there an effective way to keep only one instance of each String that doesn't cost as much memory as it recovers?

Daniel Pereira
  • If a `WeakHashMap` doesn't save you enough memory, there's probably no way to do it. A `WeakHashMap` is realistically going to be the only solution that doesn't cost you lots of runtime to look up the `String` for a particular value. – Louis Wasserman Oct 15 '12 at 18:22
  • You can look at the answer I gave to a similar question http://stackoverflow.com/a/12793823/57695 – Peter Lawrey Oct 15 '12 at 18:23
  • An obvious point, but if you can classify different sources of Strings (i.e. distinguish between the ones that repeat a lot versus the ones that are used once) then you could do this more efficiently. That may not be possible for your application of course... – DNA Oct 15 '12 at 18:24
  • @PeterLawrey I +1'd your referenced answer. Too nice to only have a score of 1. Also appropriate here. – Bohemian Oct 15 '12 at 18:27
  • I thought `Strings` were pooled like this anyway in Java? They're immutable, so why wouldn't they be? `Integers` certainly are (or so I've been told) – lynks Oct 15 '12 at 18:30
  • @lynks String constants are. Dynamically created strings are not. Although multiple dynamically created strings can refer to the same underlying `char[]`, depending on how you created them. – biziclop Oct 15 '12 at 18:32
  • String literals are interned. Strings are often temporary objects, so interning them is not worth the extra work. Integers are cached rather than interned, so a range of values will use the same objects. – Peter Lawrey Oct 15 '12 at 18:35
  • BTW Java 7 places interned Strings in the heap rather than PermGen. – Peter Lawrey Oct 15 '12 at 18:36
  • @PeterLawrey That could be the solution then. – biziclop Oct 15 '12 at 18:37
  • You should only `intern` (or use your other mechanism) for strings that you expect to hang around. Don't do it for everything. – Hot Licks Oct 15 '12 at 18:38
  • @PeterLawrey Your code seems to always keep the String references until the application finishes, even when they were used only once. In my case it would increase the memory, because the application has many Strings that are used only once in the code. – Daniel Pereira Oct 15 '12 at 19:11
  • The number of Strings is limited to whatever you set, so the memory used will not increase more than you want. (Unless you have a small number of very large Strings, in which case you could avoid interning them.) – Peter Lawrey Oct 15 '12 at 19:19
  • possible duplicate of [Alternatives to Java string interning](http://stackoverflow.com/questions/12792942/alternatives-to-java-string-interning) – user207421 Oct 15 '12 at 20:39
  • +1: @Peter, didn't know that about Java 7 - thanks! – Dmitri Oct 15 '12 at 20:54
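
The bounded pool mentioned in the comments above (a fixed number of Strings, so memory cannot grow past a chosen limit) could look roughly like the sketch below. This is an illustration of the general idea only, not the code from the linked answer; the class name `BoundedInterner` and the slot-eviction policy are assumptions.

```java
// Hypothetical class; a sketch of a fixed-capacity String pool.
final class BoundedInterner {
    private final String[] slots;
    private final int mask;

    BoundedInterner(int capacity) {
        if (capacity < 1 || capacity > (1 << 30)) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        // Round up to a power of two so the index can be computed with a mask.
        int size = 1;
        while (size < capacity) {
            size <<= 1;
        }
        slots = new String[size];
        mask = size - 1;
    }

    // Not thread-safe; callers would need their own synchronization.
    String intern(String value) {
        if (value == null) {
            return null;
        }
        int h = value.hashCode();
        int index = (h ^ (h >>> 16)) & mask;
        String cached = slots[index];
        if (value.equals(cached)) {
            return cached; // reuse the pooled instance
        }
        slots[index] = value; // overwrite whatever occupied this slot
        return value;
    }
}
```

The trade-off is that two different Strings hashing to the same slot evict each other, so deduplication is best-effort, but the pool never holds more than its fixed number of Strings.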

3 Answers

I found an alternative to WeakHashMap: the WeakHashSet provided by the Eclipse JDT library. It behaves like WeakHashMap but uses less memory. Also, you only need to call its add method: it adds the String to the set if it isn't there yet, and returns the existing one otherwise.

The only thing I didn't like is that it doesn't use generics, which forces the developer to cast the objects. My intern method turned out to be pretty simple, as you can see below:

Declaration of the WeakHashSet:

private static WeakHashSet stringPool = new WeakHashSet(30000); //30 thousand is the average number of Strings that the application keeps.

and the intern method:

public static String intern(String value) {
    if(value == null) {
        return null;
    }
    return (String) stringPool.add(value);
}
Daniel Pereira

Why don't you use the StringBuilder/StringBuffer class instead of String? With an instance of one of these classes, you can keep reusing the same instance with different values. - Ankur
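
A minimal sketch of what reusing one builder could look like is shown below; the class and method names are made up for illustration. Note the limits of this idea: it only helps when the text is consumed transiently, since calling `toString()` still allocates a new String, and `StringBuilder` does not override `equals`/`hashCode`, so it cannot stand in for String as a map key.

```java
// Hypothetical example class; names are assumptions for this sketch.
final class BuilderReuse {
    // Formats every id into the same buffer and returns the total number of
    // characters produced, without creating a String per id.
    static int totalLength(int[] ids) {
        StringBuilder sb = new StringBuilder(32);
        int total = 0;
        for (int id : ids) {
            sb.setLength(0);               // reset the buffer instead of allocating a new one
            sb.append("item-").append(id); // new value, same instance
            total += sb.length();
        }
        return total;
    }
}
```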

Ankur Shanbhag

In a similar case, wherever possible, I refactored the string constants to enums. That way, you get two benefits:

  • each enum constant is a singleton, so every use shares one instance instead of holding its own copy of a String
  • no typos when using Strings: a misspelled constant fails at compile time.

Cons:

  • a lot of work, with endless possibilities to make mistakes, if you don't have enough test cases
  • sometimes this is not trivial, for example when you have to interact with third party libraries you can't just edit...
  • simply a no-go if these are runtime determined, and not compile time...
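
As a rough illustration of the refactoring described above, assuming a hypothetical set of status values that is fixed at compile time:

```java
import java.util.Locale;

// Hypothetical enum; the constant names are assumptions for this sketch.
enum Status {
    ACTIVE, SUSPENDED, CLOSED;

    // Converts external text (for example from a file or a third-party API)
    // to the shared constant. Throws IllegalArgumentException for unknown values.
    static Status parse(String raw) {
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

Every record that used to hold its own "ACTIVE" String would then hold a reference to the single `Status.ACTIVE` instance, and comparisons can use `==`.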
ppeterka