5

assuming I have defined a HashSet<String> set.
What is better in terms of performance:

set.clear();

or

set = new HashSet<String>();

EDIT: the reason I am checking this is that currently my code has the second option, but I want to change it to the first one to make the Set final.
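For reference, this is the constraint (a sketch; the Cache class and field name are invented for illustration): a final field cannot be reassigned, so clear() is the only way to empty it in place.

```java
import java.util.HashSet;
import java.util.Set;

class Cache {
    private final Set<String> entries = new HashSet<String>();

    void reset() {
        // entries = new HashSet<String>();  // does not compile: cannot assign to a final field
        entries.clear();                     // allowed: mutates the set in place
    }
}
```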

oshai
    You could measure it using code – jmj Nov 02 '11 at 07:47
  • In short, if it's more space-critical, use clear(); otherwise just create a new object. The garbage collector runs as a low-priority thread and does not guarantee to reclaim memory in a timely manner, especially when the set is big, as in your case. – Chen Xie Jan 08 '13 at 00:32

6 Answers

19

Although clear() might be faster, that depends on the size of the set. In practice, this is unlikely to make a significant difference to the performance of your application: even in the limited scope of the lines of code around this call, performance will be dominated by other factors such as JIT compilation.

What is important is design quality, which will make it easy to refactor for performance after you have profiled your code. In most cases, avoiding hard-to-track state changes is important, and creating a new HashSet is better design than reuse of a HashSet.
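A minimal sketch of the state-change point (names invented for illustration): clear() mutates the set through every live reference to it, while reassignment leaves other holders of the old reference untouched — which is why reuse can introduce hard-to-track state changes.

```java
import java.util.HashSet;
import java.util.Set;

public class AliasDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<String>();
        set.add("a");
        Set<String> alias = set;         // another part of the code holds the same set

        set = new HashSet<String>();     // reassignment: alias still sees "a"
        System.out.println(alias);       // prints [a]

        alias.clear();                   // clear(): empties the set for every holder
        System.out.println(alias);       // prints []
    }
}
```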

Joshua Fox
6

HashSet.clear() delegates to map.clear() on its backing HashMap,

which is

/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}

So the cost of clear() certainly depends on the size of the Set. The real answer, though, is to benchmark it in your application's environment.

Note: referring to the OpenJDK implementation in this answer
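Per the comment above suggesting you measure it, a naive timing sketch (SIZE is arbitrary; for trustworthy numbers use a proper harness such as JMH, since JIT warm-up and GC pauses can easily dominate a loop like this):

```java
import java.util.HashSet;
import java.util.Set;

public class ClearVsNewTiming {
    static final int SIZE = 1_000_000;   // arbitrary; vary it to see the size dependence

    static void fill(Set<String> set) {
        for (int i = 0; i < SIZE; i++) {
            set.add("key" + i);
        }
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<String>();

        fill(set);
        long t0 = System.nanoTime();
        set.clear();                          // nulls out the backing table in place
        long clearNs = System.nanoTime() - t0;

        fill(set);
        t0 = System.nanoTime();
        set = new HashSet<String>();          // old set becomes garbage; the GC pays later
        long newNs = System.nanoTime() - t0;

        System.out.printf("clear(): %,d ns  new HashSet: %,d ns%n", clearNs, newNs);
    }
}
```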

jmj
5

I would suggest you do what you believe is clearest and simplest.

The problem with trying to reuse a HashSet is that most of the objects it uses internally (its entry nodes) are not recycled; each add() allocates a fresh entry. If you want recycling, use Javolution's FastSet http://javolution.org/target/site/apidocs/javolution/util/FastSet.html

HashSet is not the most efficient set collection, so if you really care about this level of micro-optimisation you are likely to find that a different collection suits your use case better. However, 99% of the time it is just fine, it is the most obvious choice for a hash set, and I suspect it doesn't matter how you use it.

Peter Lawrey
3

A HashSet is backed by a HashMap, and the call to clear() reroutes to HashMap.clear(). That is (at least in Java 1.6) implemented as follows:

/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}

So I would suppose that creating a new HashSet will be a lot faster if your old HashSet was very large.

nfechner
    I have done a lot of algorithm optimization and micro-benchmarking. As a single operation, one simple memory allocation is much faster than HashMap.clear(). If you do frequent allocations, e.g. tens of millions per second in some calculation algorithm, the difference will be less noticeable, but even then it should be cheaper to allocate a new object and let the GC do bulk cleaning. – oᴉɹǝɥɔ Mar 18 '13 at 14:45
1

HashSet#clear will iterate through the bucket array of the backing HashMap and set each slot to null.

I would imagine that set = new HashSet<String>() will be faster as the set grows large.

Jon Newmuis
1

This sounds like an optimization you should avoid making unless you have profiled a problem. If you haven't, you'll generally be better off using the API the way it was intended (i.e. calling clear()).

If you have profiled a problem in an area of code using clear() or new HashSet(), you may be better off looking at an alternative to HashSet that is faster. Unless you need uniqueness guaranteed at the collection level, a LinkedList or ArrayList will generally be faster to populate and clear.
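As a sketch of that trade-off: when duplicates are acceptable, a plain ArrayList skips hashing entirely, and its clear() just nulls the used portion of the backing array.

```java
import java.util.ArrayList;
import java.util.List;

public class ListAlternative {
    public static void main(String[] args) {
        List<String> items = new ArrayList<String>();
        items.add("a");
        items.add("a");   // duplicates allowed: no hashing, no uniqueness check
        System.out.println(items.size());    // prints 2

        items.clear();    // nulls the used slots of the backing array and resets size
        System.out.println(items.isEmpty()); // prints true
    }
}
```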

pra9ma