6

In Java, I am using the following code to print the distinct elements of each column of a huge matrix X:

// create the list of distinct values
List<Integer> values = new ArrayList<Integer>();

// X is n * m int[][] matrix
for (int j = 0, x; j < m; j++) {
    values.clear();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (values.contains(x)) continue;
        System.out.println(x);
        values.add(x);
    }
}

I iterate over columns (index j) in the outer loop and over rows (index i) in the inner loop.

This code will be called millions of times for different matrices, so it should be optimized to meet the performance requirements. I'm wondering about the values list: would it be faster to use values = new ArrayList<Integer>(); or values = null instead of values.clear()?

braX
Sophie Sperner

4 Answers

16

It would be much more efficient to use a Set instead of a list, for example the HashSet implementation. The contains method will run in O(1) instead of O(n) with a list. And you can save one call by relying on the return value of the add method alone.

As for your specific question, I would just create a new Set at each loop - object creation is not that expensive, probably less than clearing the set (as confirmed by the benchmark at the bottom - see the most efficient version in EDIT 2):

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}

However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.
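
For example, a harness based on JMH would avoid most of the warm-up and dead-code pitfalls of hand-rolled timing loops. The sketch below is not part of the original answer; the matrix dimensions, random seed and class name are arbitrary assumptions, and it assumes JMH is on the classpath:

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ClearVsNewBenchmark {

    // 1000 x 100 matrix of random ints in [0, 1000): arbitrary test data, not from the question
    int[][] matrix;
    final Set<Integer> reused = new HashSet<Integer>();

    @Setup
    public void setUp() {
        Random r = new Random(42);
        matrix = new int[1000][100];
        for (int[] row : matrix)
            for (int j = 0; j < row.length; j++)
                row[j] = r.nextInt(1000);
    }

    @Benchmark
    public int clearReusedSet() {
        int distinct = 0;
        for (int j = 0; j < matrix[0].length; j++) {
            reused.clear();
            for (int[] row : matrix)
                if (reused.add(row[j])) distinct++;
        }
        return distinct; // return the result so the JIT cannot eliminate the work
    }

    @Benchmark
    public int newSetPerColumn() {
        int distinct = 0;
        for (int j = 0; j < matrix[0].length; j++) {
            Set<Integer> values = new HashSet<Integer>(matrix.length, 1f);
            for (int[] row : matrix)
                if (values.add(row[j])) distinct++;
        }
        return distinct;
    }
}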

EDIT

I ran a quick benchmark and the clear version seems a little faster than creating a set at each loop (by about 20%). You should still check on your dataset / use case which one is better. Faster code with my dataset:

Set<Integer> values = new HashSet<Integer>();
for (int j = 0, x; j < m; j++) {
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
    values.clear();
}

EDIT 2

An even faster version of the code is obtained by creating a new set of the right size at each loop:

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>(n, 1); // right size and load factor 1 from the beginning, so it should never need to resize
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}

Summary of result

After JVM warm up + JIT:

Set<Integer> values = new HashSet<Integer>(n, 1); =====> 280 ms
values.clear();                                   =====> 380 ms
Set<Integer> values = new HashSet<Integer>();     =====> 450 ms 
assylias
  • @SophieSperner take care, sets are not ordered, so your loops are deprecated – cl-r Jul 31 '12 at 12:31
  • @aromero -- It sure is. Quote from the Javadocs: "This class offers constant time performance for the basic operations (add, remove, contains and size)..." – Ernest Friedman-Hill Jul 31 '12 at 12:31
  • @cl-r That makes no sense at all, nowhere in any of this code is anyone looping over a list *or* a set. – Ernest Friedman-Hill Jul 31 '12 at 12:32
  • @ErnestFriedman-Hill constant time performance isn't the same as the O notation (which always refers to the worst case), besides you need "...assuming the hash function disperses the elements properly among the buckets". – aromero Jul 31 '12 at 12:34
  • @aromero One can assume that the Integer hashcode function is good enough (it returns the underlying int, so it could not be more spread out). – assylias Jul 31 '12 at 12:36
  • @assylias you can give me all assumptions you can imagine, still O(n) :/ – aromero Jul 31 '12 at 12:38
  • @aromero "O(1)" and "constant time" absolutely *do* mean the same thing; feel free to link to a reference that says otherwise. While I agree that this is the best-case scenario, and the Javadoc says as much, as long as the integers don't arrive in sorted order, things will work fine. – Ernest Friedman-Hill Jul 31 '12 at 12:38
  • @aromero What you are saying is factually wrong. Feel free to share a reference if you can find one. – assylias Jul 31 '12 at 12:39
  • Printing out the results as you go will be very slow compared to everything else. If you are going to time it, do so without printing. – Peter Lawrey Jul 31 '12 at 12:52
  • The clear() version is almost always faster (unless the collection is very full and you don't put in much after that). The reason is simple - the HashMap (which HashSet uses internally) uses an array for the first bucket lookup - the array is a.) most likely properly sized from the previous run and b.) hot in the CPU's cache (which a newly allocated array would not be). – Durandal Jul 31 '12 at 12:59
  • @Durandal Actually no. What was making the new object version slower was the resizing of the set. If the set is created with the right size from the start, the performance is better than the clear version. – assylias Jul 31 '12 at 13:04
  • That's somewhat surprising, although you did make use of point a.) to get there. Seems the VM always has a surprise up its sleeve. – Durandal Jul 31 '12 at 13:13
  • It might depend hugely on the X array content, but a BitSet (if applicable) would mean O(1) access time. Also, I would initialize the HashSet with an initial size of `n/0.75`. There is some detail in the P.D.s of my answer. – Javier Jul 31 '12 at 14:23
  • I think aromero is right about O(n) worst-case time if you don't have any guarantee that many items won't "hit" the same bucket (i.e. if x%hashCapacity is the same for them, e.g. for capacity=32, items 1, 33, 65, etc.). – Javier Jul 31 '12 at 14:28
  • @Javier, you are right, n / 0.75 saves one resizing. I have edited to use a set(n, 1), i.e. load factor = 1. – assylias Jul 31 '12 at 14:37
  • I'm sorry, n / 0.75 = 4/3 * n > n, why should I use an initial capacity that is greater than n? – Sophie Sperner Aug 07 '12 at 11:19
  • @SophieSperner Because the set will be resized when it reaches 75% of its capacity, unless you change the default setting as I did in my last example (the `1` parameter says: don't resize until you are 100% full). – assylias Aug 08 '12 at 00:45
3

(corrected as of 2015-09-04 to include a reproducible benchmark and conclusions)

  • Almost certainly a values.clear() would be expected to be faster than creating a new object. In the case of the ArrayList you used initially, clear() just sets the insertion index to zero and nulls out the slots that were in use.

  • As I commented in P.D.#1, a BitSet might be the fastest approach for this case, where the elements are integers (assuming the range of values is not too broad). However, it is not useful for other types of elements.

  • Also, agreeing with assylias' answer, a HashSet is a better choice than an ArrayList (assuming hashCode() gives a decent distribution that doesn't degrade to O(N) performance).

    In this HashSet case, intuition would also suggest that clear() (which basically sets the HashSet#table array of "pigeon holes" to null) would be faster than building a brand new collection (which in any case requires that same table to be initialized/reset to zeros). But in this particular case things happen the other way round. Assylias published his results. Unfortunately I had to code my own benchmark in order to find out how this could happen. I go over this issue in P.D.#3 (a paraphrase of what clear() actually does is sketched right after this list).

    Anyway, the main point is that since creating a brand new HashSet for every iteration does not carry a substantial penalty, it makes sense to do so (since it is simpler), unless we must take greater care about performance and resources.

  • Another performance issue will be I/O. That System.out.println() in the sample code probably does a flush() for every line, which will automatically shift the bottleneck to the console/stdout. A workaround might be appending to a StringBuffer. Unless there is a reader process avidly waiting for that output, it might make sense to delay the write until the end of the loop.
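
For reference, here is roughly what clearing a HashSet does under the hood (paraphrased from the JDK 8 sources, not a verbatim copy, so the field names are the JDK's internals): HashSet.clear() delegates to HashMap.clear(), which walks the whole bucket table and nulls every slot, so its cost grows with the table size rather than with the number of stored elements.

// Paraphrased from java.util.HashMap#clear() (JDK 8); HashSet.clear() simply calls this.
public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {
        size = 0;
        // null out every bucket, even the empty ones: the loop runs over table.length,
        // which is why an oversized table makes clear() disproportionately expensive
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}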

This would be my try:

Set<Integer> values = new HashSet<Integer>();
// (PD 1) Or new BitSet(max_x - min_x + 1);
// (PD 2) Or new HashSet((int)Math.ceil(n/0.75));
StringBuffer sb = new StringBuffer(); // appends row values for printing together.

for (int j = 0, x; j < m; j++) {
    values.clear();
    sb.setLength(0);
    for (int i = 0; i < n; i++) {
         x = X[i][j];
         if (! values.contains(x)){
             sb.append(x+"\n");
             values.add(x);
         }
    }
    System.out.print(sb);
}

P.D.1. Also, you might consider using a BitSet. It has O(1) access performance (even in the worst case, since there are no collisions). It is the best fit for integers with a range starting at 0 (otherwise it might require translation) and a population of actual values that is dense enough within the possible distribution.

  • For example, if you check the occurrence of Unicode codepoints you will need a 139,264-byte-long array (17 (planes) * 2^16 (codepoints/plane) / 8), where maybe you are using just 40 different characters in a 100-character-long text message; that might be overkill. But if you were restricted to the 256 possible values of ISO-Latin-1 (a 32-byte bitset), that would actually be a perfect fit.
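
For illustration, here is a minimal sketch of the column scan from the question using a BitSet with the translation mentioned above. minValue and maxValue are assumed bounds on the matrix values (they are not given in the original post), and n, m and X are as in the question:

// Sketch only: assumes every value of X falls in [minValue, maxValue].
BitSet seen = new BitSet(maxValue - minValue + 1);
StringBuilder sb = new StringBuilder();

for (int j = 0; j < m; j++) {
    seen.clear();
    sb.setLength(0);
    for (int i = 0; i < n; i++) {
        int x = X[i][j];
        int bit = x - minValue;   // translate so the smallest possible value maps to bit 0
        if (!seen.get(bit)) {
            seen.set(bit);
            sb.append(x).append('\n');
        }
    }
    System.out.print(sb);         // batch the output per column, as suggested above
}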

P.D.2. Also, as assylias says, setting an initial size for the HashSet may help. Since threshold = (int)(capacity * loadFactor), you would want initialCapacity = (int)Math.ceil(n/0.75) to be sure there is no resizing. That suggestion belongs to assylias' post (I didn't use it myself), so I won't discuss it further here.
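
As a worked example (not from the original answer): with n = 1000 rows, the formula gives an initial capacity of 1334, which guarantees capacity * loadFactor >= n, so the set never rehashes while scanning a column (the real table size is rounded up to the next power of two internally, which only adds headroom):

int n = 1000;                                     // number of rows (example value)
int initialCapacity = (int) Math.ceil(n / 0.75);  // = 1334 with the default load factor 0.75
Set<Integer> values = new HashSet<Integer>(initialCapacity);
// initialCapacity * 0.75 >= n, so inserting at most n distinct values never triggers a resize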


P.D.3 (September 2015: 3 years later) I happened to revisit this question and I was so intrigued by assylias' results that I coded my own micro-benchmark (which I include below, so anyone can replicate it). These are my conclusions:

  • The BitSet I proposed (note: it is not suitable for non-integers or for very sparsely packed distributions) clearly outperforms all the flavors of HashSet (around 4 times faster for densely packed distributions).
  • Tests for a highly filled set of size 1000 show a slight advantage in favor of creating a new collection (7.7" vs 9.8"). However, the "dry run" comparison gives the opposite result: 9.5" for new HashSet() vs 7.5" for HashSet.clear(). My guess is that this is due to the penalty of invalidating the cache when resetting HashSet.table (setting null where it was not null).
  • Also, it is a big advantage to know the optimal size beforehand (which might not always be feasible). The HashSet.clear() approach is more adaptive and tolerates underestimating the size much better. Overestimating won't make so much of a difference, but it might not be a good strategy if memory is an issue.
  • The results clearly show that nowadays creating an object and allocating memory is not a big issue (see Programmers.SE). However, reusing objects should still be an option to consider. See for example in drmirror how, even after the JDK 1.6 evolution, reusing instances (CharBuffer) doubles performance.
  • I also wondered about the impact of assylias using loadFactor==1.0f (HashSet won't resize until size > table.length*loadFactor, which is different from what I suggested to him, but is perfect if there are no collisions). Roughly, loadFactor==0.75f requires 1.33 times the table space in exchange for avoiding 25% of collisions. My tests didn't show any advantage of the default setting for this scenario.

Here is the class I used for my tests. Sorry if it overshoots in some aspects and falls short in others (no warm-up, it just executes long enough that an implementation has the chance to choke on its own garbage).

import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

/**
 * Messing around this StackOverflow question:   https://stackoverflow.com/questions/11740013/fastest-way-to-recreate-the-arraylist-in-a-for-loop/ .
 * Quite surprisingly new HashSet() (which should imply actual memory initialization) is faster than HashSet.clear() in the given scenario.
 * Primary goal is to test this phenomenon (new vs clear) under different scenarios.
 * Secondarily a bit about the BitSet and the HashSet loadFactor is tested.
 * @author Javier
 */
public class TestSetClear2 {

public static interface MicroBenchmark {
    public String getName();
    /**
     * 
     * @param dataSet Data set to insert in the collection
     * @param initialSize Initial size for the collection. Can try to be optimal or try to fool.
     * @param iterations Number of times to go through the dataSet over and over
     */
    public void run(int[] dataSet, int initialSize, int iterations);
}

/** Bad initial case. Based on the question's code */
public static class MBList implements MicroBenchmark {
    @Override public String getName() { return "ArrayList.clear()"; }
    @Override public void run(int[] data, int initialSize, int n) {
        // Not taking initial size into account may result in a resizing penalty in the first iteration
        // But will have an adequate size in following iterations, and wont be fooled by bad estimations. 
        List<Integer> values = new ArrayList<Integer>();
        for (int iter = 0; iter < n; iter++) {
            values.clear();
            for (int i = 0; i < data.length; i++) {
                int x = data[i];
                if (values.contains(x)) continue;
                values.add(x);
            }
        }
    }
}

/** new HashSet(N,1) for every iteration. Reported as best by assylias. */
public static class MBNewHashSetN1 implements MicroBenchmark {
    @Override public String getName() { return "new HashSet(N,1)"; }
    @Override public void run(int[] data, int initialSize,  int n) {
        for (int iter = 0; iter < n; iter++) {
            Set<Integer> values = new HashSet<>(initialSize, 1.0f); // 1.0 loadfactor optimal if no collisions.
            for (int i = 0; i < data.length; i++) {
                int x = data[i];
                if (values.contains(x)) continue;
                values.add(x);
            }
        }
    }
}

// No need to implement a raw new HashSet() (reported as worse). Fooling it with an initial size of 16 (the default) is enough to make it succumb to resizing.

/** HashSet.clear() for every iteration. Attempted by assylias and Javier. clear() does not perform as well as expected under basic tests. */
public static class MBHashSetClear implements MicroBenchmark {
    private float loadFactor; // Allow loadFactor to check how much 1.0 factor affects if there are collisions.
    private String name;
    public MBHashSetClear(float loadFactor) {
        this.loadFactor = loadFactor;
        name = String.format(Locale.ENGLISH, "HashSet(N,%f).clear()", loadFactor);
    }
    @Override public String getName() { return name; }
    @Override public void run(int[] data, int initialSize, int n) {
        HashSet<Integer> values = new HashSet<>((int)Math.ceil(initialSize/loadFactor), loadFactor);// Just the size for loadfactor so it wont resize.
        for (int iter = 0; iter < n; iter++) {
            values.clear();
            for (int i = 0; i < data.length; i++) {
                int x = data[i];
                if (values.contains(x)) continue;
                values.add(x);
            }
        }
    }
}

/** Javier BitSet. Might clearly outperform HashSet, but only on the very specific constraints of the test (non negative integers, not hugely big). */
public static class MBBitSet implements MicroBenchmark {
    @Override public String getName() { return "BitSet.clear()"; }
    @Override public void run(int[] data, int distributionSize, int n) {
        BitSet values = new BitSet(distributionSize);
        for (int iter = 0; iter < n; iter++) {
            values.clear();
            for (int i = 0; i < data.length; i++) {
                int x = data[i];
                if (values.get(x)) continue;
                values.set(x);
            }
        }
    }
}

public static void main(String[] args) {
    final MicroBenchmark mbNew = new MBNewHashSetN1();
    // Create with same loadFactor as MBNewHashSetN1. So we compare apples with apples (same size of handled table, same collisions).
    final MicroBenchmark mbClear = new MBHashSetClear(1.0f);
    final MicroBenchmark mbClear075 = new MBHashSetClear(0.75f);
    final MicroBenchmark mbBitset = new MBBitSet();
    final MicroBenchmark mbList = new MBList(); // Will have a taste of O(N) with a not too big dataset.

    // warmup. trigger the cpu high performance mode? Fill the heap with garbage?
    //mbNew.run(dataSetE3xE3, 1000, (int)1e5); // Using new HS might give a bit advantage?

    int timePerTest = 10000;
    int distributionSize, initialCapacity, datasetLength;

    // 1000 long and values 0..999 (1e3 x 1e3). Optimal initial capacity
    distributionSize = 1000; datasetLength = 1000; initialCapacity = 1000;
    final int[] dataSetE3xE3 = generateRandomSet(1000,1000);
    runBenchmark("E3xE3", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear, mbClear075, mbBitset);
    // repeat with underestimated initial size. Will incur in resizing penalty
    initialCapacity = 16; // Default initial
    runBenchmark("E3xE3+underSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear, mbBitset);
    // repeat with overestimated initial size. larger garbage and clearing.
    initialCapacity = 100000; // oversized will force to handle large tables filled with 0 / null.
    runBenchmark("E3xE3+overSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear, mbBitset);
    // Dry run (not rum). what if we focus on the new and clear operations. Just 1 item so clear() is forced to traverse the table.
    datasetLength = 1; distributionSize = 1000; initialCapacity = 1000;
    runBenchmark("E3xE3-DryRun", generateRandomSet(datasetLength, distributionSize),
            distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear);

    // check for * 100 and / 100 sizes.
    distributionSize = datasetLength = initialCapacity = 10;
    runBenchmark("E1xE1", 
            generateRandomSet(datasetLength, distributionSize),
            distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear, mbList);
    distributionSize = datasetLength = initialCapacity = 100000;
    runBenchmark("E5xE5", generateRandomSet(datasetLength, distributionSize),
            distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear);

    // Concentrated distributions might behave as with oversized?
    datasetLength=10000; distributionSize=10; initialCapacity=Math.min(datasetLength, distributionSize);
    runBenchmark("E4xE1", 
            generateRandomSet(datasetLength, distributionSize),
            distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear);

    // Sparse distributions might allow mild collision. Also adverse for BitSet.
    // TODO Generate a greater/known amount of collisions
    datasetLength=10000; distributionSize=(int)1e6; initialCapacity=Math.min(datasetLength, distributionSize);
    runBenchmark("E4xE6", 
            generateRandomSet(datasetLength, distributionSize),
            distributionSize, timePerTest, initialCapacity,
            mbNew, mbClear, mbClear075);

}

private static void runBenchmark(String testName, int[] dataSet, int distributionSize, int timePerTest
        , int initialCapacity, MicroBenchmark ... testees /* not testes */) {
    // How many iterations? Will use the first testee to calibrate.
    MicroBenchmark curTest = testees[0];
    long t0 = System.nanoTime();
    long ellapsed = 0L;
    final long minToCallibrate = (long)0.5e9; // half second
    int iterations = 1;
    while (ellapsed < minToCallibrate) {
        curTest.run(dataSet, initialCapacity, iterations);

        iterations *= 2; // same as <<= 1
        ellapsed = System.nanoTime() - t0;
    }
    // calculation is not laser-sharp precise (actually executed iterations -1, and some extra initializations).
    final int nIterations = (int) ((double)iterations * timePerTest  * 1e6 /* nanos/millis */ / ellapsed);

    // Do actual benchmark
    System.out.printf(Locale.ENGLISH, "dataset:{name=%s,length:%d,distrib:%d,capacity0:%d,iterations:%d}\n",
            testName, dataSet.length, distributionSize, initialCapacity, nIterations);

    for (MicroBenchmark testee : testees) {
        t0 = System.nanoTime();
        testee.run(dataSet, initialCapacity, nIterations);
        ellapsed = System.nanoTime() - t0;
        System.out.printf(Locale.ENGLISH, "%s : %5.3f\n", testee.getName(), ellapsed/1e9 );

    }

}

private static int[] generateRandomSet(int lengthOfSet, int distributionSize) {
    Random r = new Random();
    int[] result = new int[lengthOfSet];
    for (int i = 0; i < lengthOfSet; i++) {
        result[i] = r.nextInt(distributionSize);
    }
    return result;
}
}

Here are my results (using JDK 1.8.0_31, 64-bit, Windows 7):

dataset:{name=E3xE3,length:1000,distrib:1000,capacity0:1000,iterations:514241}
new HashSet(N,1) : 7.688
HashSet(N,1.000000).clear() : 9.796
HashSet(N,0.750000).clear() : 9.923
BitSet.clear() : 1.990
dataset:{name=E3xE3+underSize,length:1000,distrib:1000,capacity0:16,iterations:420572}
new HashSet(N,1) : 9.735
HashSet(N,1.000000).clear() : 6.637
BitSet.clear() : 1.611
dataset:{name=E3xE3+overSize,length:1000,distrib:1000,capacity0:100000,iterations:143032}
new HashSet(N,1) : 9.948
HashSet(N,1.000000).clear() : 10.377
BitSet.clear() : 0.447
dataset:{name=E3xE3-DryRun,length:1,distrib:1000,capacity0:1000,iterations:18511486}
new HashSet(N,1) : 9.583
HashSet(N,1.000000).clear() : 7.523
dataset:{name=E1xE1,length:10,distrib:10,capacity0:10,iterations:76177852}
new HashSet(N,1) : 9.988
HashSet(N,1.000000).clear() : 10.521
ArrayList.clear() : 7.915
dataset:{name=E5xE5,length:100000,distrib:100000,capacity0:100000,iterations:2892}
new HashSet(N,1) : 9.764
HashSet(N,1.000000).clear() : 9.615
dataset:{name=E4xE1,length:10000,distrib:10,capacity0:10,iterations:170386}
new HashSet(N,1) : 9.843
HashSet(N,1.000000).clear() : 9.708
dataset:{name=E4xE6,length:10000,distrib:1000000,capacity0:10000,iterations:36497}
new HashSet(N,1) : 9.686
HashSet(N,1.000000).clear() : 10.079
HashSet(N,0.750000).clear() : 10.008
Javier
  • HashSet.clear() most decidedly does *not* just set an index to zero; it loops over an array and nulls out each element. – Ernest Friedman-Hill Jul 31 '12 at 12:35
  • And ArrayList.clear() does not just set the size to zero and null *one* index. To ensure that it does not retain references to elements that should not be retained, it needs to clear *all* indices formerly used. – Durandal Jul 31 '12 at 12:40
  • Corrected. Thanks for the observation. – Javier Jul 31 '12 at 12:46
  • @ErnestFriedman-Hill Corrected. Please check if you like it better now. – Javier Jul 31 '12 at 12:54
  • @Durandal Corrected. Please check if you like it better now. – Javier Jul 31 '12 at 12:54
  • *"Of course values.clear() is faster than creating a new object"* => actually no, see my edited answer. – assylias Jul 31 '12 at 13:08
  • Indeed. I reversed my downvote, but still, it's not at all clear that `clear()` is *always* better. – Ernest Friedman-Hill Jul 31 '12 at 13:22
  • @assylias Might be as you say. Just a couple of things: You might agree that microbenchmarking should be taken with a grain of salt (not because it worked better in one case should it do so consistently), and also: the "nullifying" of the items might save work for the GC (seems plausible, but I'm not really sure about this one). – Javier Jul 31 '12 at 13:40
  • @Javier I agree - I have tried to avoid the main pitfalls of a micro benchmark by making sure the JVM was warmed up on dummy data first and that the methods were compiled by the JIT before running the test. In the end, as I commented, it depends on many factors, but my point was to say that the obvious thing (clear is quicker) was not that obvious at all. JVM performance is often counter-intuitive. – assylias Jul 31 '12 at 14:09
  • @assylias Might be a bit dumb on my side to revise my post so much time afterwards. I did my own benchmark. Basically you were right; there is just this issue of how come HashSet.clear() was sub-optimal. I hope I don't sound reauthorizing. – Javier Sep 04 '15 at 20:39
1

You can use ArrayList.clear(); this keeps the same ArrayList object (and its backing array) in memory, so the list itself never becomes garbage for the collector.
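
For reference, this is roughly what ArrayList.clear() does (paraphrased from the JDK 8 sources, not a verbatim copy; elementData and size are the list's internal fields). It nulls out the used slots so their elements can be collected, but keeps the backing array, so its capacity is reused on the next fill:

// Paraphrased from java.util.ArrayList#clear() (JDK 8)
public void clear() {
    modCount++;
    // null out the used slots so the GC can reclaim the removed elements
    for (int i = 0; i < size; i++)
        elementData[i] = null;
    // the backing elementData array itself is kept, so its capacity is reused
    size = 0;
}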

cl-r
  • Interesting suggestion, what if I'm using several ArrayLists in parallel. I just simplified the code for this question. So if I use several ArrayLists, this will not work? – Sophie Sperner Jul 31 '12 at 12:31
  • If you use ArrayList[1], the first ArrayList[1].clear() destroys all of ArrayList[2], so you can only clear the [2] ones to keep the structure OK. – cl-r Jul 31 '12 at 12:34
  • No, I'm using values, otherArrayList, yetAnotherArrayList inside of those for loops, how then the compiler will understand which one to clear when I call static ArrayList.clear().. – Sophie Sperner Jul 31 '12 at 12:37
  • There is no static `ArrayList.clear()` and as others have pointed out (on a now deleted answer), `clear()` will traverse the whole list to null every element which is slower than the `HashSet` approach. – Philipp Reichart Jul 31 '12 at 12:46
  • ArrayList.clear() is not static; it is in the java.util package. It just sets all the fields to 'null'; the internal array buffer is normally maintained (the same goes for Set.clear()). If ordered values are not needed you can use a Set as explained by others. – cl-r Jul 31 '12 at 12:54
0

You should use the .clear() method; that way you don't need to allocate memory for your variable again and again.

JaypeeRohilla