Faster/Easier method of finding number of duplicates of values in an array

Question

I am currently using a HashMap to map each duplicate value to the number of times it is repeated. It's linear, O(n), but I was looking for a built-in method or a faster way (like O(log n)) to calculate the number of duplicates for each value in an array.

Here is my current shot that works:

String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};

HashMap<String, Integer> duplicates = new HashMap<String, Integer>();
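// This single counter is shared by every key, so a value's count is only
// guaranteed to be correct when all of its occurrences are adjacent in the array.
int numberOfDuplicates = 1;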

for (int i = 0; i < array.length; i++)
{
    if (duplicates.put(array[i], numberOfDuplicates) != null) // Duplicate Key
    {
        numberOfDuplicates++;
    }
    else // New Key
    {
        numberOfDuplicates = 1;
    }

    duplicates.put(array[i], numberOfDuplicates);
}


// Print out duplicate counts
for (String key : duplicates.keySet()) {
    System.out.println(key + " " + duplicates.get(key));
}

Is there a faster or more pragmatic way? Thanks.

If it ain't broke, don't fix it. If you've got it to O(n) there's no reason to try to improve it, and the standard SDK doesn't have a method for this. Your code could look better though. — Kayaman, Jul 03 '15 at 05:32
You could always use a multiset, such as from Guava, but it's basically doing this same thing. — chrylis -cautiouslyoptimistic-, Jul 03 '15 at 05:35
If you really needed a speed increase, I'd consider a few options. 1. Check whether sorting and counting is faster: it is O(n log n), but the constant can be much lower since it doesn't need to do any allocations. 2. Consider using one of the maps from Trove (or a similar library), since they're optimized to store primitive `int` values rather than `Integer`, which means less overhead and better speed. (I think they can also be set to default-initialize to zero, and they have an increment function...) — Michael Anderson, Jul 03 '15 at 05:38
You might optimize your code (not the logic) if you have performance issues (`containsKey(k)` instead of `put(k, v)`) or want to reduce GC pressure (get rid of the autoboxing of the counter). Using a `Set` of name objects (each containing a `name` field and an `int` counter) might also be a solution. It all depends on what you want to achieve. — SubOptimal, Jul 03 '15 at 05:40
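The autoboxing point raised in the comments can be illustrated with a mutable counter. This is a minimal sketch assuming Java 8+ (for `Map.computeIfAbsent`); the names `MutableCounterDemo` and `countWithMutableCounters` are hypothetical. Each distinct value gets a single `int[1]` holder that is incremented in place, so repeated occurrences update a primitive `int` instead of unboxing and re-boxing an `Integer` on every update.

```java
import java.util.HashMap;
import java.util.Map;

class MutableCounterDemo {

    // Hypothetical helper: counts occurrences using one int[1] holder per distinct value,
    // so each repeat increments a primitive in place instead of re-boxing an Integer.
    static Map<String, int[]> countWithMutableCounters(String[] array) {
        Map<String, int[]> counts = new HashMap<>();
        for (String s : array) {
            // computeIfAbsent creates the holder only the first time s is seen.
            counts.computeIfAbsent(s, k -> new int[1])[0]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};
        for (Map.Entry<String, int[]> e : countWithMutableCounters(array).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()[0]);
        }
    }
}
```

The trade-off is a slightly less convenient value type (`int[]` rather than `Integer`), which is why libraries such as Trove wrap the same idea in a dedicated primitive map.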

score 1 · Answer 1 · answered Jul 03 '15 at 05:36

Here's a shot at removing some of the clutter.

String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};

HashMap<String, Integer> duplicates = new HashMap<String, Integer>();

for (String s : array) {
    Integer i = duplicates.get(s);
    duplicates.put(s, i == null ? 1 : (i+1));
}
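On Java 8 and later, the same loop can be shortened with `Map.merge`, which stores `1` for a key that is not yet present and otherwise combines the old value with `1` using `Integer::sum`. This is an alternative to the answer above rather than part of it; `MergeCountDemo` and `countDuplicates` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class MergeCountDemo {

    // Hypothetical helper: counts how many times each string occurs in the array.
    static Map<String, Integer> countDuplicates(String[] array) {
        Map<String, Integer> duplicates = new HashMap<>();
        for (String s : array) {
            // Puts 1 if s is absent; otherwise stores oldValue + 1.
            duplicates.merge(s, 1, Integer::sum);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};
        countDuplicates(array).forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

It is still O(n); it only removes the explicit null check.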

score 1 · Answer 2 · answered Jul 03 '15 at 05:40

You can also do it the following way:

if (duplicates.containsKey(array[i])) {
    duplicates.put(array[i], duplicates.get(array[i]) + 1);
} else {
    duplicates.put(array[i], 1);
}

instead of:

if (duplicates.put(array[i], numberOfDuplicates) != null) // Duplicate Key
{
    numberOfDuplicates++;
}
else // New Key
{
    numberOfDuplicates = 1;
}

score 1 · Accepted Answer · answered Jul 03 '15 at 05:43


You can write it with less code using Java 8 streams:

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

Map<String, Integer> duplicates =
    Arrays.stream(array)
          .collect(Collectors.groupingBy(e -> e,
                                         Collectors.reducing(0, e -> 1, Integer::sum)));
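If `Long` counts are acceptable, `Collectors.counting()` expresses the same grouping more directly than the `reducing` version above. The sketch below is a self-contained variant; `StreamCountDemo` and `countDuplicates` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCountDemo {

    // Groups equal strings together and counts each group; counts are Long, not Integer.
    static Map<String, Long> countDuplicates(String[] array) {
        return Arrays.stream(array)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};
        countDuplicates(array).forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

Like the HashMap loop, this is O(n); streams shorten the code but do not change the complexity.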

Answered by Eran

Michael Anderson · Answer 4 · 2015-07-03T05:59:54.623

Trove Version

This is a modification of Kayaman's answer using Trove, a high-performance collections library.

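import gnu.trove.map.TObjectIntMap;
import gnu.trove.map.hash.TObjectIntHashMap;
import gnu.trove.procedure.TObjectIntProcedure;

String[] array = {"Henry", "Henry", "Henry", "Maxwell", "Maxwell"};

TObjectIntMap<String> duplicates = new TObjectIntHashMap<String>();
for (String s : array) {
    // Add 1 to the existing count, or put 1 if s is not yet a key.
    duplicates.adjustOrPutValue(s, 1, 1);
}

duplicates.forEachEntry(new TObjectIntProcedure<String>() {
    @Override
    public boolean execute(String key, int value) {
        System.out.println(key + " " + value);
        return true; // keep iterating over the remaining entries
    }
});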

In place sort version

This version uses Arrays.sort and then steps through the array reporting duplicates. Although Arrays.sort is O(n log n), the resulting algorithm may be faster because it avoids allocating any data structures. However, it does change the order of the input array.

NOTE 1: In this case the timing will be dominated by the I/O calls, so you may not notice the speed-up.

NOTE 2: I'd refactor and extract the core of this and use a functor to handle the processing of the duplicates.

import java.util.Arrays;

Arrays.sort(array);
String last = null;
int count = 0;
for (String v : array) {

    // Is it the first value?
    if (last == null) {
        last = v;
        count = 1;
        continue;
    }

    // Have we started a new value?
    if (!last.equals(v)) {
        System.out.println(last + " " + count);
        last = v;
        count = 1;
        continue;
    }

    // It's a repeated value.
    ++count;
}

if (last != null)
    System.out.println(last + " " + count);
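Following NOTE 2, the loop can be extracted into a method that hands each (value, count) pair to a callback. This is a minimal sketch using `java.util.function.BiConsumer` as the functor; `SortedCountDemo` and `forEachDuplicateCount` are hypothetical names. Like the loop above, it sorts the caller's array in place.

```java
import java.util.Arrays;
import java.util.function.BiConsumer;

class SortedCountDemo {

    // Hypothetical helper: sorts the array in place, then reports each distinct value
    // together with the length of its run of equal elements.
    static void forEachDuplicateCount(String[] array, BiConsumer<String, Integer> action) {
        Arrays.sort(array);
        int i = 0;
        while (i < array.length) {
            int runEnd = i + 1;
            // Advance past every element equal to array[i]; after sorting they are adjacent.
            while (runEnd < array.length && array[runEnd].equals(array[i])) {
                runEnd++;
            }
            action.accept(array[i], runEnd - i);
            i = runEnd;
        }
    }

    public static void main(String[] args) {
        String[] array = {"Maxwell", "Henry", "Henry", "Maxwell", "Henry"};
        forEachDuplicateCount(array, (value, count) -> System.out.println(value + " " + count));
    }
}
```

Passing the reporting step as a callback keeps the counting logic free of I/O, so callers can print, collect into a map, or filter for counts above one without touching the loop.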
