3

I am having trouble removing the duplicates from two arrays that have been merged into one. I have written the following code that merges the arrays, yet I'm not sure how to remove the duplicates from the final array. Assume the arrays are already sorted.

public static int[] merge(int[] list1, int[] list2) {
    int[] result = new int[list1.length + list2.length];

    int i = 0;
    int j = 0;

    for (int k = 0; k < (list1.length + list2.length); k++) {
        if (i >= list1.length) {
            result[k] = list2[j];
            j++;
        } 
        else if (j >= list2.length) {
            result[k] = list1[i];
            i++;
        } 
        else {
            if (list1[i] < list2[j]) {
                result[k] = list1[i];
                i++;
            } else {
                result[k] = list2[j];
                j++;
            }
        }
    }
    return result;
}
Compsciguy
  • 59
  • 1
  • 1
  • 9
  • remove them before merging. –  Feb 04 '17 at 20:53
  • I don't know what the context of the problem is, or how critical the memory usage/speed of your program is, but you should start using collections. You could do all this in one or two lines. – MikaelF Feb 05 '17 at 04:39
  • Collections will wrap everything to an Integer, and if you are use a HashSet, it will create a zillion internal objects. If your arrays are of any significant size, or you are using this a lot, you should use the standard algorithm for doing this, which I have given below. – rghome Jul 10 '23 at 13:13
  • @rghome note this is quite an old question. The [tradeoff](https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff) in memory consumption (in a `Set`, likely on the order of MB out of GB for sets sized in the millions) is that you save costs in cpu-consumption (~O(1) checks vs ~O(n)). A `HashSet` is a non-starter for this question outside of performing dupe checks: The OP has sorted data, and `HashSet` is inherently unordered. – Rogue Jul 10 '23 at 13:22
  • @Rouge It's not so much the memory space, but the expense of allocating it; i.e., all those Integers and other objects (depending on what classes you use), then the fact that they have to be garbage collected. There is no space-time trade-off. Using Collections will be slower _and_ use more memory. The trade-off is whether it is worth the programmer's time finding the best solution (hence the usefulness of this site!). – rghome Jul 10 '23 at 13:43

8 Answers8

5

Ok, someone hated all the answers. Here's another attempt that combines two stackoverflow q's, combining arrays and removing dupes.

This one runs a good deal faster than my earlier attempt on two lists of a million ints.

public int[] mergeArrays2(int[] arr1, int[] arr2){
    int[] merged = new int[arr1.length + arr2.length];
    System.arraycopy(arr1, 0, merged, 0, arr1.length);
    System.arraycopy(arr2, 0, merged, arr1.length, arr2.length);

    Set<Integer> nodupes = new HashSet<Integer>();

    for(int i=0;i<merged.length;i++){
        nodupes.add(merged[i]);
    }

    int[] nodupesarray = new int[nodupes.size()];
    int i = 0;
    Iterator<Integer> it = nodupes.iterator();
    while(it.hasNext()){
        nodupesarray[i] = it.next();
        i++;
    }



    return nodupesarray;
}

console output:

INFO [main] (TestMergeArray.java:40) - creating two lists of a million ints
DEBUG [main] (TestMergeArray.java:41) - list 1 size : 1000000
DEBUG [main] (TestMergeArray.java:42) - list 2 size : 1000000
INFO [main] (TestMergeArray.java:56) - now merging
INFO [main] (TestMergeArray.java:59) - done, final list size is 864975
Community
  • 1
  • 1
badperson
  • 1,554
  • 3
  • 19
  • 41
3

this clearer lambda solution is slightly slower because of the (un)boxing
requires Java 8 or above

public static int[] mergedistinct( int[] array1, int[] array2 ) {
  Stream<Integer> s1 = IntStream.of( array1 ).boxed();
  Stream<Integer> s2 = IntStream.of( array2 ).boxed();
  return( Stream.concat( s1, s2 ).distinct().mapToInt( i -> i ).toArray() );
}

[1, 2, 3, 5, 4, 7, 8]

if you need the array sorted:

…
return( Stream.concat( s1, s2 ).distinct().sorted().mapToInt( i -> i ).toArray() );
Kaplan
  • 2,572
  • 13
  • 14
  • There is `IntStream.concat()` also, `int[] foo = IntStream.concat(intStream1, intStream2).distinct().toArray();`, Maybe this could avoid (un)boxing? – Weekend Mar 15 '23 at 07:00
1

Here's a technique that iterates the arrays just once and does not use a hash to detect duplicates.

import java.util.List;
import java.util.ArrayList;
import java.util.Arrays;

public class SortedMerge {
    public static int[] merge(int[] array1, int[] array2) {
        int[] a;
        int[] b;
        List<Integer> c = new ArrayList<Integer>();
        int i = 0;
        int j = 0;

        // b is longer than a
        if (array1.length > array2.length) {
            a = array2;
            b = array1;
        } else {
            a = array1;
            b = array2;
        }

        while (j < b.length) {
            int bb = b[j];

            if (i < a.length) {
                int aa = a[i];

                if (aa > bb) {
                    c.add(bb);
                    j++;
                } else {
                    c.add(aa);
                    i++;
                    if (aa == bb) {
                        j++;
                    }
                }
            } else {
                c.add(bb);
                j++;
            }
        }
        // java 8 List<Integer> to int[]
        return c.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) throws Exception {
        int[] array1 = new int[]{3, 5, 8, 11, 14};
        int[] array2 = new int[]{1, 2, 3, 4, 6, 8, 14, 15, 17};
        int[] c = merge(array1, array2);

        for (int i = 0; i < c.length; i++) {
            System.out.format("%d,", c[i]);
        }
        System.out.println();
        // output> 1,2,3,4,5,6,8,11,14,15,17,
    }
}
Rz Mk
  • 1,072
  • 10
  • 15
0

Call your merge method and do the followings. I tested it. It works fine.

int[] result = merge(count, count1);

Set<Integer> set = new HashSet<Integer>();
try {
    for (int i = 0; i < result.length; i++) {
        set.add(result[i]);
    }
    System.out.println(set);
} catch (Exception e) { }
Community
  • 1
  • 1
Zenith
  • 1,037
  • 1
  • 10
  • 21
0

Can you use ArrayLists? ArrayLists would make this very easy to do.

 //Consider n1 to be some global or instance variable.

 import java.util.ArrayList;
 public void Add(ArrayList<Integer> n2) {

     for(int i = 0; i < n2.size(); i++) {
         if(!n1.contains(i))
             n1.add(n2.get(i));
     }
}
JoryAnderson
  • 103
  • 7
0
package com.string.merge;

import java.util.ArrayList;

public class MergeArrayAndRemoveDuplicate {
    public static void main(String[] args) {
        int[] a = {1, 2, 2, 3, 1, 5, 3};
        int[] b = {4, 3, 1, 5, 7, 8, 4, 2};

        ArrayList<Integer> l = new ArrayList<>();
        for (int i = 0; i < (a.length > b.length ? a.length : b.length); i++) {
            if (i < a.length) {
                int c = 0;
                while (c <= l.size()) {
                    if (l.contains(a[i]) == false) {
                        l.add(a[i]);
                    }
                    c++;
                }
            }
            if (i < b.length) {
                int c = 0;
                while (c <= l.size()) {
                    if (l.contains(b[i]) == false) {
                        l.add(b[i]);
                    }
                    c++;
                }
            }
        }
        System.out.println(l);
    }
}
o/p-[1, 4, 2, 3, 5, 7, 8]
0
import java.util.ArrayList;
import java.util.List;
public class MergeListAndRemoveDuplicate {
    public static void main(String[] args) {
        int a[] = {1, 1, 2, 1, 3, 4, 1, 2, 5};
        int b[] = {1, 2, 3, 1, 3, 2, 4, 5, 6, 7};

        boolean flag = true;
        List<Integer> list = new ArrayList<Integer>();

        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                if (a[i] == b[j]) {
                    flag = false;
                }
                if (i == j && !list.contains(b[j])) {
                    list.add(b[j]);
                }
            }
            if (flag == true) {
                list.add(a[i]);
            }
        }
        System.out.println(list);
    }
}
Community
  • 1
  • 1
0

There is a known algorithm for this. You should not use Sets or Lists and they will chew up memory and run slow for no benefit.

Here is a simple, standard and fairly optimal solution for merging sorted arrays. It assumes the incoming arrays are already sorted and contain unique integers.

public static int[] mergeSortedArrays(int[] array1, int[] array2) {
    int[] result = new int[array1.length + array2.length];
    int i = 0, j = 0, k = 0;
    while (i < array1.length) {
        while (j < array2.length && array2[j] < array1[i]) {
            result[k++] = array2[j++];
        }
        if (j < array2.length && array2[j] == array1[i]) {
            j++;
        }
        result[k++] = array1[i++];
    }
    while (j < array2.length) {
        result[k++] = array2[j++];
    }

    if (k != result.length) {
        result = Arrays.copyOf(result, k);
    }
    return result;
}
rghome
  • 8,529
  • 8
  • 43
  • 62