3

I had 2 integer arrays, one original and one modified w.r.t the original array. Elements can be added or removed from the original to convert it to the modified. My problem is, in the modified, I need to find out which elements are new, which elements are same, and which elements are not there w.r.t to the original array.

Given data:

arr1 = 3,2,1      //Original array
arr2 = 1,4,5      //Modified array by adding and/or removing elements

I need something like:

same = 1
removed = 2,3
added = 4,5

Obviously, I can write several nested for loops and find it out, but this would be too inefficient. I was wondering if there was be a better or efficient way to do it.. I am using Java. This page kind of address a similar problem, but not sure if I can use it to solve my problem.

Any help would be appreciated.

Community
  • 1
  • 1
rgamber
  • 5,749
  • 10
  • 55
  • 99

4 Answers4

6

If memory is not a constraint, I'd recommend using Set for these kind of operations. Finding the things you require would be just a matter of calling the appropriate methods on the two Set objects. Of course, this assumes you'd have unique elements in your elements and if not, you are interested in only unique elements when it comes to reporting stuff. For e.g.

public static void testSet() {
    final Set<Integer> first = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    final Set<Integer> second = new HashSet<Integer>(Arrays.asList(1, 4, 5));

    Set<Integer> result = new HashSet<Integer>(first);
    result.retainAll(second);
    System.out.println("Similar: " + result);

    result = new HashSet<Integer>(first);
    result.removeAll(second);
    System.out.println("Removed: " + result);

    result = new HashSet<Integer>(second);
    result.removeAll(first);
    System.out.println("Newly added: " + result);
}
/*
OUTPUT:

Similar: [1]
Removed: [2, 3]
Newly added: [4, 5]
*/
Sanjay T. Sharma
  • 22,857
  • 4
  • 59
  • 71
2

You could go through each array once and obviously use a removal save iteration like a loop from the back of the array.

int[] same
for (int i = arr1.length; i >= 0; i--)
{
    if(arr2.contains(i))
        same.add(i)
        arr1.remove(i)
}
for (int i = arr2.length; i >= 0; i--)
{
    if(same.contains(i))
        arr2.remove(i)
}

Then arr1 will be the list of removed, arr2 will be added and same will be the same.

J Lundberg
  • 2,313
  • 2
  • 20
  • 23
  • This is a simple solution. Thanks, I am also taking a look at the dynamic programming solution below. – rgamber Aug 19 '11 at 18:06
  • 1
    Yes, this is the simple solution in terms of "space complexity". The `contains` check here is `O(n)` just FYI which is exactly the `Set` solution tries to deal with. – Sanjay T. Sharma Aug 19 '11 at 18:09
  • @Sanjay Agreed, its not the best but its simple. – J Lundberg Aug 19 '11 at 18:13
  • 1
    Sorry if this seems like nitpick, but how exactly is this simple? This code has around 12 lines just for finding out same and other stuff. In around 16 lines I have all the 3 things with sample code! Also, it's really OK to go with this solution, IMHO the `Set` solution is something the OP must keep in mind when dealing with "large" collections when memory is not a constraint. Of course, YMMV :) – Sanjay T. Sharma Aug 19 '11 at 18:16
2

If there are no duplicates, and the maximum integer is constrained, and the members are moderately dense (say, 1% density or better), make them into BitSets. The "and" of the two sets are "same", A.andNot(B) are the ones in A only, and B.andNot(A) are the ones in B only. If the integers are moderately dense, this is very fast.

If the integers are sparse, sort each array and walk up them in tandem.

Ed Staub
  • 15,480
  • 3
  • 61
  • 91
1

You are trying to calculate the Levenshtein distance between two arrays.

There is a simple dynamic programming solution to calculate this (taken from Wikipedia):

int LevenshteinDistance(char s[1..m], char t[1..n])
{
  // for all i and j, d[i,j] will hold the Levenshtein distance between
  // the first i characters of s and the first j characters of t;
  // note that d has (m+1)x(n+1) values
  declare int d[0..m, 0..n]

  for i from 0 to m
    d[i, 0] := i // the distance of any first string to an empty second string
  for j from 0 to n
    d[0, j] := j // the distance of any second string to an empty first string

  for j from 1 to n
  {
    for i from 1 to m
    {
      if s[i] = t[j] then  
        d[i, j] := d[i-1, j-1]       // no operation required
      else
        d[i, j] := minimum
                   (
                     d[i-1, j] + 1,  // a deletion
                     d[i, j-1] + 1,  // an insertion
                     d[i-1, j-1] + 1 // a substitution
                   )
    }
  }

  return d[m,n]
}

You can easily change this code to tell you what the individual operations are.

tskuzzy
  • 35,812
  • 14
  • 73
  • 140