0

Basically what I want is to I skip elements on those index values which are there in the set otherwise I should just push the old array elements into the new array.

So if my set contains [2, 4, 9, 10] I should skip the values at index 2,4,9,10 in the old Array and put the values at othe other index locations in my new Array.

I am writing code like this

  int[] newArr = new int[oldArray.length - set.size()];


   for(int i = 0, j = 0; j < newArr.length && i < oldArray.length; j++,i++){
      if(set.contains(i) )
         i++;
       else
         newArray[j] = oldArray[i];
      }

I am creating and filling my set like this

       Set<Integer> commonSet = new HashSet<>();

       for(int i = 0; i < array1; i++ ){
         for(int j= 0; j < array2; j++) {
            if(arrayOne[i] == arrayTwo[j]){
              commonSet.add(i);// Here I am saving the indices.
           }
         }
       }

Not Sure if this is the best way. Is there any other way which would be more efficient? Or I must have to resort to Collection classes like ArrayLists.

John Doe
  • 2,752
  • 5
  • 40
  • 58
  • 1
    Your code increments j regardless of whether a value was copied, and increments i twice each time you don't copy. Consider getting rid of the j++ from the for-statement, and doing it conditionally inside the loop block. Try stepping through the code by hand to see what happens. – Patricia Shanahan Apr 10 '14 at 03:10

3 Answers3

2

Using Collection classes instead of arrays would make your code much simpler.

Doing array subtraction using common libraries like apache CollectionUtils looks like this:

    Collection<Integer> diff = CollectionUtils.subtract(Arrays.asList(array1), Arrays.asList(array2));

Unless you're going to be working very large sets of data, it won't have a noticeable impact on speed.

Also, creating a set of different indexes the way you do above is going to scale very poorly for larger data sets. Just calculating the times for doing a difference using CollectionUtils.subtract() vs your set creation code shows the scaling problems (arrays filled with random Integers):

array1.length = 1000
array2.length = 1000
diff.size() = 530
elapsed ms for calculating diff = 39
set creation ms = 7

array1.length = 10000
array2.length = 10000
diff.size() = 5182
elapsed ms for calculating diff = 47
set creation ms = 519

array1.length = 50000
array2.length = 50000
diff.size() = 26140
elapsed ms for calculating diff = 101
set creation ms = 12857

array1.length = 1000000
array2.length = 1000000
diff.size() = 524142
elapsed ms for calculating diff = 1167
(didn't bother to wait for the program to finish)

As you can see, doing a double loop to compare every element scales quite poorly, and that's not even counting the subtraction you'll have to do afterwards.

EDIT updated to reflect changes in the question

azurefrog
  • 10,785
  • 7
  • 42
  • 56
  • 2
    Well, I just tried it with an array of 1 million random numbers between 1 and 1 million, and subtracted a set with 1000 random numbers between 1 and 1 million, and it took about 50 ms. So I'd say, not much. – azurefrog Apr 10 '14 at 03:45
  • 1
    +1 A One-liner, but you should fix your code, since the set is not a set of values to be skipped, but a set of indices – Khaled.K Apr 10 '14 at 05:58
1

If you're worried about performance, definitely do not use any list or collection classes. They are notorious for re-allocating arrays frequently as they need more capacity, which is a very slow operation.

Unfortunately, I don't know how you create/fill the set of indices. If it is possible for you to have your set in an array as well and generate it in such a way that its entries are sorted, you can optimize your code significantly.

If set is fairly long compared to oldArray, do this (this assumes no duplicate entries in set!):

int l = oldArray.length; // Cache length (some compilers might do this for you)
for (int i=0, j=0, k=0; i<l; i++) {
    if (set[k]==i) {
        k++;
    } else {
        newArr[j++] = oldArray[i];
    }
}

If set is fairly short, do this (this can handle duplicate entries, but set still needs to be sorted):

int o1=0;
int o2=0;
for (int p:set) {
    System.arraycopy(oldArray, o1, newArr, o2, p-o1);
    o1+=p+1;
    o2+=p;
}
System.arraycopy(oldArray, o1, newArray, o2, oldArray.length-o1);

The former avoids function calls and the latter banks on the optimized memory-copy implementation of System.arraycopy(...) (and set can be any sorted Iterable, although an array will be faster).

Which one is faster will depend on the exact sizes of your arrays and which system (CPU, JVM) you use.

If set is not sorted, you can either use your approach (debugged, of course) or you can sort it first and then use one of the approaches here. Again, which one will give you better performance will depend on the size of set and your system.

Markus A.
  • 12,349
  • 8
  • 52
  • 116
  • set is shorter in length in comparison to the oldArray. – John Doe Apr 10 '14 at 04:08
  • 1
    @user1590011 Right... sorry I was being unclear. Of course it needs to be shorter. By "fairly long" I meant roughly half the size or bigger. :) – Markus A. Apr 10 '14 at 04:18
  • Thanks Markus your answer is good. Now since I have added to my question the way I am creating the set and filling it. What I would like to know more about is that can I use the same methods that you have outlined above? Do you sugggest me to make it a sorted set? Also please take a look at the way I am doing it right now by the answer I have given below. – John Doe Apr 10 '14 at 04:22
  • 1
    :) Perfect! You are already creating the set sorted and with no duplicate entries! So, simply use an ArrayList instead of a HashSet for commonSet and optionally convert it to an array afterwards using something like `Integer[] set = commonSet.toArray(new Integer[commonSet.size()])`. Then you can use either approach here and it should be pretty fast. – Markus A. Apr 10 '14 at 04:26
  • 1
    Alternatively, you can create an int[] for `set` immediately and just make it as long as oldArray (worst case scenario). Then, you can use version 1 above directly, but for version 2, you would have to use a for-loop over an index variable that goes from 0 to some kind of last-used-entry-index in `set`. – Markus A. Apr 10 '14 at 04:31
  • 1
    But you can do even better for the whole diff-operation that you are doing: Take a look at AJed's answer here: http://stackoverflow.com/questions/5283047/intersection-and-union-of-arraylists-in-java (third from the bottom) – Markus A. Apr 10 '14 at 04:33
  • So I should convert the input array to ArrayList and then pass the ArrayList to the function outlined by AJed. I mean I can't escape the ArrayList in this scenario :) – John Doe Apr 10 '14 at 04:37
  • 1
    You can adapt AJed's answer to work on plain old arrays rather than ArrayLists. Note that his algorithm requires your two arrays (array1 and array2) to be sorted, though! Also, his algorithm gives you the intersection rather than the difference of the two arrays! It was more meant as an example of another approach entirely. – Markus A. Apr 10 '14 at 04:40
0

This piece of code is doing it for me. Thanks @ Patricia Shanahan

         int j = 0, i = 0;
         while( j < newArr.length && i < oldArray.length){

           if(commonSet.contains(i)){
                i++;
               }
           else{
               diffArray[j] = arrayOne[i];
               j++;
               i++;
             }
          }
John Doe
  • 2,752
  • 5
  • 40
  • 58
  • 2
    Note that `commonSet.contains(...)` is a fairly involved operation (including calculating hash values or doing up to an average of set.size()/2 comparisons) and you are calling it once for each element in oldArray. :) – Markus A. Apr 10 '14 at 04:22
  • Should I forego this commonSet altogether and use some other data structure – John Doe Apr 10 '14 at 04:26
  • 1
    (see my comment above) :) – Markus A. Apr 10 '14 at 04:27