Any O(n) algorithm for smoosh method?

Question

I am trying to finish a smoosh() method which will takes an array of ints. On completion the array should still contains the same numbers but wherever the array had two or more consecutive duplicate numbers, they are replaced by one copy of the number. Hence, after smoosh() is done, no two consecutive numbers in the array are the same.

Any unused elements at the end of the array are set to -1.

For example, if the input array is

[ 1 1 0 0 4 4 5 0 0 0 7 ]

it reads

[ 1 0 4 5 0 7 ]

after smoosh() completes.

The method signature is: public static void smoosh(int[] ints)

I was able to do it like this:

for (int i=0; i<ints.length-1; i++) {
    if (ints[i+1]==ints[i])
        ints[i]=-1;
}
for (int i=0; i<ints.length-1; i++) {
    if (ints[i]==-1) {
        for (int j=i+1; j<ints.length; j++) {
            if (ints[j]!=-1) {
                //swap ints[j] and ints[i] and then break;
            }
        }
    }
}

However, this will be O(n2) time (although almost in place).

I feel like there should be some O(n) in place method to do this but I can't figure out how. Could anyone think of any O(n) in place algorithm? (Obviously if you make another array of the same size to help processing then you can get O(n) easily but that's not what I am looking for since that's not in place...)

thanks!

there is an O(n) in-place algorithm, but no time to write if for you. hint - you need to keep track of two different indexes.... — mikera, Oct 01 '12 at 03:54

paxdiablo · Answer 1 · 2012-10-01T04:31:59.493

Basically, as follows. This O(n)-time, O(1)-space "algorithm" is actually Python code, since that's a very good language for teaching basic algorithms, as long as you avoid all the complex stuff, like lambdas.

I'm actually using it to teach my 8yo son at the moment as he's expressed an interest in what I do all day at work.

array = [1, 1, 0, 0, 4, 4, 5, 0, 0, 0, 7]

print array

count = len (array)
last = array[0] - 1
toidx = 0
for fromidx in range (0, count):
    if array[fromidx] != last:
        array[toidx] = array[fromidx]
        toidx = toidx + 1
        last = array[fromidx]
while toidx < count:
    array[toidx] = -1
    toidx = toidx + 1

print array

The output of this is:

[1, 1, 0, 0, 4, 4, 5, 0, 0, 0, 7]
[1, 0, 4, 5, 0, 7, -1, -1, -1, -1, -1]

as your specification ask for.

It basically runs two indexes through the array, the fromix index advances by one no matter what. The toidx index only advances if the value at fromidx is different to the last one transferred. The initial value of the last one transferred is set to something different to the first element, so as to ensure the first element is transferred.

In other words, on each iteration where that condition is true, the value at the from index is copied to the toidx index, the toidx index is incremented, and the last value is updated. If the value at fromidx is the same as the last transferred, the toidx index is not updated.

Then, at the end, all the remaining values are set to -1.

Since your specs call for the rest of the array to be populated with -1, that's what I've done in the above code.

However, your sample result does not contain the negative values so, on the off chance that you need the array truncated rather than filled with -1, you basically replace the while loop at the end with an array truncation, so that its size is now toidx.

In Python, you could do that with something like:

array = array[0:toidx]

+1 - for using python. The OP will learn more by doing the translation to Java. — Stephen C, Oct 01 '12 at 04:04
+1 - thanks for the great explanation! So basically you use a separate index to do the move while going through the comparison. Very neat idea! — Eric H., Oct 01 '12 at 13:12

score 3 · Answer 2 · answered Oct 01 '12 at 03:54

There's no need for your inner loop. You just have to keep track of what the last value you visited was, then start skipping until you find a 'new' number. e.g. in pseudo code

previous = null;
newarray = array();
newpos = 0;
for (i = 0; i < oldarray.length; i++) {
   if (oldarray[i] == previous) {
      continue; // got a duplicate value, so skip it.
   } else {
      newarray[newpos++] = oldarray[i];
      previous = oldarray[i];
   }
}
for (i = newpos; i < oldarray.length; i++) {
   newarray[i] = -1; // fill in any empty slots
}

Now you're down to O(n).

jonvuri · Accepted Answer · 2012-10-01T22:54:41.887

1

If you use a LinkedList instead, you could use a ListIterator for the loop, storing the value of the previous value in the list and calling ListIterator.remove if it equals the current value.

edited Oct 01 '12 at 22:54

answered Oct 01 '12 at 03:56

jonvuri

5,738
3
24
32

Not entirely sure I earned this one, compared to the other answers... :P But in my own defense, I prefer clear, pragmatic solutions that Just Work. (Not to say there's no value in learning "close to the metal" algorithms) – jonvuri Oct 01 '12 at 13:27
@user1660652, this is, by the way, the problem with using libraries without understanding their implementation. `ArrayList.remove` is an O(n) operation itself since it shifts around the array elements above that point. By doing that within an iterator, this solution is O(n^2), _not_ O(n). Now Kiyura is correct that sometimes it's better to use the facilities given, but you do need to _understand_ the implications. See http://stackoverflow.com/questions/559839/big-o-summary-for-java-collections-framework-implementations – paxdiablo Oct 01 '12 at 21:21
@paxdiablo, you would certainly be correct if I had referred to `ArrayList.remove`. I did not. My solution is O(n). – jonvuri Oct 01 '12 at 22:45
Actually, I'm not certain if `ListIterator.remove` is O(1) w.r.t. ArrayList; it may require a LinkedList instance to be constant time. That part of it slipped my mind. Sorry! I've edited my answer to reflect this. – jonvuri Oct 01 '12 at 22:54
Kiyura, `ArrayList.remove()` is definitely O(n), it has an arraycopy in it (in the openjdk source, anyway) to shuffle down the rest of the elements. LinkedList is a better choice for `remove` although you pay the price by changing `get` from O(1) to O(n). Of course, all this matters naught unless your lists are substantial in size, or used a _lot_ :-) Unless that's the case, I'd stick with the Java collections as you have. Only if you identify this as a specific performance problem would I resort to rolling my own. – paxdiablo Oct 01 '12 at 23:47
@paxdiablo, You missed it again; the method in question is `ListIterator.remove`, not `ArrayList.remove`, unless that maps directly to the `AbstractList`'s remove method? (Would surprise me.) I searched for a while but I couldn't find a source for running time of `ListIterator.remove` on ArrayList in practical implementations (e.g. Sun VM). It seems like it *could* be implemented (or "cheated") in O(1); just start shifting all following elements in the iterator, right? It would be downright silly if it wasn't O(1) for a LinkedList, obviously. – jonvuri Oct 02 '12 at 02:42
If you look at `AbstractList.java` in the OpenJDK (the iterator code is in there along with the list code itself), that's exactly how it does it. The iterator remove simply does some pre-checks on the iterator itself (co-modification detection and ensuring you _have_ a current index in your iterator), then calls the ArrayList remove method. – paxdiablo Oct 02 '12 at 03:20
However, I think we've probably discussed it enough, comments weren't meant for long discourses :-) – paxdiablo Oct 02 '12 at 03:23

Any O(n) algorithm for smoosh method?

3 Answers3