18

I was trying to graph the time complexity of ArrayList's remove(element) method. My understanding is that it should be O(N); however, it's giving me O(1). Can anyone point out what I did wrong here? Thank you in advance.

   public static void arrayListRemoveTiming(){
    long startTime, midPointTime, stopTime;
    // Spin the computer until one second has gone by, this allows this
    // thread to stabilize;
    startTime = System.nanoTime();
    while (System.nanoTime() - startTime < 1000000000) {
    }
    long timesToLoop = 100000;
    int N;
    ArrayList<Integer> list = new ArrayList<Integer>();

    // Fill up the list with 0 to timesToLoop - 1
    for (N = 0; N < timesToLoop; N++)
        list.add(N);

    startTime = System.nanoTime();
    for (int i = 0; i < list.size() ; i++) {
        list.remove(i);

        midPointTime = System.nanoTime();
        // Run an empty loop to capture other costs.
        for (int j = 0; j < timesToLoop; j++) {

        }
        stopTime = System.nanoTime();

        // Compute the time, subtract the cost of running the loop
        // from the cost of running the loop.

        double averageTime = ((midPointTime - startTime) - (stopTime - midPointTime))
                / timesToLoop;

        System.out.println(averageTime);
    }


}
user1526053
  • In theory, removing an object from an array is O(n); even though random access (indexing) finds the item in O(1), the O(n) comes from the rearranging part, where the remaining items are shifted to fill that slot. – Ed Morales Jun 28 '14 at 00:45
    "subtract the cost of running the loop from the cost of running the loop"? This wins the award for the most useless comment ever :-) – paxdiablo Jun 28 '14 at 00:51
  • What is the empty loop supposed to do? – user253751 Jun 28 '14 at 02:07
  • Always graph your time measurements [at log-log scale](https://en.wikipedia.org/wiki/Log%E2%80%93log_plot); this is an important technique for analyzing your code's performance. O(N) vs O(1) is a world of difference, and it indeed points at an invalid implementation - provided you measured over a non-insignificant range of sizes and saw the graph clearly follow this law (i.e. a straight line on the log-log plot). – Will Ness Mar 25 '18 at 06:42

3 Answers

25

The cost of a remove is O(n) as you have to shuffle the elements to the "right" of that point "left" by one:

                 Delete D
                     |
                     V
+-----+-----+-----+-----+-----+-----+-----+
|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
+-----+-----+-----+-----+-----+-----+-----+
                     <------------------
                      Move E, F, G left

If your test code is giving you O(1) then I suspect you're not measuring it properly :-)

The OpenJDK source, for example, has this:

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}

and the System.arraycopy is the O(n) cost for this function.
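
To make that linear cost concrete, here is a self-contained sketch (my illustration, not JDK code) of the same shift performed on a plain array:

import java.util.Arrays;

public class ManualShiftDemo {
    // The same shift that ArrayList.remove(index) performs via System.arraycopy:
    // every element after `index` moves one slot to the left, one write per
    // element, so the work grows with (size - index - 1).
    static int removeAt(Object[] elements, int size, int index) {
        for (int i = index; i < size - 1; i++)
            elements[i] = elements[i + 1];
        elements[--size] = null;   // clear the now-duplicated last slot
        return size;               // new logical size
    }

    public static void main(String[] args) {
        Object[] data = { "A", "B", "C", "D", "E", "F", "G" };
        int size = removeAt(data, data.length, 3);   // delete "D", as in the diagram
        System.out.println(Arrays.toString(data) + ", size=" + size);
        // prints [A, B, C, E, F, G, null], size=6
    }
}

Deleting "D" from that seven-element array takes three writes; deleting the first element would take six, which is where the O(n) worst case comes from.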


In addition, I'm not sure you've thought this through very well:

for (int i = 0; i < list.size() ; i++)
    list.remove(i);

This is going to remove the following elements from the original list:

    0, 2, 4, 6, 8

and so on, because the act of removing element 0 shifts all other elements left - the item that was originally at offset 1 will be at offset 0 when you've deleted the original offset 0, and you then move on to delete offset 1.
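
You can see that skipping effect with a tiny list (a sketch added for illustration, not part of the original post):

import java.util.ArrayList;
import java.util.List;

public class RemoveSkipDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int n = 0; n < 10; n++)
            list.add(n);                  // [0, 1, 2, ..., 9]

        for (int i = 0; i < list.size(); i++)
            list.remove(i);               // same loop shape as in the question

        // Every second element survives because the indices shift
        // underneath the loop.
        System.out.println(list);         // prints [1, 3, 5, 7, 9]
    }
}

Only every second element gets removed, which also means the loop performs far fewer remove calls than you might expect when timing it.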

paxdiablo
  • Ermm ... you can't measure complexity empirically. You can only estimate it. – Stephen C Jun 28 '14 at 01:12
  • @Stephen, I'm not measuring it, OP is. You're right that it'll only give you a good estimate but, if that estimate turns out to be O(1) for an algorithm that's clearly O(n), then the measurement/estimate is faulty. – paxdiablo Jun 28 '14 at 01:15
  • I thought System.arraycopy copies in blocks? Care to explain how/why it's O(n)? – Lekkie Apr 05 '18 at 15:03
  • @Lekkie: even if it copies in blocks, it has to do a certain number of block operations which is proportional to the array size. Hence O(n). In any case, copying a block of size 99999 won't take the same amount of time as copying a block of size 1 - at the lowest level, *individual* memory locations (or small chunks of them) must be copied. – paxdiablo Apr 06 '18 at 00:58
16

First off, you are not measuring complexity in this code. What you are doing is measuring (or attempting to measure) performance. When you graph the numbers (assuming that they are correctly measured) you get a performance curve for a particular use-case over a finite range of values for your scaling variable.

That is not the same as a computational complexity measure; i.e. big O, or the related Bachmann-Landau notations. These are about mathematical limits as the scaling variable tends to infinity.

And this is not just a nitpick. It is quite easy to construct examples1 where performance characteristics change markedly as N gets very large.

What you are doing when you graph performance over a range of values and fit a curve is estimating the complexity.

1 - And a real example is the average complexity of various HashMap functions which switch from O(1) to O(N) (with a very small C) when N reaches 2^31. The modality is because the hash array cannot grow beyond 2^31 slots.


The second point is that the complexity of ArrayList.remove(index) is sensitive to the value of index as well as the list length.

  • The "advertised" complexity of O(N) for the average and worst cases.

  • In the best case, the complexity is actually O(1). Really!

    This happens when you remove the last element of the list; i.e. index == list.size() - 1. That can be performed with zero copying; look at the code that @paxdiablo included in his Answer.
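
A rough way to see the difference between the two cases (my own sketch, not from the answer; absolute numbers will vary by JVM and machine, and this is not a rigorous benchmark):

import java.util.ArrayList;
import java.util.List;

public class RemoveFrontVsBack {
    static long timeRemovals(boolean fromFront) {
        List<Integer> list = new ArrayList<>();
        for (int n = 0; n < 200_000; n++)
            list.add(n);

        long start = System.nanoTime();
        for (int n = 0; n < 100_000; n++)
            list.remove(fromFront ? 0 : list.size() - 1);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Removing index 0 shifts everything left on every call (worst case);
        // removing the last index copies nothing (best case).
        System.out.println("front: " + timeRemovals(true) + " ns");
        System.out.println("back:  " + timeRemovals(false) + " ns");
    }
}

Removing from the front forces a full System.arraycopy on every call, while removing from the back copies nothing, so the first timing should be dramatically larger.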


Now to your Question. There are a number of reasons why your code could give incorrect measurements. For example:

  • You are not taking account of JIT compilation overheads and other JVM warmup effects.

  • I can see places where the JIT compiler could potentially optimize away entire loops.

  • The way you are measuring the time is strange. Try treating this as algebra.

            ((midPoint - start) - (stop - midPoint)) / count;
    

    Now simplify: the expression works out to (2 * midPoint - start - stop) / count, so the midPoint term does not drop out and the result is not the per-operation cost you were after.

  • You are only removing half of the elements from the list, so you are only measuring over the range 50,000 to 100,000 of your scaling variable. (And I expect you are then plotting against the loop counter; i.e. you are plotting f(N + 50,000) against N.)

  • The time intervals you are measuring could be too small for the clock resolution on your machine. (Read the javadocs for nanoTime() to see what resolution it guarantees.)

I recommend that people wanting to avoid mistakes like the above read up on how to write a correct Java micro-benchmark.
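
As a rough illustration of a less error-prone measurement (a sketch only, under the usual caveats about hand-rolled JVM micro-benchmarks; a harness such as JMH is the more reliable option):

import java.util.ArrayList;
import java.util.List;

public class RemoveTiming {
    // Average cost, in nanoseconds, of removing element 0 from a list that
    // starts at size n. Removing index 0 is the worst case: a full shift
    // on every call.
    static double averageRemoveNanos(int n, int removals) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++)
            list.add(i);

        long start = System.nanoTime();
        for (int i = 0; i < removals; i++)
            list.remove(0);
        return (System.nanoTime() - start) / (double) removals;
    }

    public static void main(String[] args) {
        // Warm-up pass so the JIT has compiled the hot code before we measure.
        averageRemoveNanos(50_000, 10_000);

        // Measure at several sizes; the per-remove cost should grow roughly
        // in proportion to n.
        for (int n = 20_000; n <= 160_000; n *= 2) {
            System.out.println(n + "\t" + averageRemoveNanos(n, 10_000) + " ns/remove");
        }
    }
}

Plotting the per-remove time against n (ideally on a log-log scale, as suggested in the comments above) should show roughly linear growth for this worst-case usage.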

Stephen C
-10

remove(int) removes the element at the ith INDEX which is O(1)

You probably want remove(Object), which is O(N); you would need to call remove(Integer.valueOf(i)).

It would be more obvious if your list didn't have the elements in order.
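
For what it's worth, the overload distinction itself is real even though the complexity claim is disputed in the comments below; here is a small illustration (my sketch, not part of the original answer):

import java.util.ArrayList;
import java.util.List;

public class RemoveOverloads {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(10); list.add(20); list.add(30); list.add(40);

        list.remove(2);                    // remove(int): removes INDEX 2 -> [10, 20, 40]
        list.remove(Integer.valueOf(20));  // remove(Object): removes the VALUE 20 -> [10, 40]

        // Both overloads still shift the trailing elements left after the
        // removal, so both are O(N) in general.
        System.out.println(list);          // prints [10, 40]
    }
}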

dkatzel
  • `remove(int)` is not `O(1)`. It is `O(n)`. If you remove anything but the last element, you have to shift all elements after the removed element. – Sotirios Delimanolis Jun 28 '14 at 00:52
  • _Finding_ the element to remove when using an index may be O(1) but removing it is most definitely not. You still need to shuffle the elements above it. – paxdiablo Jun 28 '14 at 00:53
  • The shifting of elements down is a bulk array copy which, while technically O(n), is in practice much faster since you don't have to inspect anything at those memory locations – dkatzel Jun 28 '14 at 00:55
  • Of course you have to inspect those memory locations, you have to read and write them once. However, you do so in a linear manner, which is indeed quite fast as linear time algorithms go. – Jun 28 '14 at 00:57
  • dk, please don't make the common mistake of equating time complexity with running time. It's O(n) regardless of how fast it is because the time increases in step with n. Bubble sort can outperform other sorts quite readily with smaller data sets but that doesn't change the respective complexities. – paxdiablo Jun 28 '14 at 00:59