3

I was doing a hackerrank question that requires me to find out the number of shifts that need to occur to sort an array using Insertion Sort without actually sorting the array with insertion sort because otherwise that would be O(n^2) time-complexity! Here is my code that gets timed out. From what I know, calling the headSet method n times should come to around O(n logn).

static class MyComp implements Comparator<Integer>{
    @Override
    public int compare(Integer o1, Integer o2) {
        return o1 <= o2 ? -1: 1;
    }
}



// Complete the insertionSort function below.
static int insertionSort(int[] arr) {

    SortedSet<Integer> set = new TreeSet<>(new MyComp());
    int count=0;
    for(int i=arr.length-1; i>=0;i--){
        set.add(arr[i]);
        int pos = set.headSet(arr[i]).size();
       // System.out.println(pos);
        if(pos>0){
            count=count+pos;
        }

    }

    return count;
}

2 Answers2

5

The complexity of creating a headset is O(1)

Why?

Because a headset is not a new set. It is actually a view of an existing set. Creating one doesn't involve copying the original set, and doesn't even involve finding the "bound" element in the set.

Thus, doing it N times is O(N).

However, the reason that your code is not O(N) is that

  set.headSet(someElement).size();

is NOT O(1). The reason is that the size() method on a subset view of a TreeSet is computed by counting the elements in the view.

(AFAIK, this is not stated in the javadocs, but you can deduce it from looking at the source code for TreeSet and TreeMap.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • 3
    Are you sure about the `O(1)`? I would have thought this would involve what is essentially a `get()` operation on the underlying `TreeMap` to locate the subtree - so `O(lg n)`. – Boris the Spider Jun 30 '19 at 06:49
  • My reading of the code is that it doesn't locate a subtree. Mind you, the code is rather convoluted, so I might have missed something. (And the `O(1)` is not exactly "free" either ...) – Stephen C Jun 30 '19 at 07:27
  • Yeah, now that you mention, I realise it. – Moksh Verma Jul 01 '19 at 17:08
  • @MokshVerma - I was referring to the code of `TreeSet` / `TreeMap` as being "rather convoluted", not your code. – Stephen C Jul 01 '19 at 23:58
  • Yeah, this isn't even close to right for many reasons, @StephenC. I posted the correct answer below. If we can change min to the accepted version or at least modify this one to mention that it is straight up wrong, that would be great. – SecondThread Apr 06 '22 at 21:46
  • 1
    Alright man, you literally edited your solution to completely change your answer from "It's linear" to "It's not linear" after my so-called "unreliable" benchmarks proved your original answer was completely wrong, but whatever. – SecondThread Oct 03 '22 at 06:32
  • 1
    @SecondThread - Read my edits carefully. If you do, you will see that I did NOT change what I said. Instead, I **added** material to explain why the OP was getting bad performance in spite of `headSet(...)` being an `O(1)` operation. Which it is. – Stephen C Oct 03 '22 at 06:52
  • 1
    And regarding your benchmark, you are NOT measuring the cost of `headSet(l)`. You are measuring the cost of `headSet(l).size()`. And it is the `size()` call on the head set that is `O(N)` ... worst case. Because `size()` has to iterate the entire head set to count how many nodes it *currently* contains. (It is all in the OpenJDK source code. Have you read it yet? Suggest you do.) – Stephen C Oct 03 '22 at 06:52
-1

Stephen C isn't even close and I have no idea how it has positive upvotes or is the accepted answer. Treeset access is O(log(n)) obviously, not O(1). So first off, there's no way on earth this could possibly be O(n), it's at best O(n*log(n)).

But is it? No. It's even worse. Headset is NOT A VIEW of an existing set like Stephen says, it's a new set. This is obviously the case because you can modify the headset by adding an element to it. You can't modify a view, and if that referred to the original set, it'd be a massive pain.

You can test it with the following code:

        TreeSet<Integer> test=new TreeSet<>();
        long time=System.currentTimeMillis();
        Random r=new Random(5);
        for (int i=0; i<1e6; i++)
            test.add(i);
        long ans=0;
        for (int i=0; i<1e6; i++) {
            int l=r.nextInt((int)1e6);
            ans+=test.headSet(l).size();
        }
        System.out.println(ans+" "+(System.currentTimeMillis()-time));

If it were O(n), it would run in 1/100th of a second. If it were O(log(n)), it would run in about 2 seconds. You can see this takes about 10^4 seconds. Your code is O(n^2).

SecondThread
  • 108
  • 2
  • 11
  • You need to read the question and my answer more carefully. What I actually said that `test.headSet(l)` is `O(1)`, but `test.headSet(l).size()` is `O(N)`. Note that `headSet(l)` does NOT search for `l`. It merely creates an object that will search for `l` when you perform operations on it. (If you don't believe me **read the source code**.) – Stephen C Sep 15 '22 at 04:57
  • Also ... I would note that your method of benchmarking the code is unreliable. See [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103) – Stephen C Sep 15 '22 at 04:59
  • Regarding your 2nd paragraph, can you explain this from the Java 17 [javadoc](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/TreeSet.html#headSet(E)) for `headSet`? *"Returns a **view** of the portion of this set whose elements are strictly less than toElement. The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa. The returned set supports all optional set operations that this set supports."* – Stephen C Oct 03 '22 at 06:57