3

Working on an algorithm puzzle. Post problem statement and code. My question is, for the last line, whether return citations[right] is always has the same result as return len - (right+1)? I tried few test cases, and it seems both are of the same value. Want to seek advice is any anti-samples when they are different? Thanks.

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.

What if the citations array is sorted in ascending order? Could you optimize your algorithm?

class Solution {
public:
    int hIndex(vector<int>& citations) {
        int left=0, len = citations.size(), right= len-1,  mid;
        while(left<=right)
        {
            mid=(left+right)>>1;
            if(citations[mid]== (len-mid)) return citations[mid];
            else if(citations[mid] > (len-mid)) right = mid - 1;
            else left = mid + 1;
        }
        return len - (right+1);
    }
};

thanks in advance, Lin

Lin Ma
  • 9,739
  • 32
  • 105
  • 175
  • 4
    Blatant cut-paste of homework problem. – FractalSpace Nov 16 '15 at 03:59
  • 1
    @FractalSpace, not a homework. Trust me. :) – Lin Ma Nov 16 '15 at 04:00
  • 1
    ... or probably a (very popular) interview question. A solution is mentioned here: https://github.com/haoel/leetcode/blob/master/algorithms/cpp/h-Index/h-Index.cpp – FractalSpace Nov 16 '15 at 04:05
  • 1
    @FractalSpace, I read similar implementation before and I have my own thoughts to implement in another way (for the return statement) and I did some testing and the purpose is to ask expert here to see if my implementation is also correct. Your advice is appreciated. :) – Lin Ma Nov 16 '15 at 04:18

1 Answers1

3

First of all, your implementation works on sorted input only anyway, right?

Now imagine the input:

vector<int> v{1,2,5,6,9};

This input will return different values for:

return len - (right+1);    // returns 3 (correct answer)
return citations[right];   // returns 2 (wrong answer)

You can however do:

return len-left; 

This works beacuse right+1 will always equal left on this line (given your code).

Think about the exit condition of the while loop as well as the fact, that the difference between left and right can only change by 1 at max, on each iteartion.


In overall the best solution is to sort the input first and then do your binary search, giving you O(N log N) time complexity, all together.

It won`t get any better than that.

Sidenote: I would avoid code like >>1 instead of simply dividing by 2 since this impairs readability, without any benefits. I assume you are using a reasonable compiler (with optimization capablities).

oo_miguel
  • 2,374
  • 18
  • 30
  • Thanks oo_miguel, yes, it works on sorted only. I think in the input you posted, the return value should be 2 other than 3, since no paper has 3 citations and we should choose a real existing paper? There is no paper has citation 3. Any comments are appreciated. :) – Lin Ma Nov 20 '15 at 07:37
  • 1
    by the defintion of the *h-index* it should return `3`, since there are `3` papers that satisfy: `3` **or more** citations. (while `N-3` papers do not satisfy this condition) – oo_miguel Nov 20 '15 at 07:39
  • 1
    Thanks oo_miguel, do you think returning len-left always have the same value of returning len - (right+1)? Thanks. – Lin Ma Nov 20 '15 at 07:45
  • Thanks oo_miguel, did some debugging and want to confirm my understanding is correct, when the loop breaks (because of left > right, not because of we find a mid which exactly the value of len - right), the element pointed by left is the only possible elements which satisfies the condition of h-index, so we should return len - left, in other words, no other elements larger than the element pointed by left, which satisfies the situation? – Lin Ma Nov 20 '15 at 08:37
  • 1
    @LinMa Yes, when the loop breaks `left` will be the first paper with **h or more** citations, while `right` will be the last paper with **less** citations. – oo_miguel Nov 20 '15 at 08:49
  • 1
    Thanks oo_miguel, mark your reply as answered. For the value of bounty, need to wait until tomorrow to grant to you because of restrictions. Thanks again for the help and smart analysis. Please remind me if I forget to grant bounty reward tomorrow. :) – Lin Ma Nov 20 '15 at 08:56
  • 1
    @LinMa no problem, I'm glad I could help :) – oo_miguel Nov 20 '15 at 08:57
  • 1
    I agree about writing `/2` instead of `>>2`. It's more to the point, and more correct since `>>2` assumes two's complement, which is not strictly guaranteed. There is a difference in rounding though for `signed` values. A bit shift (when correct) rounds down, where as `/2` typically rounds towards 0 (down for positive, up for negative). Hence, to convert division to a bit shift for a `signed` type, the compiler might also insert code to correct the rounding behaviour. You can avoid this by declaring the variables `unsigned`. It makes sense here, they are array indexes. – Daniel Stevens Nov 20 '15 at 16:18
  • @DanielStevens, how "two's complement" is related here? BTW, overall very good catch for signed/unsigned. – Lin Ma Nov 20 '15 at 20:08
  • @DanielStevens, tried it seems / round down? I tried print -3 / 2, which output -2 other than -1, but print -3 >> 2 output -1 other than -2. I am using Python 2.7.x. – Lin Ma Nov 20 '15 at 20:11
  • @oo_miguel, bounty granted, and thanks for all the help again. Have a good weekend. :) – Lin Ma Nov 21 '15 at 00:13
  • 1
    Technically the rounding behaviour in C++ of negative numbers is undefined, but the typical, and suggested behaviour, is to round towards 0. Different languages will have their own standard behaviour. In Python 2, the behaviour is to use floor division (round down). This changed in Python 3, where `-3/2` will not produce an int, but `-3//2` will do the previous floor division. – Daniel Stevens Nov 21 '15 at 10:21
  • 1
    @LinMa Perhaps I was being too pedantic by stating "two's complement". There are many binary representations of numbers. For instance, IEEE 754 Floating Point can not be divided by 2 by simply shifting the bits. A two's complement signed number can be, where it produces floor division (round down). A one's complement signed number also works, and produces round to 0 behaviour. As a more esoteric example, [Gray Code](https://en.wikipedia.org/wiki/Gray_code) also seems to work, but I've only ever seen examples of it in the unsigned case. – Daniel Stevens Nov 21 '15 at 10:31
  • 1
    Also just noticed my first comment used `>>2`, which should have been `>>1`. See the potential for bugs! :p – Daniel Stevens Nov 21 '15 at 10:32
  • @DanielStevens, thanks for the comments, and wondering what exactly means two's complement and one's complement? A bit more explain to newbie is appreciated. :) – Lin Ma Nov 24 '15 at 09:27
  • 1
    @LinMa I suppose see [What is 2's complement?](http://stackoverflow.com/questions/1049722/what-is-2s-complement) Also of interest might be [Arithmetic Shift](https://en.wikipedia.org/wiki/Arithmetic_shift), particularly the section titled "Non-equivalence of arithmetic right shift and division" – Daniel Stevens Nov 24 '15 at 14:11
  • @DanielStevens, very insightful comments and take time to read and enjoy it. One more confusion is I am confused why round down for >> negative number is not expected? I tested both -3 >> 1 and -3 // 2 have the same result, which is -2 (round down). And wondering why round down is not correct? Thanks. – Lin Ma Nov 25 '15 at 22:56
  • 1
    @Lin Ma, sounds like you may have whole new questions buried in these comments. Rounding behaviour is more of a convention than something that is right or wrong. If the convention for division in some environment is round to 0, while shifting right is to round down, then they're not equivalent operations. Python and most C++ compilers seem to have settled on a different convention. – Daniel Stevens Nov 28 '15 at 09:03
  • @DanielStevens, you mean your test is in C++, which shows >>2 behaves differently with positive and negative numbers? Thanks. – Lin Ma Nov 30 '15 at 23:03