Find the largest element smaller than X in a sorted array with non-decreasing consecutive differences, in O( log(log (max_value)))

Question

I want to find the largest element that is smaller than x in a sorted array, where the differences between the consecutive elements of the array are non-decreasing.

For example:

[1, 3, 10, 17, 50, 1000] is a valid input , because the consecutive differences (2,7,7,33,950) are in increasing order.

[1, 3, 4, 10, 18] is not a valid input, as the consecutive differences (2, 1, 6, 8) are not in increasing order.

The optimum solution is O(log(log max_value)), so I am looking for better solutions than upperbound/ binary search. Actually ,I am looking to optimize binary search from O(log N) to O(log(log max_value).

#include <bits/stdc++.h>

using namespace std;

const int n = 50000001;
int arr[n];

int binarysearch_lowerbound(int begin, int end, int x)
{
    int ans = end;
    while (begin < end) {
        int mid = (begin + end) / 2;
        if (arr[mid] >= x) {
            ans = mid;
            end = mid;
        }
        else {
            begin = mid + 1;
        }
    }
    return ans;
}
int main()
{
    int N;
    cin >> N;
    for (int i = 0; i < N; i++) {
        cin >> arr[i];
    }
    int q;
    cin >> q;
    for (int i = 0; i < q; i++) {
        int x;
        cin >> x;
        int k;
        int begin = x / (arr[N - 1] - arr[N - 2]);
        int end = x / (arr[1] - arr[0]);
        if (end > N - 1) {
            end = N;
        }
        k = binarysearch_lowerbound(begin, end, x);
        cout << arr[k - 1] << endl;
    }
    return 0;
}

I'm a little confused by the problem statement. The largest element is simply the last element of a sorted range. Do you mean to check whether the adjacent differences are increasing? Please show some input/output examples. — cigien, Dec 29 '20 at 20:13
the largest element of the array, SMALLER than a given value X — Maria Papadopoulou, Dec 29 '20 at 20:15
@MariaPapadopoulou Right, I was too trigger-happy. What algorithm does your current implementation use? It looks like a divide (in the middle) and conquer type of search. — Ted Lyngmo, Dec 29 '20 at 20:16
I misunderstood your question. I've edited the question slightly to make it clearer in my opinion. Hope that's ok. — cigien, Dec 29 '20 at 20:19
@cigien (i am a newnie here, so any help in any direction is more than welcome, thank you...) — Maria Papadopoulou, Dec 29 '20 at 20:26
@SegiyKolesnikov My code just uses lowerbound (or upperbound) to find the element, in O(log(N)). The fact that the differences are increasing must, in some may (?) , help me to find a solution in O(log(log(max_value))).....i dont , even , know what that max_value should be.. — Maria Papadopoulou, Dec 29 '20 at 20:29
@ Ted Lyngmo Yes, it's just a lowerbound , binary search. in my code i divide X by the first, smallest, and the last, largest, difference, in my attempt to somehow restrict the bounds of the binary search. — Maria Papadopoulou, Dec 29 '20 at 20:34
The extreme case `[0,1,2,3,4,5,6,7,8,9,10]` would be a valid input since the example shows that the requirement isn't that the diff is increasing. It just isn't allowed to be decreasing, right? — Ted Lyngmo, Dec 29 '20 at 20:35
@user3386109 I did my best to give the statement of the problem right...the optimum solution is declared to be O(log(log(max_value))).... — Maria Papadopoulou, Dec 29 '20 at 20:39
A "hint" was given: suppose the consecutive differences are from: 100 to 200. the answer in this case is in O(1). — Maria Papadopoulou, Dec 29 '20 at 20:40
example: given sorted array :[1, 3, 7] X=2, X=3 X=6, X=999 The output should be: predecessor(2) = predecessor(3) = 1, predecessor(6) = 3, predecessor(999) = 7. — Maria Papadopoulou, Dec 29 '20 at 20:44
A thought is: when I find my first "mid" , do "something" with the local difference there, so i don't check the entire half array.... — Maria Papadopoulou, Dec 29 '20 at 20:48
Welcome to Stack Overflow! [You should never use #include ](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h). — rsjaffe, Dec 29 '20 at 22:41

Jerry Coffin · Answer 1 · 2020-12-29T21:00:00.367

To get O(log log N), you're probably going to need to use an interpolating search instead of a binary search.

In an interpolating search, you take the value you're looking for, and the minimum and maximum values in the array, and based on them you guess at a likely location for the element you're looking for, rather than blindly guessing the middle of the unknown portion of the array.

The simple case is when your elements are distributed approximately uniformly in the array, in which case you can use linear interpolation to guess at the point to search.

In your case, the data follows a distribution that's definitely non-linear, so you want to predict the location based on that, rather than linear interpolation. As a first approximation, let's assume it's roughly quadratic.

For a first test, let's start with a set of numbers that precisely follow a quadratic distribution: [1, 4, 9, 16, 25, 36, 49, 64]. We're going to search for the largest value less than 30. Our set has 8 values with a range of 1..64 = 63.

If we used linear interpolation, to search for 30, our initial guess would be 30/63*8 = 3 or 4 (which would actually be pretty accurate in this case).

To do a quadratic interpolation, we'd start by taking the square roots of the top and bottom of the range (8 and 1 respective), then taking the square root of 30 (~5.5) and doing linear in interpolation with those (5.5/(8-1) * 8) to get our initial guess.

From there, we continue much like we would in an binary search--if our number is larger than the value we found, we do a new interpolation among the larger numbers, and if it's smaller, we do a new interpolation among the smaller numbers.

The difficult part in all this is finding a function that actually fits your numbers reasonably well. In your case, that may be particularly problematic because you've been given only a tiny bit of information about the distribution. The requirement that the differences can't decrease means it's at least linear, but beyond that, we really don't know--it could be linear, or N^1.1, or N¹⁰, or possibly even something like N!. Linear interpolation won't fit a factorial distribution very well (nor vice versa).

I was thinking, wouldn't inspecting the first and last element give an approximate idea of the function to use for finding the next element to compare with? For each new element inspected, a new function would be formed. — Ted Lyngmo, Dec 29 '20 at 20:52
@TedLyngmo: First and last by themselves won't. About the best guess you can make from just the two points would be a linear distribution. First, middle and last will tell you quite a bit more. — Jerry Coffin, Dec 29 '20 at 20:55
Your explanation is very helpful, I didn't know about interpolating search. A hint, or subproblem, was given , where the consecutive differences are from 100 to 200. — Maria Papadopoulou, Dec 29 '20 at 20:57
Newton-Raphson surely gives `O(log log N)`, without assuming the quadratic nature of distribution. — user58697, Dec 29 '20 at 21:05
@user58697: Last I heard, Newton-Raphson required the first derivative of the function being interpolated... — Jerry Coffin, Dec 29 '20 at 22:32

Find the largest element smaller than X in a sorted array with non-decreasing consecutive differences, in O( log(log (max_value)))

1 Answers1