17

Given a positive integer x and a sorted array A of positive integers

Is there any algorithm faster than O(N) to determine if any element in A is a multiple of x? There are no negative elements in A.

Naively looping over A once is my only idea so far; I do not know whether there is any way to exploit the fact that A is sorted to speed things up.

Rory Daulton
shole
  • Hum. Interesting. I guess if `x` is large then you can sort of "binary search" from the current point to get the next candidate multiple. – Bathsheba Jul 21 '17 at 08:17
  • No, there isn't. Looking at one value in the array `A` will *in general* not give you any clue on whether there is such a multiple at the left side or the right side of it. It could be anywhere. Even when you have looked at all but one value in the array, you still will not know whether the last remaining value is or is not a multiple of `x`. So it will be *O(n)*. – trincot Jul 21 '17 at 08:19
  • do you wish to know all the elements of `A` that are multiples of `x` or do you just want a boolean as output? – Chris Jul 21 '17 at 08:21
  • I think this very much depends on the size of `x` and the length of `A`. If `x` and `A` are big, it might pay off to binary-search for all multiples of `x` between the first and last element of `A`. – tobias_k Jul 21 '17 at 08:24
  • Which possible values of `x` are you interested in? – Bathsheba Jul 21 '17 at 08:25
  • @tobias_k, for that to be efficient you would also need the two extreme values in A to not be far apart. If A has 10000 values ranging from 1 to 100000000x+1, it will still be better to scan all values in A. – trincot Jul 21 '17 at 08:27
  • I guess you need linear time. Consider this special case: given a sorted positive int array, does it contain an even number (multiple of 2)? I don't think you can find the answer without looking at every element. – Petar Petrovic Jul 21 '17 at 08:27
  • Binary-searching `A` will take log(N) (N being the number of elements in `A`), so you can ask yourself: Is the number `k` of possible multiples of `x` within `A` (judging from the first and last element) small enough such that `k*log(N) < N` – tobias_k Jul 21 '17 at 08:28
  • @Chris Only a yes/no answer. @Bathsheba I agree with your thought: if `x` is large enough it is actually worth the binary search, as it (possibly) "jumps" across many integers; yet in my scenario the length of A is large (which is why I want to avoid O(N)) but x is only ~1000 – shole Jul 21 '17 at 08:29
  • @PetarPetrovic Indeed, this special case was my original, simplified question instead of general `x`, but I was afraid it has some kind of special method to deal with it, so I just asked the full question as a whole... – shole Jul 21 '17 at 08:31
  • @trincot I don't think the difference between the first and last element is important per se, but the number of possible multiples of `x` within those bounds certainly is. See my other comment. – tobias_k Jul 21 '17 at 08:31
  • you can get constant-factor speedups with SIMD/multithreading if the array is large. for specific input patterns you can get down to constant time. but I don't think there's something faster than O(n) in the general case. also, can it contain duplicates? – the8472 Jul 21 '17 at 08:31
  • Do you have just one query with the same array? What is the distribution of numbers in A, what is the nature of them? – DAle Jul 21 '17 at 08:42
  • You can think of this as an intersection of a sorted array and a finite arithmetic sequence of known length. Using binary search you can obtain two subproblems. Maybe amortized O(log(N)) for the average case? Just a hunch. – hilberts_drinking_problem Jul 21 '17 at 08:54
  • @DAle @the8472 Yes, it can have duplicates; indeed there are not many other constraints on the construction of A, which is why I have to consider the general (worst-case) algorithm – shole Jul 21 '17 at 08:58
  • Is the list *sorted* or even *consecutive*? If just "sorted": try and find a multiple of 5 in 1,2,3,4,6,7,8,9,11,12,13,14,16, ... You need to check every number. – Hans Kesting Jul 21 '17 at 09:59
  • Can someone tell me what would be wrong with the following approach? https://pastebin.com/8hC7jZ1P – Koekje Jul 21 '17 at 13:54

6 Answers

14

This seems to depend very much on the size of x and the number of elements within A, and particularly the number of candidate multiples of x within A.

Binary-searching a specific number within A takes O(log(n)) time (n being the number of elements within A), so if there are k candidate multiples of x between the first and the last element of A, it will take O(k * log(n)) to check them all. If k * log(n) is smaller than n, you can use this algorithm; otherwise just do a linear search.

(Also, there are probably a few small optimizations to the above algorithm. E.g., once you have checked x*i (and did not find it), you can use the position where x*i would have been as the lower bound when searching for x*(i+1), instead of starting from the very first element of the array.)
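A minimal Java sketch of this approach, assuming a non-empty sorted array and x > 0 (the class and method names are my own; `Arrays.binarySearch`'s negative return value encodes the insertion point, which gives us the lower-bound optimization for free):

```java
import java.util.Arrays;

public class CandidateMultipleSearch {

    /** True if the sorted array a contains a multiple of x (a non-empty, x > 0). */
    static boolean containsMultiple(int[] a, int x) {
        int lo = 0; // lower bound for the next search, per the optimization above
        // iterate over all candidate multiples of x between a[0] and a[a.length - 1]
        for (long m = ((long) a[0] + x - 1) / x * x; m <= a[a.length - 1]; m += x) {
            int pos = Arrays.binarySearch(a, lo, a.length, (int) m);
            if (pos >= 0) {
                return true; // m is present in the array
            }
            lo = -pos - 1;   // insertion point of m: everything below it is < m
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 11, 20, 35};
        System.out.println(containsMultiple(a, 5));  // true: 20 and 35
        System.out.println(containsMultiple(a, 13)); // false
    }
}
```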

tobias_k
  • You might also make this a recursive algorithm, searching for all the multiples of x at once, in each turn splitting them up on the lower and upper half and each time again checking whether k*logn < n still holds or not. – tobias_k Jul 21 '17 at 08:52
14

HT @ tobias_k's comment.

You can solve it in ~O(n/x) (update: this might actually be O(N*log(N)/X^2)). You do a binary search for all multiples of x simultaneously: you subdivide the search space at each iteration, and when a branch's range cannot contain a multiple of x, you abort that branch. So rather than binary-searching each value, you binary-search all values at once, but only descend into branches that could still contain a valid multiple within their range. The best thing about this is that it never searches the same space twice, which makes the worst case x = 1, i.e. O(n/1) = O(n). In the best case it will know that the range cannot contain a multiple and abort in O(1).

You are guaranteed a worst case of O(n), in which you basically miss every cache lookup (keep in mind that in real-world terms this can matter more than the time complexity, so test such things). You get a theoretical time complexity that can be better than O(n) but never worse than that, save that jumping around an array misses caches, because that is how computers actually end up working in the real world.

As predicted the speed increase depends a lot on the value of k (x).

This starts going faster than the raw loop at around k = 128 (the divide factor).

Truncating the branches lets it surpass the raw loop in real-world terms. I'm assuming the value of n isn't going to matter much, as both seemed to scale about the same, but perhaps checking that directly would be better.

Note: by the nature of this code it will skip duplicates, which is the difference in the counts.

import java.util.Arrays;
import java.util.Random;

public class MultipleSearch {

    public static void main(String[] args) {
        Random random = new Random();
        int[] array = new int[500000000];
        for (int i = 0, m = array.length; i < m; i++) {
            array[i] = Math.abs(random.nextInt());
        }
        Arrays.sort(array);
        for (int k = 1; k < 16777216; k *= 2) {
            long time;
            time = System.currentTimeMillis();
            binaryFactorLocator(array, k);
            System.out.println("Factors found multi: " + (System.currentTimeMillis() - time) + " @" + k);
            time = System.currentTimeMillis();
            loopFactorLocator(array, k);
            System.out.println("Factors found loop: " + (System.currentTimeMillis() - time) + " @" + k);
        }
    }

    public static void loopFactorLocator(int[] array, int k) {
        int count = 0;
        for (int i = 0, m = array.length; i < m; i++) {
            if (array[i] % k == 0) {
                count++;
                //System.out.println("loop: " + array[i] + " value at index " + i + " is a proven multiple of " + k);
            }
        }
        System.out.println(count + " items found.");
    }

    public static void binaryFactorLocator(int[] array, int k) {
        int count = binaryFactorLocator(0, array, k, 0, array.length);
        System.out.println(count + " items found.");
    }

    public static int binaryFactorLocator(int count, int[] array, int k, int start, int end) {
        if (start >= end) { //contains zero elements. (End is exclusive)
            return count;
        }
        int startValue = array[start]; //first value
        int endValue = array[end - 1]; //last value;
        if (startValue / k == endValue / k) { //if both values are contained within the same factor loop.
            if (startValue % k == 0) { //check lower value for being perfect factor.
                //System.out.println("multi-binary: " + startValue + " value at index " + start + " is a proven multiple of " + k);
                return count + 1;
            }
            return count; //There can be no other factors within this branch.
        }
        int midpoint = (start + end) / 2; //subdivide
        count = binaryFactorLocator(count, array, k, start, midpoint); //recurse.
        count = binaryFactorLocator(count, array, k, midpoint, end); //recurse.
        return count;
    }
}

This implementation should be pretty solid. Since it truncates the loop within the start/k == end/k elements, it will skip duplicates (sometimes; it might cleave between two duplicated values). Obviously a recursive implementation like this is unlikely to be optimal and should perhaps be rewritten with less call-stack usage.

474682772 items found.
Factors found multi: 21368 @1
500000000 items found.
Factors found loop: 5653 @1
236879556 items found.
Factors found multi: 21573 @2
250000111 items found.
Factors found loop: 7782 @2
118113043 items found.
Factors found multi: 19785 @4
125000120 items found.
Factors found loop: 5445 @4
58890737 items found.
Factors found multi: 16539 @8
62500081 items found.
Factors found loop: 5277 @8
29399912 items found.
Factors found multi: 12812 @16
31250060 items found.
Factors found loop: 5117 @16
14695209 items found.
Factors found multi: 8799 @32
15625029 items found.
Factors found loop: 4935 @32
7347206 items found.
Factors found multi: 5886 @64
7812362 items found.
Factors found loop: 4815 @64
3673884 items found.
Factors found multi: 3441 @128
3906093 items found.
Factors found loop: 4479 @128
1836857 items found.
Factors found multi: 2100 @256
1953038 items found.
Factors found loop: 4592 @256
918444 items found.
Factors found multi: 1335 @512
976522 items found.
Factors found loop: 4361 @512
459141 items found.
Factors found multi: 959 @1024
488190 items found.
Factors found loop: 4447 @1024
229495 items found.
Factors found multi: 531 @2048
243961 items found.
Factors found loop: 4114 @2048
114715 items found.
Factors found multi: 295 @4096
121964 items found.
Factors found loop: 3894 @4096
57341 items found.
Factors found multi: 195 @8192
61023 items found.
Factors found loop: 4061 @8192
28554 items found.
Factors found multi: 106 @16384
30380 items found.
Factors found loop: 3757 @16384
14282 items found.
Factors found multi: 65 @32768
15207 items found.
Factors found loop: 3597 @32768
7131 items found.
Factors found multi: 35 @65536
7575 items found.
Factors found loop: 3288 @65536
3678 items found.
Factors found multi: 17 @131072
3883 items found.
Factors found loop: 3281 @131072
1796 items found.
Factors found multi: 13 @262144
1900 items found.
Factors found loop: 3243 @262144
873 items found.
Factors found multi: 6 @524288
921 items found.
Factors found loop: 2970 @524288
430 items found.
Factors found multi: 3 @1048576
456 items found.
Factors found loop: 2871 @1048576
227 items found.
Factors found multi: 2 @2097152
238 items found.
Factors found loop: 2748 @2097152
114 items found.
Factors found multi: 1 @4194304
120 items found.
Factors found loop: 2598 @4194304
48 items found.
Factors found multi: 0 @8388608
51 items found.
Factors found loop: 2368 @8388608
phuclv
Tatarize
  • The complexity of this algorithm is not `O(n/x)`. It does not depend on the value of `x` in general. It's `O(n)` in the worst-case. But that's not that important because it can be much faster on real data than for loop. – DAle Jul 21 '17 at 11:07
  • Can you explain more or demo with a simple example? I do not quite understand how it works; what I am getting is that you try to reduce the potential range for consecutive binary searches of multiples of x? – shole Jul 21 '17 at 12:12
  • I suspect that the performance of this algorithm compared to a naive crawl will be highly dependent on the structure of the dataset, but with the right dataset it should be highly effective. – Jack Aidley Jul 21 '17 at 12:19
  • This sounds interesting but I must admit I have problems visualizing how you would actually proceed. For example, how to you "subdivide" the search space: 2 parts? x parts? ... Could you give a sketch on the algorithm in pseudo-code? – Matthieu M. Jul 21 '17 at 12:32
  • @shole, this answer has the same idea as Zen's answer, if I understood it correctly. – DAle Jul 21 '17 at 12:36
  • @shole Added Java source code. You can see that at certain values of k, starting at 128, this algorithm starts going faster. There are likely ways to kill some of the overhead better, too. – Tatarize Jul 21 '17 at 21:06
  • @DAle, I'm fine supposing it is not O(n/x); clearly there's a log(n) from the division overhead that isn't included in my complexity but must be executed. But it absolutely does truncate the number of operations depending on the size of x. In fact, since the larger x is the more it can truncate, I might be under-estimating the time complexity: in actual operations it quickly drops ranges that fall outside the given range and just drills down on the sublists where there could be an answer. It might be O(N*log(N)/X^2) – Tatarize Jul 21 '17 at 23:35
  • @DAle yes, we arrived at the same idea from different angles – Zen Jul 22 '17 at 12:06
  • @DAle I agree that worst-case complexity is still O(n). I’m pretty sure that average complexity is better than that. To prove that, I’d have to formally model a distribution of As and Xs, and work out the maths, which is starting to look like a minor computer science project. – Zen Jul 22 '17 at 12:10
  • Actually, in mine there, the worst case is actually O(n*log(n)); I can't escape the fact that dividing in half over and over again is going to take log(n). That's clearly what happens when X = 1: O(N*log(N)/X^2) reduces to O(N*log(N)). – Tatarize Jul 23 '17 at 03:50
  • @Tatarize, you have approximately `1 + 2 + 4 + .. + n` calls of your `binaryFactorLocator` in the worst-case that is less than `2*n`. There is no `log(n)` multiplier. And there is no `/X^2` part because for every `X` we could build an array of `ai = x*i+1`, that will lead to `O(n)` operations. – DAle Jul 23 '17 at 20:20
  • As a side comment, try using `System.nanoTime()` for measuring elapsed time. It's a timer, and `currentTimeMillis()` is rather a clock. Why this can matter: https://stackoverflow.com/a/351571 – Eel Lee Jul 24 '17 at 07:25
  • The coded up version was generally for testing so people could get the gist and also check the scaling and to see if it actually got to be faster, and it clearly does by a lot. Enough that one could certainly use this rather than naive loop. I could clearly have removed most of the code here and it's not really intended to be productionish (other than working more or less correctly). – Tatarize Jul 24 '17 at 08:24
6

In a previous attempt, I tried a simple dichotomic search but, as was pointed out, that doesn't actually lead anywhere.

Here's my best attempt. I doubt it's worth the hassle, and it might even be slower for real world cases, but here goes.

If you have an array A[0..n] of sorted, positive integers, and you want to check whether there is a multiple of a positive integer X in A[i..j], where 0 ≤ i ≤ j ≤ n, you can proceed as follows:

If i > j, then A[i..j] is empty and thus contains no multiple of X
Else if A[i] % X = 0 or A[j] % X = 0, then A[i..j] contains a multiple of X
Else if A[i] / X = A[j] / X, then A[i..j] contains no multiple of X
Else A[i..j] contains a multiple of X iff A[i+1..(i+j)/2] or A[(i+j)/2+1..j-1] contains a multiple of X
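The four cases above translate almost directly into Java; a sketch (bounds i and j inclusive, names my own):

```java
public class RangeMultipleCheck {

    /** True if the sorted slice A[i..j] (both inclusive) contains a multiple of X. */
    static boolean hasMultiple(int[] A, int X, int i, int j) {
        if (i > j) {
            return false;                    // case 1: empty range
        }
        if (A[i] % X == 0 || A[j] % X == 0) {
            return true;                     // case 2: an endpoint is a multiple
        }
        if (A[i] / X == A[j] / X) {
            return false;                    // case 3: both endpoints in the same
        }                                    //         quotient band, nothing between
        int mid = (i + j) / 2;               // case 4: recurse on both halves
        return hasMultiple(A, X, i + 1, mid) || hasMultiple(A, X, mid + 1, j - 1);
    }

    public static void main(String[] args) {
        int[] A = {3, 7, 11, 20, 35};
        System.out.println(hasMultiple(A, 5, 0, A.length - 1));  // true: 20
        System.out.println(hasMultiple(A, 13, 0, A.length - 1)); // false
    }
}
```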

My guess is that the complexity would be O(n/X) ish, so, not really an improvement in the grand scheme of things.

Edited to add: If your data is really peculiar, this might actually help. In many cases, it might actually hurt:

  • there’s the overhead of managing the return stack (the first recursive call is non-terminal)
  • because we skip back and forth through the data instead of traversing it, we wreck cache locality
  • we make branch prediction more difficult for the processor
Zen
  • This looks much better, I am really looking into it. Case 3 is the same as A[i] = A[j], right? – shole Jul 21 '17 at 09:43
  • @Shale: actually, it's not the same, because it's integer division. For instance, 11/5 = 14/5 = 2. This is the case that lets us skip chunks of values. – Zen Jul 21 '17 at 09:54
  • Let me try this method when I arrive home, at least currently it sounds brilliant to me – shole Jul 21 '17 at 10:00
  • Looks good, and can be much faster in some cases than simple iteration. But the complexity of this algorithm is still `O(n)`. – DAle Jul 21 '17 at 10:58
  • Agree, but the idea itself is quite smart – shole Jul 21 '17 at 11:32
  • Case 3, is wrong. If Ai/X = Aj/X then A[i...j] contains no multiple of X so long as Ai/X is not a multiple of X. I coded it up in my answer. Turns out from running it, that we get better results for X > 128. When X is much larger it basically takes tiny fractions of the time. – Tatarize Jul 21 '17 at 23:41
  • @Tatarize thanks for pointing out the mistake, actually I had a typo in the line above (should have tested A[i] and A[j] but accidentally tested A[j] twice) – Zen Jul 22 '17 at 12:04
1

I believe the answer to the general question is no. Suppose the following structure for A: the i-th element of A, a_i, is given by i*x + 1, so there are no elements in A that are multiples of x. Then you never save any time by using any of the "trickery" described above; you basically have to make a choice based on your knowledge of A and x. You can choose between O(N) and O(K), where K is the number of possible multiples of x in A; you can achieve O(K) by using hashing, so that checking whether i*x is in A becomes a constant-time operation (on average...).
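A sketch of the hashing idea in Java (my own naming; note that building the set is itself O(N), so this only pays off if the set is reused across many queries, an assumption the question does not state):

```java
import java.util.HashSet;
import java.util.Set;

public class HashedMultipleCheck {

    /** True if any element of a (positive ints, non-empty) is a multiple of x.
     *  After the O(N) set construction, the query costs O(K) expected lookups,
     *  where K is the number of multiples of x between min(a) and max(a). */
    static boolean containsMultiple(int[] a, int x) {
        Set<Integer> values = new HashSet<>();
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int v : a) {
            values.add(v);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // probe each candidate multiple i*x within [min, max]
        for (long m = ((long) min + x - 1) / x * x; m <= max; m += x) {
            if (values.contains((int) m)) {
                return true;
            }
        }
        return false;
    }
}
```

In practice you would build the set once and run many queries with different x against it; for a single query the plain linear scan wins.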

Chris
0

Two things to try. First, find the first element in the array that is greater than or equal to x; since all elements are positive, no element below x can be a multiple of it, so there is no need to search that part. A binary search can do this. That will reduce the time in some cases, but obviously not if the very first element is already greater than x. Then, having identified the section with possible multiples of x, search it with a single thread, or, if you can run multiple threads, split that section into segments and have a separate thread search each. I think this may be the most efficient way, but subject to the following caveats.

  1. If the first element greater than x is fairly low in the array, you are going to spend more time setting up the binary search than in linear scanning.

  2. There is a cost to doing binary searches too. If your array is not very big, you are going to be less efficient with them than with linear searches. I am not talking about the order of the algorithm, just the actual execution time.

  3. If you can run multiple threads, there is a fairly hefty cost involved in setting those up too. Unless your array is obscenely long, you may not gain any benefit by doing that either. However, if it is many millions of items long, you may benefit by splitting it up into smaller chunks. I'd say it'd be rare for this scenario to be of use.
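For what it's worth, the multi-threaded scan is nearly a one-liner with Java's parallel streams (a constant-factor speedup only; `anyMatch` is a short-circuiting operation, so the workers stop soon after a hit is found):

```java
import java.util.Arrays;

public class ParallelMultipleCheck {

    /** Parallel linear scan: still O(N) total work, spread across worker threads. */
    static boolean containsMultiple(int[] a, int x) {
        return Arrays.stream(a).parallel().anyMatch(v -> v % x == 0);
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 11, 20, 35};
        System.out.println(containsMultiple(a, 5)); // true: 20 and 35
        System.out.println(containsMultiple(a, 9)); // false
    }
}
```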

0

Subtract x from A iteratively. Stop if any result is equal to 0.

Regarding the subtractions, in the worst case you do 1 + (max(A) DIV x) of them on a single element: you have to keep subtracting x from the maximum element of A after all the other elements have already failed the check, plus once more (hence the 1) until the maximum element itself fails the check too. For example, 7 DIV 3 = 2, so three iterations:

  1. 7 - 3 = 4
  2. 4 - 3 = 1
  3. 1 - 3 = -2, which is != 0, therefore 7 is not divisible by 3

This still qualifies as O(n), but it ends up being fast, since it only does integer subtraction on an array.
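A direct Java rendering of the idea, assuming positive elements and x > 0 (per element this is equivalent to `v % x == 0`, computed with subtraction only, as in the worked example above; names are my own):

```java
public class SubtractionCheck {

    /** True if any element of a is a multiple of x (elements and x positive). */
    static boolean containsMultiple(int[] a, int x) {
        for (int v : a) {
            while (v > 0) {
                v -= x;      // keep subtracting x, as in the 7 - 3 - 3 example
            }
            if (v == 0) {
                return true; // landed exactly on zero: the element was a multiple
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsMultiple(new int[]{1, 2, 4, 7}, 3)); // false
        System.out.println(containsMultiple(new int[]{1, 2, 6, 7}, 3)); // true: 6
    }
}
```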

catastrophic-failure