determine if an array has the numbers a to b each once

Question

Given an array A of size n, and two numbers a and b with b-a+1=n, I need to determine whether or not A contains each of the numbers between a and b (exactly once).

For example, if n=4 and a=1,b=4, then I'm looking to see if A is a rearrangement of [1,2,3,4].

In practice, I need to do this with O(1) space (no hash table).

My first idea was to sort A, but I have to do this without rearranging A, so that's out.

My second idea is to run through A once, adding up the entries and checking that they are in the correct range. At the end, I have to get the right sum (for a=1,b=n, this is n(n+1)/2), but this doesn't always catch everything, e.g. [1,1,4,4,5] passes the test for n=5,a=1,b=5, but shouldn't.

The only idea of mine that works is to pass through the array n times making sure to see each number once and only once. Is there a faster solution?

Not an interview question or homework. For size, n can go up to 10^9 or more. My solution works, but there has to be something better. — Daniel, Jan 26 '14 at 04:43
Will an execution order of `O((b-a) * n)` (worst case) fit your expectations? — ichramm, Jan 26 '14 at 04:58
@ichramm The question says `b-a+1=n`, so this would be equal to `O(n^2)`, which is equal to the obvious solution to iterate through the array for every search value. — , Jan 26 '14 at 05:00
Have you measured the time for your current method for the largest size? It will slow down worse than O(n^2) as the array gets too large for the caches. It may be better to consider a one-bit-per-element array that is O(n) space but with a low constant multiplier. — Patricia Shanahan, Jan 26 '14 at 05:01
What are the implied restrictions, as in which numbers could A contain? It's kind of implied that A can't have negative values, but it would be so much easier if you just stated all your restrictions. — Gleno, Jan 26 '14 at 05:13
See also: [mathoverflow](http://mathoverflow.net/questions/25374/duplicate-detection-problem/25419) — PengOne, Jan 26 '14 at 05:17
How _likely_ is it that the data you give to your algorithm will be a permutation? If it is unlikely, then you can use your above methods to return false in O(n) time and O(1) memory, and only have to check the hard way if elements sum to n(n+1)/2. — Jems, Jan 26 '14 at 05:27
I've given it a bit of a think, and it seems like that checking prod(A) and sum(A) both mod some prime p give a good-enough probabilistic O(1) solution. Note that in general this is not true, say for A = {1..12}, {4, 2, 3, 2, 5, 3, 14, 8, 9, 5, 11, 12}. — Gleno, Jan 26 '14 at 05:35
@Gleno the product will be n! if it is a permutation (or close to it for a variety of inputs), which with stirlings approximation is around n^n. So it will take O(n) to store. Correct me if I'm wrong. edit - mod p makes it O(1) if you apply as you go along — Jems, Jan 26 '14 at 05:38
Create array B size b-a+1 and initialize to false. Go through array A and for each element, check off in B. If all ready checked off, criteria is false. `if (inrange(A[i], a,b) { if (B[A[i]-a]) return false, else (B[A[i]]) = true}` Else return true. — chux - Reinstate Monica, Jan 26 '14 at 05:56

score 3 · Accepted Answer · answered Jan 26 '14 at 06:58

You can do this with a single pass through the array, using only a minor modification of the n(n+1)/2 method you already mentioned.

To do so, walk through the array, ignoring elements outside the a..b range. For numbers that are in the correct range, you want to track three values: the sum of the numbers, the sum of the squares of the numbers, and the count of the numbers.

You can pre-figure the correct values for both the sum of numbers and the sum of the squares (and, trivially, the count).

Then compare your result to the expected results. Consider, for example, if you're searching for 1, 2, 3, 4. If you used only the sums of the numbers, then [1, 1, 4, 4] would produce the correct result (1+2+3+4 = 10, 1+1+4+4 = 10), but if you also add the sums of the squares, the problem is obvious: 1+4+9+16 = 30 but 1+1+16+16 = 34.

This is essentially applying (something at least very similar to) a Bloom filter to the problem. Given a sufficiently large group and a fixed pair of functions, there's going to be some set of incorrect inputs that will produce the correct output. You can reduce that possibility to an arbitrarily low value by increasing the number of filters you apply. Alternatively, you can probably design an adaptive algorithm that can't be fooled--offhand, it seems like if your range of inputs is N, then raising each number to the power N+1 will probably assure that you can only get the correct result with exactly the correct inputs (but I'll admit, I'm not absolutely certain that's correct).

This is a good idea for smaller N, but raising number to 10^9 isn't going to be easy. — Daniel, Jan 26 '14 at 17:40
@Daniel: Depends. There are methods (widely used in RSA cryptography, for one example) for raising numbers to large powers efficiently. — Jerry Coffin, Jan 26 '14 at 17:44
Simple example: both sets 6,5,1 and 7,3,2 have equal count, sum and sum of squares. — Egor Skriptunoff, Jan 28 '14 at 16:06

Vikram Bhat · Answer 2 · 2014-01-26T07:42:07.920

0

Here is a O(1) space and O(n) solution that might help :-

Find the mean and standard deviation in range (a,b)
Scan the array and find mean and standard deviation.
if any number is outside (a,b) return false
if(mean1!=mean2 || sd1!=sd2) return false else true.

Note : I might not be 100% accurate.

edited Jan 26 '14 at 07:42

answered Jan 26 '14 at 07:17

Vikram Bhat

6,106
3
20
19

This isn't very accurate at all for n big (even 100). – Daniel Jan 26 '14 at 17:39
@Daniel I dont think it depends on size of n but more so on distribution of numbers – Vikram Bhat Jan 26 '14 at 18:00

score 0 · Answer 3 · answered Jan 26 '14 at 10:42

Here's a solution that fails with the probability of a hash collision.

Take an excellent (for example cryptographic) hash function H.

Compute: xor(H(x) for x in a...b)
Compute: xor(H(A[i]) for i in 1...n)

If the two are the different, then for sure you don't have a permutation. If the two are the same, then you've almost certainly got a permutation. You can make this immune to input that's been picked to produce a hash collision by including a random seed into the hash.

This is obviously O(b-a) in running time, needs O(1) external storage, and trivial to implement.

determine if an array has the numbers a to b each once

3 Answers3