Check if array B is a permutation of A

Question

I tried to find a solution to this but couldn't get much out of my head.

We are given two unsorted integer arrays A and B. We have to check whether array B is a permutation of A. How can this be done.? Even XORing the numbers wont work as there can be several counterexamples which have same XOR value bt are not permutation of each other.

A solution needs to be O(n) time and with space O(1)

Any help is welcome!! Thanks.

I was asked to solve this in O(n) time and with space O(1). I dont think if these constraints can be satisfied! — akaHuman, May 17 '12 at 16:32
Is there an upper bound on the maximum allowed integer? Or can they be arbitrary-precision integers? — templatetypedef, May 17 '12 at 16:43
no upper bound as such, can be any arbitrary precision integer. — akaHuman, May 17 '12 at 17:16
If the integers can be arbitrary precision then 'n' in 'O(n)' needs to include the encoding size of all the integers and cannot represent the number of elements in the array per se. — Antti Huima, May 17 '12 at 17:27
Note that if the integers are arbitrary precision, things like multiplication suddenly don't work as O(1) operations anymore, as they tend to work for practical algorithm analysis purposes on register-size integers. The multiplication of two O(n) bigints, say, is not O(n) operation. I bet the question wasn't actually about "arbitrary precision integers", but about unspecified (but limited) precision. Anyway, maybe this part of the question should be re-checked. — Antti Huima, May 17 '12 at 17:36
Are you actually looking to check for cyclic permutations (or can they really be any kind of permutation?) — Kaganar, May 17 '12 at 18:32

Antti Huima · Answer 1 · 2012-05-23T08:13:36.823

11

The question is theoretical but you can do it in O(n) time and o(1) space. Allocate an array of 2³² counters and set them all to zero. This is O(1) step because the array has constant size. Then iterate through the two arrays. For array A, increment the counters corresponding to the integers read. For array B, decrement them. If you run into a negative counter value during iteration of array B, stop --- the arrays are not permutations of each others. Otherwise at the end (assuming A and B have the same size, a prerequisite) the counter array is all zero and the two arrays are permutations of each other.

This is O(1) space and O(n) time solution. However it is not practical, but would easily pass as a solution to the interview question. At least it should.

More obscure solutions

Using a nondeterministic model of computation, checking that the two arrays are not permutations of each others can be done in O(1) space, O(n) time by guessing an element that has differing count on the two arrays, and then counting the instances of that element on both of the arrays.
In randomized model of computation, construct a random commutative hash function and calculate the hash values for the two arrays. If the hash values differ, the arrays are not permutations of each others. Otherwise they might be. Repeat many times to bring the probability of error below desired threshold. Also on O(1) space O(n) time approach, but randomized.
In parallel computation model, let 'n' be the size of the input array. Allocate 'n' threads. Every thread i = 1 .. n reads the ith number from the first array; let that be x. Then the same thread counts the number of occurrences of x in the first array, and then check for the same count on the second array. Every single thread uses O(1) space and O(n) time.
Interpret an integer array [ a1, ..., an ] as polynomial x^a1 + x^a2 + ... + x^an where x is a free variable and the check numerically for the equivalence of the two polynomials obtained. Use floating point arithmetics for O(1) space and O(n) time operation. Not an exact method because of rounding errors and because numerical checking for equivalence is probabilistic. Alternatively, interpret the polynomial over integers modulo a prime number, and perform the same probabilistic check.

edited May 23 '12 at 08:13

answered May 17 '12 at 16:52

Antti Huima

25,136
3
52
71

I don't see how you can claim O(1) space when you assume a maximum and just pre-allocate it. If I used that criteria, I could claim that any algorithm is O(1) space so long as you allocate enough before you start. – David Buck May 17 '12 at 17:01
? The size of the array depends on the size of unsigned int. – Antti Huima May 17 '12 at 17:04
2

Ok, so if you used arbitrary precision integers? Smalltalk for example has unlimited Integer precision. The question never said "unsigned int". This is a question about algorithmic complexity, not implementation in a language that only supports 32 bit unsigned ints. – David Buck May 17 '12 at 17:09
This is the solution i used for character strings except that i had 2 use an array of size 26 for a-z letters. This is definitely an O(1) space and can be considered a theoretical solution. The size of the array though depends on the precision in this case but not on the input. – akaHuman May 17 '12 at 17:20
Heh ok but then what is O(n)? It cannot be the size of the array (anymore) because the size of the integers needs to be taken into account. n is then the total size of integers (presumably in binary encoding, say) stored in the array. I doubt this was the original intent, but maybe then. – Antti Huima May 17 '12 at 17:22
Pardon me here again. My last comment was a little messy one. The first sentence of it talks about my string solution and the remaining part is about the solution you suggested. and yes n is the number of integers in the array. and the precision is unspecified. – akaHuman May 17 '12 at 18:01
Every bounded collection can be bucket sorted. No one regularly considers integer questions to correspond with bounded integers. Note that by trying to be "realistic" about the size of the integer, the proposed algorithm is unrealistic about memory. It becomes even more obvious when one realises that most of the time systems that have enough memory for the above algorithm are actually 64-bit, and your natural word size is different. – ex0du5 May 17 '12 at 18:20
Creating the array is O(1), but initializing every element of the array to zero is O(n), whether you do it explicitly in your program or it is done implicitly by the run-time system of whatever language you are using. – user448810 May 18 '12 at 12:45
@user448810 no it's not O(n); n is the size the *input* array. The counter array has fixed size that is not dependent of n. Assuming a fixed precision for integers, regardless of what that precision is. – Antti Huima May 18 '12 at 14:37
Your `O(1)` memory claim for your hash is invalid. I could make a similar claim by saying "my algorithm will use no more data than this Google data center, which is constant size", which would be a silly claim. Hashes with `n` keys use at least `O(n)` space; it's as simple as that. – cheeken May 22 '12 at 21:45
@cheeken There is no hash. And 'n' in this problem is not the size of the key space, but the size of the array. The array can contain the same element many times, so the array can be larger than the key space. Anyway, of course the answer has lots of holes in it, and e.g. the counters themselves could overflow. However, if I would be interviewing and had posed this problem, I would have accepted this answer also. It shows that you understand what worst-case asymptotic complexity means in practice and how you can work around it. – Antti Huima May 23 '12 at 07:26
in the parallel model it's O(n) time but W(n^2) and the the total space is also n – Zoltán Haindrich Jul 15 '12 at 12:53
Sorry if I missed something, but if polynomials are using different order they will differ, and building interpolating polynomial takes $O(n^2)$ - the one that works distegarding the order. Checking for identity using floats is shortcut for "asking for trouble", numerical round-offs will make solution not working while checking with some $\epsilon$ will give false positives. Checking histogram takes $2^{32}$ which makes it impractical for *common* inputs and restrict numbers to 4 bytes only. – Evil Jul 16 '16 at 17:51

score 8 · Answer 2 · answered May 20 '12 at 04:08

If we are allowed to freely access a large list of primes, you can solve this problem by leveraging properties of prime factorization.

For both arrays, calculate the product of Prime[i] for each integer i, where Prime[i] is the ith prime number. The value of the products of the arrays are equal iff they are permutations of one another.

Prime factorization helps here for two reasons.

Multiplication is transitive, and so the ordering of the operands to calculate the product is irrelevant. (Some alluded to the fact that if the arrays were sorted, this problem would be trivial. By multiplying, we are implicitly sorting.)
Prime numbers multiply losslessly. If we are given a number and told it is the product of only prime numbers, we can calculate exactly which prime numbers were fed into it and exactly how many.

Example:

a = 1,1,3,4
b = 4,1,3,1
Product of ith primes in a = 2 * 2 * 5 * 7 = 140
Product of ith primes in b = 7 * 2 * 5 * 2 = 140

That said, we probably aren't allowed access to a list of primes, but this seems a good solution otherwise, so I thought I'd post it.

This is not worse than hashing. Instead of accessing prime nos., we can have hash table of the range of numbers that will be gotten as input in a & b and do it. In turn hash table can be reduced in size by using bits. — viji, Jul 27 '12 at 14:05

score 5 · Answer 3 · answered May 18 '12 at 15:50

I apologize for posting this as an answer as it should really be a comment on antti.huima's answer, but I don't have the reputation yet to comment.

The size of the counter array seems to be O(log(n)) as it is dependent on the number of instances of a given value in the input array.

For example, let the input array A be all 1's with a length of (2^32) + 1. This will require a counter of size 33 bits to encode (which, in practice, would double the size of the array, but let's stay with theory). Double the size of A (still all 1 values) and you need 65 bits for each counter, and so on.

This is a very nit-picky argument, but these interview questions tend to be very nit-picky.

score 4 · Answer 4 · answered May 20 '12 at 22:39

If we need not sort this in-place, then the following approach might work:

Create a HashMap, Key as array element, Value as number of occurances. (To handle multiple occurrences of the same number)
Traverse array A.
Insert the array elements in the HashMap.
Next, traverse array B.
Search every element of B in the HashMap. If the corresponding value is 1, delete the entry. Else, decrement the value by 1.
If we are able to process entire array B and the HashMap is empty at that time, Success. else Failure.

HashMap will use constant space and you will traverse each array only once.

Not sure if this is what you are looking for. Let me know if I have missed any constraint about space/time.

score 1 · Answer 5 · answered May 20 '12 at 22:53

You're given two constraints: Computational O(n), where n means the total length of both A and B and memory O(1).

If two series A, B are permutations of each other, then theres also a series C resulting from permutation of either A or B. So the problem is permuting both A and B into series C_A and C_B and compare them.

One such permutation would be sorting. There are several sorting algorithms which work in place, so you can sort A and B in place. Now in a best case scenario Smooth Sort sorts with O(n) computational and O(1) memory complexity, in the worst case with O(n log n) / O(1).

The per element comparision then happens at O(n), but since in O notation O(2*n) = O(n), using a Smooth Sort and comparison will give you a O(n) / O(1) check if two series are permutations of each other. However in the worst case it will be O(n log n)/O(1)

Kunukn · Answer 6 · 2014-03-18T10:49:06.997

The solution needs to be O(n) time and with space O(1). This leaves out sorting and the space O(1) requirement is a hint that you probably should make a hash of the strings and compare them.

If you have access to a prime number list do as cheeken's solution.

Note: If the interviewer says you don't have access to a prime number list. Then generate the prime numbers and store them. This is O(1) because the Alphabet length is a constant.

Else here's my alternative idea. I will define the Alphabet as = {a,b,c,d,e} for simplicity. The values for the letters are defined as:

a, b, c, d, e
1, 2, 4, 8, 16

note: if the interviewer says this is not allowed, then make a lookup table for the Alphabet, this takes O(1) space because the size of the Alphabet is a constant

Define a function which can find the distinct letters in a string.

// set bit value of char c in variable i and return result
distinct(char c, int i) : int

E.g. distinct('a', 0) returns 1
E.g. distinct('a', 1) returns 1
E.g. distinct('b', 1) returns 3

Thus if you iterate the string "aab" the distinct function should give 3 as the result

Define a function which can calculate the sum of the letters in a string.

// return sum of c and i
sum(char c, int i) : int

E.g. sum('a', 0) returns 1
E.g. sum('a', 1) returns 2
E.g. sum('b', 2) returns 4

Thus if you iterate the string "aab" the sum function should give 4 as the result

Define a function which can calculate the length of the letters in a string.

// return length of string s
length(string s) : int

E.g. length("aab") returns 3

Running the methods on two strings and comparing the results takes O(n) running time. Storing the hash values takes O(1) in space.

 e.g. 
 distinct of "aab" => 3
 distinct of "aba" => 3
 sum of "aab => 4
 sum of "aba => 4
 length of "aab => 3
 length of "aba => 3

Since all the values are equal for both strings, they must be a permutation of each other.

EDIT: The solutions is not correct with the given alphabet values as pointed out in the comments.

This does not work, because "bccabcde" and "ddaabcde" have the same sum (10 + 31), length (8) and "distinct" (all), but the strings are not permutations of each other. — Antti Huima, Mar 17 '14 at 10:56
@AnttiHuima I calculate "bccabcde" as 41 and "ddaabcde" as 48 `function hash(str){var arr = {'a':1, 'b':2, 'c':4, 'd':8, 'e':16}; return str.split('').reduce(function(a,b){ return a + arr[b]; }, 0); }` — Kunukn, Mar 17 '14 at 21:05
Yes, sorry, it was a typo. It should be "bccabcde" and "daaabcde", i.e. the second string starts "daa... " instead of "dda". The point is that b+c+c = 4+4+2 = 10 and d+a+a = 8+1+1 = 10, so the sums become equal. Sorry about the typo. — Antti Huima, Mar 18 '14 at 09:04

score 0 · Answer 7 · edited May 23 '17 at 12:33

0

You can convert one of the two arrays into an in-place hashtable. This will not be exactly O(N), but it will come close, in non-pathological cases.

Just use [number % N] as it's desired index or in the chain that starts there. If any element has to be replaced, it can be placed at the index where the offending element started. Rinse , wash, repeat.

UPDATE: This is a similar (N=M) hash table It did use chaining, but it could be downgraded to open addressing.

edited May 23 '17 at 12:33

Community

1
1

answered May 17 '12 at 16:53

wildplasser

43,142
8
66
109

Could you elaborate on in-place hashtable? I don't see how you can store the hashtable in exactly the same space as an array, so you'd almost certainly be violating the O(1) space requirement. – Jimmy May 17 '12 at 17:17
Yes, you can permute the array in such a way that it forms a hash table with open addressing, say linear probing. That is typically an O(N) process. Open addressing does not need additional space. I am working on it... See for instance my tinyhash contribution here, which used N slots for N entries. (which did do sorting, but that is not strictly necessary) – wildplasser May 17 '12 at 17:29
1

After you use linear probing and have 100% load factor, iterating through the array is no longer O(n) process but O(n^2) or something like that. – Antti Huima May 17 '12 at 17:43
No. It is still O(N). Average "(mean square) chain length" will be about 1.5. See the histogram in my link. Constructing it will be of similar complexity (but is harder, caused by the space constraint) – wildplasser May 17 '12 at 18:20
But the input could be pathological that maps every input to the same bucket / chain. It does not matter if for some random number distributions the chain length is linear. – Antti Huima May 18 '12 at 05:56
@antti.huima It all depends on whether we are talking expected behavior or worst case behavior. This looks like it should/may be O(N) expected, but much worse on worst case (Taking wildplassers claims at face value). – Michael Anderson May 19 '12 at 06:18
@antti.huima: with a 100% load factor, about 30% of the entries will be placed at the right slot. Another 30% will be off-by one; and the rest will land at larger displacements. IIRC this is known as the "parking problem". The average number of probes will probably be about 3 for a hit, and N for a miss. In this particular problem there is an extra advantage: the algorithm can terminate once the first non-existant value is encountered. – wildplasser May 19 '12 at 11:45

score 0 · Answer 8 · answered May 17 '12 at 18:20

I'd use a randomized algorithm that has a low chance of error.

The key is to use a universal hash function.

def hash(array, hash_fn):
  cur = 0
  for item in array:
    cur ^= hash_item(item)
  return cur

def are_perm(a1, a2):
  hash_fn = pick_random_universal_hash_func()
  return hash_fn(a1, hash_fn) == hash_fn(a2, hash_fn)

If the arrays are permutations, it will always be right. If they are different, the algorithm might incorrectly say that they are the same, but it will do so with very low probability. Further, you can get an exponential decrease in chance for error with a linear amount of work by asking many are_perm() questions on the same input, if it ever says no, then they are definitely not permutations of each other.

uncle_xia · Answer 9 · 2012-05-19T05:42:12.720

0

I just find a counterexample. So, the assumption below is incorrect.

I can not prove it, but I think this may be possible true.

Since all elements of the arrays are integers, suppose each array has 2 elements, and we have

a1 + a2 = s
a1 * a2 = m

b1 + b2 = s
b1 * b2 = m

then {a1, a2} == {b1, b2}

if this is true, it's true for arrays have n-elements.

So we compare the sum and product of each array, if they equal, one is the permutation of the other.

edited May 19 '12 at 05:42

answered May 18 '12 at 16:01

uncle_xia

1
1

Not true: [0,0,2] and [0,1,1] – Rafał Dowgird May 18 '12 at 21:58
since there is a multiplication, we should take care of zero in the arrays. we can just count the number of zeros in each array and exclude that in the sums and the products. – uncle_xia May 19 '12 at 04:52

Check if array B is a permutation of A

9 Answers9

More obscure solutions

Linked

Related