Compare rotated lists, containing duplicates

Question

I'm looking for an efficient way to compare lists of numbers to see if they match at any rotation (comparing 2 circular lists).

When the lists don't have duplicates, picking smallest/largest value and rotating both lists before comparisons works. But when there may be many duplicate large values, this isn't so simple.

For example, the lists [9, 2, 0, 0, 9] and [0, 0, 9, 9, 2] match,
whereas [9, 0, 2, 0, 9] matches neither of them (since the order is different).

Here's an example of an inefficient function that works:

def min_list_rotation(ls):
    return min((ls[i:] + ls[:i] for i in range(len(ls))))

# example use
ls_a = [9, 2, 0, 0, 9]
ls_b = [0, 0, 9, 9, 2]

print(min_list_rotation(ls_a) == min_list_rotation(ls_b))

This could be made more efficient:

Check that the sorted lists match before running exhaustive tests.
Only test rotations that start with the minimum value (skipping repeats of that value that directly follow it). Where several candidates remain, prefer the one followed by the smallest values, continuing item by item for as long as the candidates still tie.
Compare rotations in place, without creating new lists each time.
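The improvements listed above could be sketched roughly as follows. This is only an illustration: the name `rotations_match` is made up here, and it implements the sorted pre-check, the restriction to rotations starting at the minimum value, and the index-based comparison, but not the further pruning of tied candidates.

```python
def rotations_match(ls_a, ls_b):
    """Return True if ls_b is some rotation of ls_a (hypothetical helper)."""
    n = len(ls_a)
    if n != len(ls_b):
        return False
    if n == 0:
        return True
    # Cheap rejection: rotations must contain the same items.
    if sorted(ls_a) != sorted(ls_b):
        return False
    smallest = min(ls_a)
    # Fix one occurrence of the minimum in ls_a as the reference start.
    start_a = ls_a.index(smallest)
    for start_b in range(n):
        # Only try alignments where ls_b also starts at the minimum.
        if ls_b[start_b] != smallest:
            continue
        # Compare by index, wrapping around, without building new lists.
        if all(ls_a[(start_a + k) % n] == ls_b[(start_b + k) % n]
               for k in range(n)):
            return True
    return False


print(rotations_match([9, 2, 0, 0, 9], [0, 0, 9, 9, 2]))
print(rotations_match([9, 2, 0, 0, 9], [9, 0, 2, 0, 9]))
```

With many repeated minimum values this still degrades towards quadratic time, which is the concern raised in the question.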

However, it's still not a very efficient method, since it relies on checking many possibilities.

Is there a more efficient way to perform this comparison?

Related question: Compare rotated lists in python

take a look at my answer: http://stackoverflow.com/a/26924896/1090562. I believe this is what you are looking for. — Salvador Dali, Jan 02 '16 at 07:14
Apologies for asking a duplicate question (though I did search on this topic, just missed using the keyword **circularly**). — ideasman42, Jan 02 '16 at 07:15

score 0 · Answer 1 · answered Jan 02 '16 at 08:28

If you are looking for duplicates in a large number of lists, you could rotate each list to its lexicographically minimal string representation, then sort the list of lists or use a hash table to find duplicates. This canonicalisation step means that you don't need to compare every list with every other list. There are clever O(n) algorithms for finding the minimal rotation described at https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation.
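As a rough sketch of this idea in Python: the function names below are made up for illustration, and the minimal rotation is found with a two-pointer scan chosen here for brevity (the Wikipedia page describes other linear-time algorithms, such as Booth's). Each list is reduced to a canonical tuple, which then serves as a dictionary key.

```python
from collections import defaultdict


def least_rotation_index(ls):
    """Index at which the lexicographically minimal rotation of ls starts."""
    n = len(ls)
    i, j, k = 0, 1, 0  # two candidate starts and the current match length
    while i < n and j < n and k < n:
        a = ls[(i + k) % n]
        b = ls[(j + k) % n]
        if a == b:
            k += 1
            continue
        # The losing candidate, and the k starts after it, cannot be minimal.
        if a > b:
            i += k + 1
            if i == j:
                i += 1
        else:
            j += k + 1
            if i == j:
                j += 1
        k = 0
    return min(i, j)


def canonical(ls):
    """Rotate ls to its minimal rotation and return it as a hashable tuple."""
    k = least_rotation_index(ls)
    return tuple(ls[k:]) + tuple(ls[:k])


def group_rotations(lists):
    """Group lists that are rotations of one another (hash-table approach)."""
    groups = defaultdict(list)
    for ls in lists:
        groups[canonical(ls)].append(ls)
    return list(groups.values())


print(group_rotations([[9, 2, 0, 0, 9], [0, 0, 9, 9, 2], [9, 0, 2, 0, 9]]))
```

Two lists match at some rotation exactly when their canonical tuples are equal, so comparing a pair reduces to `canonical(ls_a) == canonical(ls_b)`.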

Right, I've got this working in this answer - http://stackoverflow.com/a/34564464/432509 — ideasman42, Jan 02 '16 at 10:21

score 0 · Answer 2 · answered Jan 02 '16 at 10:53

You almost have it.

You can do some kind of "normalization" or "canonicalisation" of each list independently of the others; then you only need to compare item by item (or, if you want, put them in a map, or in a set to eliminate duplicates, ...).

1 Take the minimum item that is not preceded by itself (in a circular way).

In your example 92009, you should take the first 0 (not the second one).

2 If every item is the same (say 00000), just keep the list as it is: 00000

3 If the minimum item occurs several times, look at the item that follows each occurrence and keep the occurrences where it is minimal; keep going until a single path of minimums remains.

Example: 90148301562 => you have 0148.. and 0156.. => you take 0148

4 If you cannot separate the different paths (that is, they stay equal all the way around the list), you have a repeating pattern; then it does not matter: take any of them.

Example: 014376501437650143765 : you have the same pattern 0143765...

It is like AAA, where A = 0143765

5 When you have your list in this form, it is easy to compare two of them.

How to do that efficiently:

Iterate over your list to get the minimums Mx (those not preceded by themselves). If you find several, keep all of them.

Then, starting from each minimum Mx, take the next item and keep only the candidates where it is minimal. If you complete an entire cycle, you have a repeating pattern.

Except in the case of a repeating pattern, this should be about the least work needed.
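A minimal Python sketch of the steps above, assuming the items are comparable with `<` and `==`; the name `canonical_rotation` is made up here, and the step numbers in the comments refer to the list in this answer.

```python
def canonical_rotation(ls):
    """Rotate ls into the canonical form described in steps 1 to 4."""
    n = len(ls)
    if n == 0:
        return []
    smallest = min(ls)
    # Step 1: positions holding the minimum that are not preceded by it
    # (ls[i - 1] wraps around to the last item when i == 0).
    starts = [i for i in range(n)
              if ls[i] == smallest and ls[i - 1] != smallest]
    if not starts:
        # Step 2: every item is the same, so keep the list as it is.
        return list(ls)
    # Step 3: look one item further each round and keep only the
    # candidates whose item at that offset is minimal.
    for offset in range(1, n):
        if len(starts) == 1:
            break
        best = min(ls[(i + offset) % n] for i in starts)
        starts = [i for i in starts if ls[(i + offset) % n] == best]
    # Step 4: candidates still tied after a full cycle give identical
    # rotations (repeating pattern), so any of them will do.
    i = starts[0]
    return list(ls[i:]) + list(ls[:i])


print(canonical_rotation([9, 2, 0, 0, 9]) == canonical_rotation([0, 0, 9, 9, 2]))
```

Each round costs time proportional to the number of surviving candidates, so inputs with long repeated patterns can still take quadratic time in this simple form.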

Hope it helps.

score 0 · Answer 3 · answered Jan 02 '16 at 22:40

I would do this in expected O(N) time using a polynomial hash function to compute the hash of list A, and every cyclic shift of list B. Where a shift of list B has the same hash as list A, I'd compare the actual elements to see if they are equal.

The reason this is fast is that with polynomial hash functions (which are extremely common!), you can calculate the hash of each cyclic shift from the previous one in constant time, so you can calculate hashes for all of the cyclic shifts in O(N) time.

It works like this:

Let's say B has N elements; then the hash of B using prime P is:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb = Hb*P + B[i];
}

This is an optimized way to evaluate a polynomial in P, and is equivalent to:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb += B[i] * P^(N-1-i);  //^ is exponentiation, not XOR
}

Notice how every B[i] is multiplied by P^(N-1-i). If we shift B to the left by 1, then every B[i] will be multiplied by an extra P, except the first one, which moves to the end and should be multiplied by P^0 = 1. Since multiplication distributes over addition, we can multiply all the components at once just by multiplying the whole hash, and then fix up the factor for the first element.

The hash of the left shift of B is just

Hb1 = Hb*P + B[0]*(1-(P^N))

The second left shift:

Hb2 = Hb1*P + B[1]*(1-(P^N))

and so on...
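Putting the update rule above into Python gives something like the following sketch. It assumes integer elements; the function name `is_rotation`, the base `p` and the modulus `m` are illustrative choices not taken from the answer (the arithmetic is done modulo `m` only to keep the numbers small).

```python
def is_rotation(a, b, p=1_000_003, m=(1 << 61) - 1):
    """Return True if some cyclic shift of b equals a (integer elements)."""
    n = len(a)
    if n != len(b):
        return False
    if n == 0:
        return True

    def poly_hash(seq):
        # Horner evaluation of the polynomial in p, reduced modulo m.
        h = 0
        for x in seq:
            h = (h * p + x) % m
        return h

    ha = poly_hash(a)
    hb = poly_hash(b)
    # Constant factor (1 - P^N) from the update formula, modulo m.
    fix = (1 - pow(p, n, m)) % m
    for shift in range(n):
        # Equal hashes only suggest a match; confirm on the real elements.
        if hb == ha and b[shift:] + b[:shift] == a:
            return True
        # Hash of the next left shift: H*P + first_element*(1 - P^N).
        hb = (hb * p + b[shift] * fix) % m
    return False


print(is_rotation([9, 2, 0, 0, 9], [0, 0, 9, 9, 2]))
print(is_rotation([9, 2, 0, 0, 9], [9, 0, 2, 0, 9]))
```

Because every hash match is verified against the actual elements, a collision can only cost extra time and never changes the answer.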
