151

For instance, I have lists:

a[0] = [1, 1, 1, 0, 0]
a[1] = [1, 1, 0, 0, 1]
a[2] = [0, 1, 1, 1, 0]
# and so on

They seem to be different, but if it is supposed that the start and the end are connected, then they are circularly identical.

The problem is, each list which I have has a length of 55 and contains only three ones and 52 zeros in it. Without circular condition, there are 26,235 (55 choose 3) lists. However, if the condition 'circular' exists, there are a huge number of circularly identical lists

Currently I check circularly identity by following:

def is_dup(a, b):
    for i in range(len(a)):
        if a == list(numpy.roll(b, i)): # shift b circularly by i
            return True
    return False

This function requires 55 cyclic shift operations at the worst case. And there are 26,235 lists to be compared with each other. In short, I need 55 * 26,235 * (26,235 - 1) / 2 = 18,926,847,225 computations. It's about nearly 20 Giga!

Is there any good way to do it with less computations? Or any data types that supports circular?

Jeon
  • 4,000
  • 4
  • 28
  • 73
  • Just a hunch : I feel that suffix trees might help here. http://en.wikipedia.org/wiki/Suffix_tree . To build one, see http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm – Rerito Nov 14 '14 at 07:21
  • The ones are in a row? Adjacent indices? – Alberto Coletta Nov 14 '14 at 11:38
  • If [David Eisenstat's reading](http://stackoverflow.com/a/26933996/1763356) is correct, please include in the question more about what you're actually trying to do and less about comparing lists for cyclical equality. (If it's not, please tell me (and him) so I can undelete my answer.) – Veedrac Nov 14 '14 at 18:47
  • If you need a circular datatype you can use https://docs.python.org/2.7/library/itertools.html#itertools.cycle – tom Nov 14 '14 at 19:37
  • *circular identical* is this a python term? – Xline Nov 14 '14 at 20:32
  • 87 clocks per list search, or 22 nanoseconds on my machine using the bits in a unint64_t to hold the 55 members of the lists. Total time for NxN, where N=26235, is less than 16 seconds. Code below is in C/C++, but I refrained from using bit operators, so you can probably code it up in Python pretty easily. –  Nov 15 '14 at 03:19
  • I feel it's important for me to mention here that that the top-voted answer has a worse running time than mine. – user541686 Nov 15 '14 at 10:35
  • 1
    @Mehrdad But far worse running time than any answer that converts to a canonical form, far worse running time than converting to an integer and far, far worse running time than David Eisenstat's. – Veedrac Nov 15 '14 at 14:56
  • 2
    All of answers trying to solve general problem, but in this particular case with only 3 ones you can represent every list with 3 numbers being a number of zeros between ones. List from question can be represented as [0,0,2], [0,2,0], [2,0,0]. You can simply reduce list in one run and then check reduced list. If they are "circularly identical" then originals are too. – abc667 Nov 15 '14 at 19:09
  • 2
    I guess Stack Overflow doesn't need voting then. All we need is to run the code in all the solutions, and present them in the order in which they finish. – Dawood ibn Kareem Nov 17 '14 at 02:19
  • For me, it's very hard to pick only one answer as the best one. So I postpone choosing it. I aplogize for this. – Jeon Nov 17 '14 at 02:20
  • Dali's clever insight to concatenate the list to itself as a work-around for an inability to rotate the list in memory not withstanding, AFAICT, his code takes ~ 1 million times as long to run & 10x the memory. Reducing "THE" list to conical form up front as abc667, Veedrac, and Eisenstat suggest is sensible, but my approach will enjoy a similar speed advantage there too. If you have "THE" list this reduction works, but if you are a service with users dropping lists on you all day long, or are working from a flow of telemetry, it's impossible. I solved the question asked, not define it away. –  Nov 17 '14 at 21:27
  • 2
    Since it hasn't been mentioned so far, the "canonical form" referred to by @abc667, Veedrac, and Eisenstat is called Run Length Encoding http://en.wikipedia.org/wiki/Run-length_encoding – David Lovell Nov 18 '14 at 22:37
  • If there will always be only three ones in the list, an easy way to reduce the number of shifts is to replace `for i in range(len(a)):` with `for i in numpy.nonzero(a)[0][0] - numpy.nonzero(b)[0]:`. That will limit the number of rolls to the number of ones in `b`, allowing you to go from a maximum of 52 to a maximum of 3. To speed things up even further, you could convert `a` and `b` into numpy arrays at the beginning of the function and do the comparison using them instead of the time-consuming conversion back to a Python list. – D Krueger Nov 19 '14 at 23:24

18 Answers18

138

First off, this can be done in O(n) in terms of the length of the list You can notice that if you will duplicate your list 2 times ([1, 2, 3]) will be [1, 2, 3, 1, 2, 3] then your new list will definitely hold all possible cyclic lists.

So all you need is to check whether the list you are searching is inside a 2 times of your starting list. In python you can achieve this in the following way (assuming that the lengths are the same).

list1 = [1, 1, 1, 0, 0]
list2 = [1, 1, 0, 0, 1]
print ' '.join(map(str, list2)) in ' '.join(map(str, list1 * 2))

Some explanation about my oneliner: list * 2 will combine a list with itself, map(str, [1, 2]) convert all numbers to string and ' '.join() will convert array ['1', '2', '111'] into a string '1 2 111'.

As pointed by some people in the comments, oneliner can potentially give some false positives, so to cover all the possible edge cases:

def isCircular(arr1, arr2):
    if len(arr1) != len(arr2):
        return False

    str1 = ' '.join(map(str, arr1))
    str2 = ' '.join(map(str, arr2))
    if len(str1) != len(str2):
        return False

    return str1 in str2 + ' ' + str2

P.S.1 when speaking about time complexity, it is worth noticing that O(n) will be achieved if substring can be found in O(n) time. It is not always so and depends on the implementation in your language (although potentially it can be done in linear time KMP for example).

P.S.2 for people who are afraid strings operation and due to this fact think that the answer is not good. What important is complexity and speed. This algorithm potentially runs in O(n) time and O(n) space which makes it much better than anything in O(n^2) domain. To see this by yourself, you can run a small benchmark (creates a random list pops the first element and appends it to the end thus creating a cyclic list. You are free to do your own manipulations)

from random import random
bigList = [int(1000 * random()) for i in xrange(10**6)]
bigList2 = bigList[:]
bigList2.append(bigList2.pop(0))

# then test how much time will it take to come up with an answer
from datetime import datetime
startTime = datetime.now()
print isCircular(bigList, bigList2)
print datetime.now() - startTime    # please fill free to use timeit, but it will give similar results

0.3 seconds on my machine. Not really long. Now try to compare this with O(n^2) solutions. While it is comparing it, you can travel from US to Australia (most probably by a cruise ship)

Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
  • Replacing `in` with regex should fix the issue Adam Smith pointed out: `re.search(r'\b{}\b'.format(' '.join(map(str, list2))), ' '.join(map(str, list * 2)))` – Ashwini Chaudhary Nov 14 '14 at 07:31
  • 3
    Just adding padding spaces (1 before and 1 after each string) will do the trick. No need to overcomplicate things with regexes. (Of course I'm assuming we compare lists of the same length) – Rerito Nov 14 '14 at 07:32
  • 2
    @Rerito unless either list includes strings, which may themselves have leading or trailing spaces. Still can cause collisions. – Adam Smith Nov 14 '14 at 07:39
  • @SalvadorDali I still think this has some hidden "gotcha" but I can't think of what it would be right now. Regardless a very smart answer! – Adam Smith Nov 14 '14 at 07:42
  • This is a convenient approach, but it's only O(n) if the underlying substring-finding operation `in` is O(n) (e.g. if it uses the KMP algorithm). Most languages use a naive O(n^2) approach for this. – j_random_hacker Nov 14 '14 at 12:34
  • @j_random_hacker I am aware of this. But if you will take a look at my question [here](http://stackoverflow.com/q/26561845/1090562), you can see that it is O(n) – Salvador Dali Nov 14 '14 at 12:54
  • @SalvadorDali: I checked the http://effbot.org/zone/stringlib.htm link there, and in fact it's still O(nm) in the worst case, despite being sublinear in the best cases (due to the BM aspects). Regardless, if you claim a time complexity that hinges on the (unspecified) time complexity of a library function, you should be explicit about the assumptions you're making. – j_random_hacker Nov 14 '14 at 13:24
  • @j_random_hacker thank you. This definitely makes sense. I added it to the answer. – Salvador Dali Nov 14 '14 at 13:31
  • 12
    I don't like this answer. The string operation nonsense made me dislike it and [David Eisenstat's answer](http://stackoverflow.com/a/26933996/1763356) made me willing to downvote it. This comparison *can* be done in O(n) time with a string but it also [can be done in O(n) time with an integer](http://stackoverflow.com/revisions/26936218/2) [need 10k as self-deleted], which is faster. Nonetheless, David Eisenstat's answer shows that doing any comparisons at all is pointless since the answer doesn't need it. – Veedrac Nov 14 '14 at 18:42
  • 7
    @Veedrac are you kidding me? Have you heard about computational complexity? Davids answer takes O(n^2) time and O(n^2) space just to generate all his repetitions which even for small inputs 10^4 length takes like 22 seconds and who knows how much ram. Not to mention that we have not started to search for anything right now (we just generated all cyclic rotations). And my string nonsense gives you a complete result for inputs like 10^6 in less than 0.5 seconds. It also needs O(n) space to store it. So please take some time understanding the answer before jumping into conclusion. – Salvador Dali Nov 14 '14 at 23:10
  • [Response to @SalvadorDali here for any other readers.](http://stackoverflow.com/questions/26924836/how-to-check-whether-two-lists-are-circularly-identical/26933996#comment42425548_26933996) – Veedrac Nov 15 '14 at 00:00
  • 1
    @SalvadorDali You seem very (soft) time focused ;-) – Déjà vu Nov 17 '14 at 00:31
  • @ring0 not really sure what this mean. I am solving various programming competitions, where you have to come up with the most efficient algorithm. – Salvador Dali Nov 17 '14 at 00:35
  • 1
    @SalvadorDali oh ok, so Salvador Dali is your real name!! – Déjà vu Nov 17 '14 at 00:36
  • @ring0 :-) you would be surprised, but no. Also I am not a girl, also I have no connection to Salvador Dali, except I like his personality and some of his works. – Salvador Dali Nov 17 '14 at 00:38
  • 1
    @SalvadorDali [hoping my comment makes sense now](http://www.wildculture.com/article/salvador-dali%E2%80%99s-soft-time-cameos-passing-luminati/1361) ;-) – Déjà vu Nov 17 '14 at 00:39
  • @SalvadorDali. trying to understand your benchmark that consumed .3 seconds. You would have to do 26,235 of those to meet the OP's requirement, or 26,235 x 26,235? AFAICT, it looks like you are searching a single list of 1,000 members for a match with another 1,000 member list? Sorry, not very fluent in Python. –  Nov 17 '14 at 01:46
  • This answer is interesting but not general enough in my opinion, it relies on numbers mapping to strings 'nicely'. But what if the input list was not homogeneous - it could be a mixture of strings, integers, or just any old objects without an obvious string serialisation? – wim Nov 17 '14 at 09:51
  • although for the OP's case, which was simple and constrained with just ones and zeros, it's a nice trick :) – wim Nov 17 '14 at 09:59
  • Using `datetime` to try and benchmark is a really bad idea. Python has the `timeit` module for that purpose. – Gareth Latty Nov 18 '14 at 23:27
  • @Lattyware thank you, but can you please expand why is this a `really bad idea`? Will it not be able to spot the difference between 1 seconds and 1 minute? I understand that if I will need to get more precision, I would rather use timeit, but may be I am missing something. – Salvador Dali Nov 18 '14 at 23:43
  • @SalvadorDali For one, it's wall-clock time, so if a leap second (or more, by DST or the user/system just changing the clock) goes by your results will be off. Precision is also a concern, not just in terms of fractions of a second, but that it runs the code many times to try and stop other factors afffecting the reported time. It's one of those situations where it's not hard to reinvent the wheel, until you remember the one hundred edge cases you didn't handle. Timing is like security, unless you are an expert, use the one that's had a lot of testing. – Gareth Latty Nov 18 '14 at 23:47
  • I don't see how the string operation can be anything other than O(n^2), ie what algo is used in the substring operation.... isn't it just faster cos it's written in C? – Andy Hayden Nov 18 '14 at 23:58
  • @AndyHayden please look at this link to read about O(n) algorithms to find substring http://en.wikipedia.org/wiki/String_searching_algorithm. C has nothing to do with it, you can write O(n) algorithm in any language. – Salvador Dali Nov 19 '14 at 00:04
  • @SalvadorDali Does this mean you can use same ideas for a python *list* searching algo in O(n) ? – Andy Hayden Nov 19 '14 at 00:30
  • @AndyHayden I think that yes, but the algorithm will be harder, then the one I listed here. – Salvador Dali Nov 19 '14 at 00:50
  • you can Apply the same KMP recover idea using number matches. This will be optimal in this case. Then find matches of list1 in (list1 + list2). – Miguel Nov 19 '14 at 03:53
  • @SalvadorDali. Hillarious that you think your solution is O(n) instead of O(n^n). Starting with a list size of 10, growing my X10, graph the run-time and you'll see I'm right. You seem to think because you found a string function that finds a sub-string in a string, and you only have to call that 'n' times your solution is O(n). By that logic I can solve this for ANY size list in O(1). I'll just call main() ... rolleyes. –  Nov 25 '14 at 00:48
  • @RocketRoy I think you misread the code. isCircular calls `in` exactly once. That then calls `str.__contains__` which has time complexity is O(n) average and O(n^2) worst case https://stackoverflow.com/questions/35220418/runtime-of-pythons-if-substring-in-string . – Trevor Merrifield Jan 29 '18 at 20:24
  • For anyone like me wondering why the list needs to be converted to a string, it is because `in` will return False when performing it on the lists. This is because `in` only returns True if the list contains the other list. On strings, it returns True if a substring exists – JolonB Nov 08 '20 at 22:23
39

Not knowledgeable enough in Python to answer this in your requested language, but in C/C++, given the parameters of your question, I'd convert the zeros and ones to bits and push them onto the least significant bits of an uint64_t. This will allow you to compare all 55 bits in one fell swoop - 1 clock.

Wickedly fast, and the whole thing will fit in on-chip caches (209,880 bytes). Hardware support for shifting all 55 list members right simultaneously is available only in a CPU's registers. The same goes for comparing all 55 members simultaneously. This allows for a 1-for-1 mapping of the problem to a software solution. (and using the SIMD/SSE 256 bit registers, up to 256 members if needed) As a result the code is immediately obvious to the reader.

You might be able to implement this in Python, I just don't know it well enough to know if that's possible or what the performance might be.

After sleeping on it a few things became obvious, and all for the better.

1.) It's so easy to spin the circularly linked list using bits that Dali's very clever trick isn't necessary. Inside a 64-bit register standard bit shifting will accomplish the rotation very simply, and in an attempt to make this all more Python friendly, by using arithmetic instead of bit ops.

2.) Bit shifting can be accomplished easily using divide by 2.

3.) Checking the end of the list for 0 or 1 can be easily done by modulo 2.

4.) "Moving" a 0 to the head of the list from the tail can be done by dividing by 2. This because if the zero were actually moved it would make the 55th bit false, which it already is by doing absolutely nothing.

5.) "Moving" a 1 to the head of the list from the tail can be done by dividing by 2 and adding 18,014,398,509,481,984 - which is the value created by marking the 55th bit true and all the rest false.

6.) If a comparison of the anchor and composed uint64_t is TRUE after any given rotation, break and return TRUE.

I would convert the entire array of lists into an array of uint64_ts right up front to avoid having to do the conversion repeatedly.

After spending a few hours trying to optimize the code, studying the assembly language I was able to shave 20% off the runtime. I should add that the O/S and MSVC compiler got updated mid-day yesterday as well. For whatever reason/s, the quality of the code the C compiler produced improved dramatically after the update (11/15/2014). Run-time is now ~ 70 clocks, 17 nanoseconds to compose and compare an anchor ring with all 55 turns of a test ring and NxN of all rings against all others is done in 12.5 seconds.

This code is so tight all but 4 registers are sitting around doing nothing 99% of the time. The assembly language matches the C code almost line for line. Very easy to read and understand. A great assembly project if someone were teaching themselves that.

Hardware is Hazwell i7, MSVC 64-bit, full optimizations.

#include "stdafx.h"
#include "stdafx.h"
#include <string>
#include <memory>
#include <stdio.h>
#include <time.h>

const uint8_t  LIST_LENGTH = 55;    // uint_8 supports full witdth of SIMD and AVX2
// max left shifts is 32, so must use right shifts to create head_bit
const uint64_t head_bit = (0x8000000000000000 >> (64 - LIST_LENGTH)); 
const uint64_t CPU_FREQ = 3840000000;   // turbo-mode clock freq of my i7 chip

const uint64_t LOOP_KNT = 688275225; // 26235^2 // 1000000000;

// ----------------------------------------------------------------------------
__inline uint8_t is_circular_identical(const uint64_t anchor_ring, uint64_t test_ring)
{
    // By trial and error, try to synch 2 circular lists by holding one constant
    //   and turning the other 0 to LIST_LENGTH positions. Return compare count.

    // Return the number of tries which aligned the circularly identical rings, 
    //  where any non-zero value is treated as a bool TRUE. Return a zero/FALSE,
    //  if all tries failed to find a sequence match. 
    // If anchor_ring and test_ring are equal to start with, return one.

    for (uint8_t i = LIST_LENGTH; i;  i--)
    {
        // This function could be made bool, returning TRUE or FALSE, but
        // as a debugging tool, knowing the try_knt that got a match is nice.
        if (anchor_ring == test_ring) {  // test all 55 list members simultaneously
            return (LIST_LENGTH +1) - i;
        }

        if (test_ring % 2) {    //  ring's tail is 1 ?
            test_ring /= 2;     //  right-shift 1 bit
            // if the ring tail was 1, set head to 1 to simulate wrapping
            test_ring += head_bit;      
        }   else    {           // ring's tail must be 0
            test_ring /= 2;     // right-shift 1 bit
            // if the ring tail was 0, doing nothing leaves head a 0
        }
    }
    // if we got here, they can't be circularly identical
    return 0;
}
// ----------------------------------------------------------------------------
    int main(void)  {
        time_t start = clock();
        uint64_t anchor, test_ring, i,  milliseconds;
        uint8_t try_knt;

        anchor = 31525197391593472; // bits 55,54,53 set true, all others false
        // Anchor right-shifted LIST_LENGTH/2 represents the average search turns
        test_ring = anchor >> (1 + (LIST_LENGTH / 2)); //  117440512; 

        printf("\n\nRunning benchmarks for %llu loops.", LOOP_KNT);
        start = clock();
        for (i = LOOP_KNT; i; i--)  {
            try_knt = is_circular_identical(anchor, test_ring);
            // The shifting of test_ring below is a test fixture to prevent the 
            //  optimizer from optimizing the loop away and returning instantly
            if (i % 2) {
                test_ring /= 2;
            }   else  {
                test_ring *= 2;
            }
        }
        milliseconds = (uint64_t)(clock() - start);
        printf("\nET for is_circular_identical was %f milliseconds."
                "\n\tLast try_knt was %u for test_ring list %llu", 
                        (double)milliseconds, try_knt, test_ring);

        printf("\nConsuming %7.1f clocks per list.\n",
                (double)((milliseconds * (CPU_FREQ / 1000)) / (uint64_t)LOOP_KNT));

        getchar();
        return 0;
}

enter image description here

Hovercraft Full Of Eels
  • 283,665
  • 25
  • 256
  • 373
  • 24
    people keep talking about "salvador dali's solution" and i was just sitting here confused, wondering if the painter of the same name was also a mathematician who contributed to classical algorithms in some significant way. then i realized that's the username of the person who posted the most popular answer. i am not a smart man. – Woodrow Barlow Nov 14 '14 at 20:26
  • For anyone with 10k rep, and implementation is [available here](http://stackoverflow.com/revisions/26936218/2) using Numpy and vectorization. [Gist mirror for those <10k](https://gist.github.com/Veedrac/f6556c24563db2f5b9ff). I deleted my answer because [David Eisenstat's answer](http://stackoverflow.com/a/26933996/1763356) points out that you don't need to do comparisons *at all* as you can just generate the unique lists straight away and I want to encourage people to use his far better answer. – Veedrac Nov 14 '14 at 21:11
  • @RocketRoy Why do you think Python wouldn't have bit operations? Heck, I use bit operations [in the code I linked](https://gist.github.com/Veedrac/f6556c24563db2f5b9ff). I still think this answer is mostly unneeded (David Eisenstat's answer takes 1ms for the whole thing), but I found that statement strange. FWIW, a similar algorithm in Numpy for searching 262M-"lists" takes about 15s on my computer (assuming no match is found), only the rotating of the list happens in the outer loop, not the inner one. – Veedrac Nov 15 '14 at 05:55
  • @Quincunx, thank you for your edit to get the syntax coloring correct for C++. Greatly appreciated! –  Nov 17 '14 at 00:22
  • @RocketRoy No problem. When you answer a lot of questions over on [PPCG](http://codegolf.stackexchange.com/), you learn how to do the syntax coloring. – Justin Nov 17 '14 at 00:47
  • @Veedrac, because interpreted languages have a lot of interpretation overhead. If you amortize that cost over a lot of work, as the APL language does because it mostly operates on large collections of data, like multidimensional arrays, the overhead is small. As you get closer and closer to the bare metal though that cost grows dramatically. When doing bit ops in Python it probably takes 10X as long for interpretation as for the bit ops, which in this code are paired with other instructions so they execute in ~ 1/3rd of a clock tic. –  Jan 28 '16 at 21:22
  • @RocketRoy I don't understand what I've said that you're responding to. – Veedrac Jan 28 '16 at 21:51
34

Reading between the lines, it sounds as though you're trying to enumerate one representative of each circular equivalence class of strings with 3 ones and 52 zeros. Let's switch from a dense representation to a sparse one (set of three numbers in range(55)). In this representation, the circular shift of s by k is given by the comprehension set((i + k) % 55 for i in s). The lexicographic minimum representative in a class always contains the position 0. Given a set of the form {0, i, j} with 0 < i < j, the other candidates for minimum in the class are {0, j - i, 55 - i} and {0, 55 - j, 55 + i - j}. Hence, we need (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)) for the original to be minimum. Here's some enumeration code.

def makereps():
    reps = []
    for i in range(1, 55 - 1):
        for j in range(i + 1, 55):
            if (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)):
                reps.append('1' + '0' * (i - 1) + '1' + '0' * (j - i - 1) + '1' + '0' * (55 - j - 1))
    return reps
David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • FWIW, I suggest you use `[1] + [0] * (i - 1) + [1] + ...` to actually generate a list. – Veedrac Nov 14 '14 at 21:14
  • It is worth noting that this takes O(n^2) time and O(n^2) space to generate all possible cyclic rotations. Also it does not matter for l=55, but for l0^4 it will run ~20 secs and generate huge amount of RAM. Also it is worth to tell that we just generated all the rotations and we need to check whether our rotation is in these rotations. To compare it with my answer, it takes less than 0.1 sec to generate answer for l=10^6 and o(n) space complexity. – Salvador Dali Nov 14 '14 at 23:14
  • 2
    @SalvadorDali You've misunderstood the answer (I did so too until he pointed it out!). This directly generates "one representative of each circular equivalence class of strings with 3 ones and 52 zeros". His code doesn't generate all cyclical rotations. The original cost¹ is T(55²·26235²). Your code improves the 55² to 55, so is just T(55*26235²). David Eisenstat's answer is between 55² and 55³ *for the whole thing*. 55³ ≪ 55·26235². ¹Not talking big-O terms here as the actual cost in O(1) in all cases. – Veedrac Nov 14 '14 at 23:55
  • @Veedrac you are right, it looks like I missed the fact that space complexity is not O(n^2), but who knows may in the worse case it will be O(n^2). Yes, in time complexity it is at least O(n^2), but my solution is O(n). Why is my code improves it to 55*some big number? OP wants to find if one list is a cyclic representation of another list. My code finds it in O(n) in terms of the size of the list. – Salvador Dali Nov 15 '14 at 00:03
  • @SalvadorDali I'll quote the important part of the question: *"The problem is, each list which I have has a length of 55 and contains only three ones and 52 zeros in it. Without circular condition, there are 26,235 (55 choose 3) lists. However, if the condition 'circular' exists, there are a huge number of circularly identical lists."* David Eisenstat's answer avoids generating any of these circularly identical lists. Your answer tries to filter the 26k lists by pairwise comparison (O(n²·cost of comparison) worst case; slightly less in practice). – Veedrac Nov 15 '14 at 00:04
  • @Veedrac who knows, may be you are correct. Taking into consideration that the question is pretty ambiguous and in the last 16 hours OP has not even tried to interact with people who gave him solution I am not really sure anyone can tell what exactly he wants. I gave an good algorithm to check that two lists are cyclic (based on his title that's what he needs). Yes it does not takes into account his L=55 and also 3 ones. So may be for his exact purpose he will find someone's else answer better and he clearly can accept it. – Salvador Dali Nov 15 '14 at 00:17
  • 1
    @Veedrac But 99% of readers who will come to this question in the future, will not have his constrains and I believe my answer will suit them better. Without bloating the conversation further I will leave to the OP to explain what exactly does he want. – Salvador Dali Nov 15 '14 at 00:18
  • 5
    @SalvadorDali OP appears to have fallen prey to the [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Fortunately, the question itself makes clear what the title does not, and David was able to read between the lines. If this is in fact the case, then the right thing to do is to alter the title and solve the actual problem, rather than to answer the title and ignore the question. – Aaron Dufour Nov 15 '14 at 01:06
  • 1
    @SalvadorDali, under the covers your Python code is calling the equivalent of C's strstr() which searches a string for a sub-string. That in turn calls strcmp(), which runs a for() loop comparing each char in a string1 with string2. Therefore, what looks like O(n) is O(n*55*55) assuming a search to failure. High level languages are a 2-edged sword. They hide implementation details from you, but then they also hide implementation details from you. FWIW, your insight to concatenate the list was brilliant. Faster still as uint8, and much faster as bits - which can be easily rotated in hardware. –  Nov 19 '14 at 03:28
  • Can be simplified to: `for i in range(18,52+1): for j in range(0,min(i,52-i)+1): if i != 52-i-j: reps.append([1] + [0]*i + [1] + [0]*j + [1] + [0]*(52-i-j))` – Aleksandr Dubinsky Nov 19 '14 at 16:13
  • 2
    @AleksandrDubinsky Simpler for the computer, more complicated for human beings. It's fast enough as is. – David Eisenstat Nov 19 '14 at 16:22
12

Repeat the first array, then use the Z algorithm (O(n) time) to find the second array inside the first.

(Note: you don't have to physically copy the first array. You can just wrap around during matching.)

The nice thing about the Z algorithm is that it's very simple compared to KMP, BM, etc.
However, if you're feeling ambitious, you could do string matching in linear time and constant space -- strstr, for example, does this. Implementing it would be more painful, though.

user541686
  • 205,094
  • 128
  • 528
  • 886
  • And this is the problem with external links. The z-algorithm PDF is no longer available and is not available in archive.org either. – Eric D Jan 29 '22 at 16:39
  • 1
    @EricD: https://web.archive.org/web/20170628183833if_/http://www.cs.umd.edu/class/fall2011/cmsc858s/Lec02-zalg.pdf – user541686 Jan 29 '22 at 19:20
6

Following up on Salvador Dali's very smart solution, the best way to handle it is to make sure all elements are of the same length, as well as both LISTS are of the same length.

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    lst1, lst2 = map(str, lst1), map(str, lst2)
    len_longest_element = max(map(len, lst1))
    template = "{{:{}}}".format(len_longest_element)
    circ_lst = " ".join([template.format(el) for el in lst1]) * 2
    return " ".join([template.format(el) for el in lst2]) in circ_lst

No clue if this is faster or slower than AshwiniChaudhary's recommended regex solution in Salvador Dali's answer, which reads:

import re

def is_circular_equal(lst1, lst2):
    if len(lst2) != len(lst2):
        return False
    return bool(re.search(r"\b{}\b".format(' '.join(map(str, lst2))),
                          ' '.join(map(str, lst1)) * 2))
Adam Smith
  • 52,157
  • 12
  • 73
  • 112
  • 1
    wiki'd this since I basically just tweaked Salvador Dali's answer and formatted Ashwini's changes. Very little of this is actually mine. – Adam Smith Nov 14 '14 at 07:38
  • 1
    thank you for the input. I think I covered all the possible cases in my edited solution. Let me know if something is missing. – Salvador Dali Nov 14 '14 at 07:42
  • @SalvadorDali ah, yes...checking that the strings are the same length. I SUPPOSE that would be easier than running through the list looking for the longest element, then calling `str.format` `n` times to format the resulting string. I SUPPOSE.... :) – Adam Smith Nov 14 '14 at 07:44
3

Given that you need to do so many comparisons might it be worth your while taking an initial pass through your lists to convert them into some sort of canonical form that can be easily compared?

Are you trying to get a set of circularly-unique lists? If so you can throw them into a set after converting to tuples.

def normalise(lst):
    # Pick the 'maximum' out of all cyclic options
    return max([lst[i:]+lst[:i] for i in range(len(lst))])

a_normalised = map(normalise,a)
a_tuples = map(tuple,a_normalised)
a_unique = set(a_tuples)

Apologies to David Eisenstat for not spotting his v.similar answer.

3

You can roll one list like this:

list1, list2 = [0,1,1,1,0,0,1,0], [1,0,0,1,0,0,1,1]

str_list1="".join(map(str,list1))
str_list2="".join(map(str,list2))

def rotate(string_to_rotate, result=[]):
    result.append(string_to_rotate)
    for i in xrange(1,len(string_to_rotate)):
        result.append(result[-1][1:]+result[-1][0])
    return result

for x in rotate(str_list1):
    if cmp(x,str_list2)==0:
        print "lists are rotationally identical"
        break
Stefan Gruenwald
  • 2,582
  • 24
  • 30
3

First convert every of your list elements (in a copy if necessary) to that rotated version that is lexically greatest.

Then sort the resulting list of lists (retaining an index into the original list position) and unify the sorted list, marking all the duplicates in the original list as needed.

2

Piggybacking on @SalvadorDali's observation on looking for matches of a in any a-lengthed sized slice in b+b, here is a solution using just list operations.

def rollmatch(a,b):
    bb=b*2
    return any(not any(ax^bbx for ax,bbx in zip(a,bb[i:])) for i in range(len(a)))

l1 = [1,0,0,1]
l2 = [1,1,0,0]
l3 = [1,0,1,0]

rollmatch(l1,l2)  # True
rollmatch(l1,l3)  # False

2nd approach: [deleted]

PaulMcG
  • 62,419
  • 16
  • 94
  • 130
1

Not a complete, free-standing answer, but on the topic of optimizing by reducing comparisons, I too was thinking of normalized representations.

Namely, if your input alphabet is {0, 1}, you could reduce the number of allowed permutations significantly. Rotate the first list to a (pseudo-) normalized form (given the distribution in your question, I would pick one where one of the 1 bits is on the extreme left, and one of the 0 bits is on the extreme right). Now before each comparison, successively rotate the other list through the possible positions with the same alignment pattern.

For example, if you have a total of four 1 bits, there can be at most 4 permutations with this alignment, and if you have clusters of adjacent 1 bits, each additional bit in such a cluster reduces the amount of positions.

List 1   1 1 1 0 1 0

List 2   1 0 1 1 1 0  1st permutation
         1 1 1 0 1 0  2nd permutation, final permutation, match, done

This generalizes to larger alphabets and different alignment patterns; the main challenge is to find a good normalization with only a few possible representations. Ideally, it would be a proper normalization, with a single unique representation, but given the problem, I don't think that's possible.

tripleee
  • 175,061
  • 34
  • 275
  • 318
0

Building further on RocketRoy's answer: Convert all your lists up front to unsigned 64 bit numbers. For each list, rotate those 55 bits around to find the smallest numerical value.

You are now left with a single unsigned 64 bit value for each list that you can compare straight with the value of the other lists. Function is_circular_identical() is not required anymore.

(In essence, you create an identity value for your lists that is not affected by the rotation of the lists elements) That would even work if you have an arbitrary number of one's in your lists.

Kris M
  • 16
  • 1
  • 3
0

This is the same idea of Salvador Dali but don't need the string convertion. Behind is the same KMP recover idea to avoid impossible shift inspection. Them only call KMPModified(list1, list2+list2).

    public class KmpModified
    {
        public int[] CalculatePhi(int[] pattern)
        {
            var phi = new int[pattern.Length + 1];
            phi[0] = -1;
            phi[1] = 0;

            int pos = 1, cnd = 0;
            while (pos < pattern.Length)
                if (pattern[pos] == pattern[cnd])
                {
                    cnd++;
                    phi[pos + 1] = cnd;
                    pos++;
                }
                else if (cnd > 0)
                    cnd = phi[cnd];
                else
                {
                    phi[pos + 1] = 0;
                    pos++;
                }

            return phi;
        }

        public IEnumerable<int> Search(int[] pattern, int[] list)
        {
            var phi = CalculatePhi(pattern);

            int m = 0, i = 0;
            while (m < list.Length)
                if (pattern[i] == list[m])
                {
                    i++;
                    if (i == pattern.Length)
                    {
                        yield return m - i + 1;
                        i = phi[i];
                    }
                    m++;
                }
                else if (i > 0)
                {
                    i = phi[i];
                }
                else
                {
                    i = 0;
                    m++;
                }
        }

        [Fact]
        public void BasicTest()
        {
            var pattern = new[] { 1, 1, 10 };
            var list = new[] {2, 4, 1, 1, 1, 10, 1, 5, 1, 1, 10, 9};
            var matches = Search(pattern, list).ToList();

            Assert.Equal(new[] {3, 8}, matches);
        }

        [Fact]
        public void SolveProblem()
        {
            var random = new Random();
            var list = new int[10];
            for (var k = 0; k < list.Length; k++)
                list[k]= random.Next();

            var rotation = new int[list.Length];
            for (var k = 1; k < list.Length; k++)
                rotation[k - 1] = list[k];
            rotation[rotation.Length - 1] = list[0];

            Assert.True(Search(list, rotation.Concat(rotation).ToArray()).Any());
        }
    }

Hope this help!

Miguel
  • 3,786
  • 2
  • 19
  • 32
0

Simplifying The Problem

  • The problem consist of list of ordered items
  • The domain of value is binary (0,1)
  • We can reduce the problem by mapping consecutive 1s into a count
  • and consecutive 0s into a negative count

Example

A = [ 1, 1, 1, 0, 0, 1, 1, 0 ]
B = [ 1, 1, 0, 1, 1, 1, 0, 0 ]
~
A = [ +3, -2, +2, -1 ]
B = [ +2, -1, +3, -2 ]
  • This process require that the first item and the last item must be different
  • This will reduce the amount of comparisons overall

Checking Process

  • If we assume that they're duplicate, then we can assume what we are looking for
  • Basically the first item from the first list must exist somewhere in the other list
  • Followed by what is followed in the first list, and in the same manner
  • The previous items should be the last items from the first list
  • Since it's circular, the order is the same

The Grip

  • The question here is where to start, technically known as lookup and look-ahead
  • We will just check where the first element of the first list exist through the second list
  • The probability of frequent element is lower given that we mapped the lists into histograms

Pseudo-Code

FUNCTION IS_DUPLICATE (LIST L1, LIST L2) : BOOLEAN

    LIST A = MAP_LIST(L1)
    LIST B = MAP_LIST(L2)

    LIST ALPHA = LOOKUP_INDEX(B, A[0])

    IF A.SIZE != B.SIZE
       OR COUNT_CHAR(A, 0) != COUNT_CHAR(B, ALPHA[0]) THEN

        RETURN FALSE

    END IF

    FOR EACH INDEX IN ALPHA

        IF ALPHA_NGRAM(A, B, INDEX, 1) THEN

            IF IS_DUPLICATE(A, B, INDEX) THEN

                RETURN TRUE

            END IF

        END IF

    END FOR

    RETURN FALSE

END FUNCTION

FUNCTION IS_DUPLICATE (LIST L1, LIST L2, INTEGER INDEX) : BOOLEAN

    INTEGER I = 0

    WHILE I < L1.SIZE DO

        IF L1[I] != L2[(INDEX+I)%L2.SIZE] THEN

            RETURN FALSE

        END IF

        I = I + 1

    END WHILE

    RETURN TRUE

END FUNCTION

Functions

  • MAP_LIST(LIST A):LIST MAP CONSQUETIVE ELEMENTS AS COUNTS IN A NEW LIST

  • LOOKUP_INDEX(LIST A, INTEGER E):LIST RETURN LIST OF INDICES WHERE THE ELEMENT E EXIST IN THE LIST A

  • COUNT_CHAR(LIST A , INTEGER E):INTEGER COUNT HOW MANY TIMES AN ELEMENT E OCCUR IN A LIST A

  • ALPHA_NGRAM(LIST A,LIST B,INTEGER I,INTEGER N):BOOLEAN CHECK IF B[I] IS EQUIVALENT TO A[0] N-GRAM IN BOTH DIRECTIONS


Finally

If the list size is going to be pretty huge or if the element we are starting to check the cycle from is frequently high, then we can do the following:

  • Look for the least-frequent item in the first list to start with

  • increase the n-gram N parameter to lower the probability of going through a the linear check

Khaled.K
  • 5,828
  • 1
  • 33
  • 51
0

An efficient, quick-to-compute "canonical form" for the lists in question can be derived as:

  • Count the number of zeroes between the ones (ignoring wrap-around), to get three numbers.
  • Rotate the three numbers so that the biggest number is first.
  • The first number (a) must be between 18 and 52 (inclusive). Re-encode it as between 0 and 34.
  • The second number (b) must be between 0 and 26, but it doesn't matter much.
  • Drop the third number, since it's just 52 - (a + b) and adds no information

The canonical form is the integer b * 35 + a, which is between 0 and 936 (inclusive), which is fairly compact (there are 477 circularly-unique lists in total).

Aleksandr Dubinsky
  • 22,436
  • 15
  • 82
  • 99
0

I wrote an straightforward solution which compares both lists and just increases (and wraps around) the index of the compared value for each iteration.

I don't know python well so I wrote it in Java, but it's really simple so it should be easy to adapt it to any other language.

By this you could also compare lists of other types.

public class Main {

    public static void main(String[] args){
        int[] a = {0,1,1,1,0};
        int[] b = {1,1,0,0,1};

        System.out.println(isCircularIdentical(a, b));
    }

    public static boolean isCircularIdentical(int[] a, int[]b){
        if(a.length != b.length){
            return false;
        }

        //The outer loop is for the increase of the index of the second list
        outer:
        for(int i = 0; i < a.length; i++){
            //Loop trough the list and compare each value to the according value of the second list
            for(int k = 0; k < a.length; k++){
                // I use modulo length to wrap around the index
                if(a[k] != b[(k + i) % a.length]){
                    //If the values do not match I continue and shift the index one further
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
das Keks
  • 3,723
  • 5
  • 35
  • 57
0

As others have mentioned, once you find the normalized rotation of a list, you can compare them.

Heres some working code that does this, Basic method is to find a normalized rotation for each list and compare:

  • Calculate a normalized rotation index on each list.
  • Loop over both lists with their offsets, comparing each item, returning if they mis-match.

Note that this method is it doesn't depend on numbers, you can pass in lists of strings (any values which can be compared).

Instead of doing a list-in-list search, we know we want the list to start with the minimum value - so we can loop over the minimum values, searching until we find which one has the lowest successive values, storing this for further comparisons until we have the best.

There are many opportunities to exit early when calculating the index, details on some optimizations.

  • Skip searching for the best minimum value when theres only one.
  • Skip searching minimum values when the previous is also a minimum value (it will never be a better match).
  • Skip searching when all values are the same.
  • Fail early when lists have different minimum values.
  • Use regular comparison when offsets match.
  • Adjust offsets to avoid wrapping the index values on one of the lists during comparison.

Note that in Python a list-in-list search may well be faster, however I was interested to find an efficient algorithm - which could be used in other languages too. Also, there is some advantage to avoiding to create new lists.

def normalize_rotation_index(ls, v_min_other=None):
    """ Return the index or -1 (when the minimum is above `v_min_other`) """

    if len(ls) <= 1:
        return 0

    def compare_rotations(i_a, i_b):
        """ Return True when i_a is smaller.
            Note: unless there are large duplicate sections of identical values,
            this loop will exit early on.
        """
        for offset in range(1, len(ls)):
            v_a = ls[(i_a + offset) % len(ls)]
            v_b = ls[(i_b + offset) % len(ls)]
            if v_a < v_b:
                return True
            elif v_a > v_b:
                return False
        return False

    v_min = ls[0]
    i_best_first = 0
    i_best_last = 0
    i_best_total = 1
    for i in range(1, len(ls)):
        v = ls[i]
        if v_min > v:
            v_min = v
            i_best_first = i
            i_best_last = i
            i_best_total = 1
        elif v_min == v:
            i_best_last = i
            i_best_total += 1

    # all values match
    if i_best_total == len(ls):
        return 0

    # exit early if we're not matching another lists minimum
    if v_min_other is not None:
        if v_min != v_min_other:
            return -1
    # simple case, only one minimum
    if i_best_first == i_best_last:
        return i_best_first

    # otherwise find the minimum with the lowest values compared to all others.
    # start looking after the first we've found
    i_best = i_best_first
    for i in range(i_best_first + 1, i_best_last + 1):
        if (ls[i] == v_min) and (ls[i - 1] != v_min):
            if compare_rotations(i, i_best):
                i_best = i

    return i_best


def compare_circular_lists(ls_a, ls_b):
    # sanity checks
    if len(ls_a) != len(ls_b):
        return False
    if len(ls_a) <= 1:
        return (ls_a == ls_b)

    index_a = normalize_rotation_index(ls_a)
    index_b = normalize_rotation_index(ls_b, ls_a[index_a])

    if index_b == -1:
        return False

    if index_a == index_b:
        return (ls_a == ls_b)

    # cancel out 'index_a'
    index_b = (index_b - index_a)
    if index_b < 0:
        index_b += len(ls_a)
    index_a = 0  # ignore it

    # compare rotated lists
    for i in range(len(ls_a)):
        if ls_a[i] != ls_b[(index_b + i) % len(ls_b)]:
            return False
    return True


assert(compare_circular_lists([0, 9, -1, 2, -1], [-1, 2, -1, 0, 9]) == True)
assert(compare_circular_lists([2, 9, -1, 0, -1], [-1, 2, -1, 0, 9]) == False)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["World", "Hello" "Circular"]) == True)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["Circular", "Hello" "World"]) == False)

See: this snippet for some more tests/examples.

ideasman42
  • 42,413
  • 44
  • 197
  • 320
0

You can check to see if a list A is equal to a cyclic shift of list B in expected O(N) time pretty easily.

I would use a polynomial hash function to compute the hash of list A, and every cyclic shift of list B. Where a shift of list B has the same hash as list A, I'd compare the actual elements to see if they are equal.

The reason this is fast is that with polynomial hash functions (which are extremely common!), you can calculate the hash of each cyclic shift from the previous one in constant time, so you can calculate hashes for all of the cyclic shifts in O(N) time.

It works like this:

Let's say B has N elements, then the the hash of B using prime P is:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb = Hb*P + B[i];
}

This is an optimized way to evaluate a polynomial in P, and is equivalent to:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb += B[i] * P^(N-1-i);  //^ is exponentiation, not XOR
}

Notice how every B[i] is multiplied by P^(N-1-i). If we shift B to the left by 1, then every every B[i] will be multiplied by an extra P, except the first one. Since multiplication distributes over addition, we can multiply all the components at once just by multiplying the whole hash, and then fix up the factor for the first element.

The hash of the left shift of B is just

Hb1 = Hb*P + B[0]*(1-(P^N))

The second left shift:

Hb2 = Hb1*P + B[1]*(1-(P^N))

and so on...

NOTE: all math above is performed modulo some machine word size, and you only have to calculate P^N once.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
-1

To glue to the most pythonic way to do it, use sets !

from sets import Set
a = Set ([1, 1, 1, 0, 0])
b = Set ([0, 1, 1, 1, 0]) 
c = Set ([1, 0, 0, 1, 1])
a==b
True
a==b==c
True
Louis
  • 2,854
  • 2
  • 19
  • 24