1

I have a list of integers which is being continuously modified in a loop and I need to tell if its content repeats itself after a certain amount of iterations to break the loop.

If it doesn't, the list will eventually modify to [] or the loop terminates when a certain limit of iterations is reached. My solution so far:

def modify(numlist, a, b):
    numlist = [(x * a) for x in numlist]
    for i in range((len(numlist) - 1), 0, -1):
        if numlist[i] >= b: numlist[i - 1] += numlist[i] // b
        numlist[i] = numlist[i] % b
    numlist[0] %= b
    while numlist[-1] == 0:
        numlist.pop(-1)
        if numlist == []: break 

numlist = [1, 2, 3, 4, 5]
listHistory = [numlist]
a, b = someValue, anotherValue

n = 0
while numlist != [] and n <= limit:
    modify(numlist, int(a), int(b))
    listHistory.append(numlist)
    if numlist in listHistory: break
    n += 1

limit can be very large (ca. 10**6 - 10**7) and checking the current numlist against all its previous versions becomes really slow.

Is there a more efficient way to do this or even a method to predetermine if the modification is periodic by the lists initial content and given a, b?

Marcel Tesch
  • 184
  • 3
  • 15
  • Does "continuously modified" mean that items are added to the end of the list, or the entire list can potentially be modified? – Erik Kaplun Mar 19 '14 at 23:27
  • The whole list gets modified, no items are added. – Marcel Tesch Mar 19 '14 at 23:29
  • 1
    For a simple speed-up, you could make `listHistory` a set of tuples. Also, don't call your own variable `list`. – jonrsharpe Mar 19 '14 at 23:34
  • You won't gain much, but, `a, b = int(someValue), int(anotherValue)` and then `modify(list, a, b)`. Also, use `xrange` instead of `range`? – fredtantini Mar 19 '14 at 23:37
  • It is more a mathematical question than programming. You have a nonlinear mapping between vectors and are trying to find a fixpoint...Another view would be by group theory. Then you want to get the one-element... I'll tell you when my thoughts end in s.th. useful here – MaSp Mar 19 '14 at 23:47
  • You can also delete from `listHistory` all the lists that are not with the same length anymore. – fredtantini Mar 19 '14 at 23:52
  • Is the limit necessary? It's simpler to solve the problem without the limit. – user2357112 Mar 20 '14 at 00:19

3 Answers3

1

OK, got something. If you look at the last element in your list, lets call it m. What happens to it it gets multiplied by a and then taken modulo b. It never gets mixed with any other element, so if a configuration of the list has to repeat itself the following must hold:

m*a^n=m modulo b
<==>a^n=1 modulo b
<  >a^(n+1)=a modulo b

This is a problem where you can make use of Fermats little theorem If a and b are coprimes, then

a^phi(b)=1 modulo b

where phi is Eulers totient function.

So this reduces the amount of list configurations which you have to store in your history drastically. You only have to store it every phi(b) steps.

I found an implementation of phi here: Computing Eulers Totient Function

UPDATE:

Ok, I found a quick solution if you were to do += list[i] % b instead of += list[i] // b. Otherwise you need b^4*phi(b) steps in the worst case

UPDATE2:

I rewrote the code in C (see below) to make it faster and implemented the "tortoise and the hare" algorithm proposed by @user2357112. This way i can check some million loops per second what should be way faster than the python implementation. I tried it for some different value combinations:

a  b    steps     b^4*phi(b)  (b/GCD(b,a))^4*phi(b/GCD(n,a))    (b/GCD(b,a))^4*phi(b/GCD(n,a))/steps
2  37   67469796  67469796    67469796                          1
3  37   33734898  67469796    67469796                          2
4  37   33734898  67469796    67469796                          2
5  37   67469796  67469796    67469796                          1
6  37   7496644   67469796    67469796                          9
7  37   16867449  67469796    67469796                          4
36 37   3748322   67469796    67469796                          18
2  36   39366     20155392    629856                            16
3  36   256       20155392    82944                             27648
4  36   19683     20155392    39366                             2
5  36   5038848   20155392    20155392                          4

So you see where this is going: The cycle length seems always to be a divisor of (b/GCD(b,a))^4*phi(b/GCD(n,a)), so the worst case is (b/GCD(b,a))^4*phi(b/GCD(n,a)) steps as suspected

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void modify(int *, int, int);
void printl(int * );
int main(int argc, const char*argv[])
{
  int l[5]={1,2,3,4,5};
  int lz[5]={1,2,3,4,5};
  int i=1,a,b,n;
  if (argc<4) {
    printf("Not enough arguments!!\n");
    exit(1);
  }
  a=atoi(argv[1]);
  b=atoi(argv[2]);
  n=atoi(argv[3]);
  modify(l,a,b);
  while (i<n) {
    modify(l,a,b);
    modify(l,a,b);
    modify(lz,a,b);
    i++;
    if (memcmp(l,lz,sizeof(l))==0) {
      printf("success!\n");
      break;
    }
    if (i%1000000==0) printf("Step %d.000.000\n",i/1000000);
  }
  printf("Final step:  %d\n",i);
  printl(l);
  printl(lz);
  return 0;

}
void  modify(int * li, int a, int b) {
  int i=0;
  while (i<=4) {
   li[i]*=a;
   i++;
  }
  i=4;
  while (i>=1) {
    if (li[i]>=b) {
      li[i-1]+=li[i]/b;
    }
    li[i]=li[i]%b;
    i--;

  }
  li[0]=li[0]%b;
}
void printl(int * li) {
  printf("l=(%d,%d,%d,%d,%d)\n",li[0],li[1],li[2],li[3],li[4]);
Community
  • 1
  • 1
MaSp
  • 291
  • 1
  • 6
  • 1
    If you're going to go through and actually do the modifications, you should be able to simply adapt the [tortoise and the hare](http://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_hare) cycle-detection algorithm instead of storing all those previous entries. – user2357112 Mar 20 '14 at 00:57
  • @user2357112:Nice one, didn't know that one before. For that to be most efficient we would need to know that the cycle contains our starting point. But we don't generally know the length of the cycle. Then we will have to scan to `2*offset+2*cyclelength` in the worst case what can be about twice as far as if we were to save all intermediate steps. So considering memory vs. computation time it sounds promising =) – MaSp Mar 20 '14 at 01:18
0

First please allow me to say, all lists are periodic, if you consider a large enough period.

That said, this might be a good place to use a bloom filter, EG: https://pypi.python.org/pypi/drs-bloom-filter/

Bloom filters are sets that can perform set membership tests pretty quickly, and can add things to the set without actually storing the data of the element. This means it's probabilistic, but you can adjust the probability. So you might use a bloom filter test for a quick check, and upon detecting a match using the bloom filter, confirm the result using your slow+deterministic algorithm.

Usage looks like:

In [1]: import bloom_filter_mod

In [2]: bloom_filter = bloom_filter_mod.Bloom_filter(1000000, 0.01)

In [3]: for i in range(10):
   ...:     bloom_filter.add(i)
   ...:

In [4]: for i in range(0, 20, 2):
   ...:     if i in bloom_filter:
   ...:         print('{} present'.format(i))
   ...:
0 present
2 present
4 present
6 present
8 present

The 1000000 is the maximum number of elements you want to store in the filter, and the 0.01 is the maximum probability of a false positive when full.

So you could "store" each subsequence in the filter, and quickly detect recurrences.

dstromberg
  • 6,954
  • 1
  • 26
  • 27
  • Only if the list elements have a finite amount possible values you can be sure it will reppear ;). In this case you are right because they are integers and bound by b – MaSp Mar 20 '14 at 00:05
  • Thank you for your answers! But surely not every list can be periodic if elements will be removed under certain conditions like here (trailing zeros). Consider i. e. `[2, 5]`, `a = 2` and `b = 10`, it will be `[]` after two iterations. – Marcel Tesch Mar 20 '14 at 00:19
  • Periodic in this context means that after the list is [], it stays [] with a period of 1 There is actually no reason to pop the elements which get 0, despite the fact that they consume memory. Then the list would end up being [0,0,0,0,0] ( but only if a^k divides b for some b) – MaSp Mar 20 '14 at 00:22
0

Your list (which you really ought to rename, by the way) stores a number mod b**something in base b. Every run of modify multiplies the number by a, then truncates zeros at the end of the representation.

Call the number originally represented by the list n, and call the original length of the list l. If this process terminates, it will do so at the first iteration k such that b**l divides n * a**k, which will only ever occur if and only if all prime factors of b**l / gcd(n, b**l) are factors of a. This is simple to determine:

def all_prime_factors_of_first_divide_second(a, b):
    while a != 1:
        factor = gcd(a, b)
        if factor == 1:
            return False
        while not a % factor:
            a //= factor
    return True
user2357112
  • 260,549
  • 28
  • 431
  • 505
  • Sure,this is the easy special case. But even if this does not hold the lists generated will be periodic – MaSp Mar 20 '14 at 00:25
  • @MaSp: Indeed. I'm not sure whether the most important problem is to determine whether the list eventually goes to the empty list (which this answer solves), or whether any repetition, empty list or not, is reached within the time limit. – user2357112 Mar 20 '14 at 00:33
  • I wonder how big `b` will be in tesch's scenario. It should probably not be bigger than 25 if he wants it to finish in 10^7. If it is then no solveable by your case, it gets really hard to find the answer to this problem efficiently – MaSp Mar 20 '14 at 00:42
  • `b` is potentially arbitrary but normally between 2 and 36 for the majority of cases. – Marcel Tesch Mar 20 '14 at 00:56