Detect period of unknown source

Question

How to detect repeating digits in an infinite sequence? I tried Floyd & Brent detection algorithm but come to nothing... I have a generator that yields numbers ranging from 0 to 9 (inclusive) and I have to recognize a period in it.

Example test case:

import itertools

# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))

>>> gen = source()
>>> period(gen)
(1, 0, 1, 4, 8, 2, 1, 3, 3, 1)

Why did the methods you used come to nothing? Can you show the code? — Ben, Jun 26 '12 at 10:12
Your recursion (obviously) has to be of the form `x_{n+1} := f(x_n)` or else there won't be a good definition of a repetition! — Katriel, Jun 26 '12 at 10:13
@katrielalex: Did not know that. So my source is not a repetition, because values do not depend on previous one. — rubik, Jun 26 '12 at 10:20
@Ben: Because I didn't know how to apply it to my example. As said on [Wikipedia article](http://en.wikipedia.org/wiki/Cycle_detection) repetition must follow the rule katrielalex pointed out. But that does not apply to my example... — rubik, Jun 26 '12 at 10:20
@Lev Levitsky: Well, if you guarantee that I can strip the first and the last elements, yes. — rubik, Jun 26 '12 at 13:40
@rubik what do you mean 'strip'? It's the same sequence with a cyclic permutation. — Lev Levitsky, Jun 26 '12 at 13:43
@Lev Levitsky: Oh yes sorry. It's correct result as long as I can return to the right one (the one in the example). In this case I should shift it but that's ok. — rubik, Jun 26 '12 at 15:32
possible duplicate of [Finding a repeating sequence at the end of a sequence of numbers](http://stackoverflow.com/questions/10441715/finding-a-repeating-sequence-at-the-end-of-a-sequence-of-numbers) — senderle, Jul 09 '12 at 22:29
Should "python" tag be removed? I think this limits the scope of a rather language-agnostic question. — heltonbiker, May 06 '13 at 22:23

score 12 · Accepted Answer · edited Jun 20 '20 at 09:12

12

Empirical methods

Here's a fun take on the problem. The more general form of your question is this:

Given a repeating sequence of unknown length, determine the period of the signal.

The process to determine the repeating frequencies is known as the Fourier Transform. In your example case the signal is clean and discrete, but the following solution will work even with continuous noisy data! The FFT will try to duplicate the frequencies of the input signal by approximating them in the so-called "wave-space" or "Fourier-space". Basically a peak in this space corresponds to a repeating signal. The period of your signal is related to the longest wavelength that is peaked.

import itertools

# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 2))

import pylab as plt
import numpy as np
import scipy as sp

# Generate some test data, i.e. our "observations" of the signal
N = 300
vals = source()
X = np.array([vals.next() for _ in xrange(N)])

# Compute the FFT
W    = np.fft.fft(X)
freq = np.fft.fftfreq(N,1)
 
# Look for the longest signal that is "loud"
threshold = 10**2
idx = np.where(abs(W)>threshold)[0][-1]

max_f = abs(freq[idx])
print "Period estimate: ", 1/max_f

This gives the correct answer for this case, 10 though if N didn't divide the cycles cleanly, you would get a close estimate. We can visualize this via:

plt.subplot(211)
plt.scatter([max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.plot(freq[:N/2], abs(W[:N/2]))
plt.xlabel(r"$f$")

plt.subplot(212)
plt.plot(1.0/freq[:N/2], abs(W[:N/2]))
plt.scatter([1/max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.xlabel(r"$1/f$")
plt.xlim(0,20)

plt.show()

enter image description here

edited Jun 20 '20 at 09:12

Community

1
1

answered Jun 26 '12 at 15:01

Hooked

84,485
43
192
261

Wow very nice I have to say. Thank you! I'll play a bit more with the code. – rubik Jun 26 '12 at 15:59
@rubik I'd be happy to answer any questions about the method if there is something that is unclear (I really enjoyed writing this one up). I'd like to point out that if your sequence is _exact and discrete_, like say the digits from a continued fraction, then use Evgeny's answer. If you need the problem to generalize, my answer using FFT's is a well-known technique in signal processing. The nice thing about a signal processing approach (in addition to the generalization) is the visual feedback you get as a heuristic confidence conformation. – Hooked Jun 26 '12 at 16:08
Very interesting! You guessed it, the sequence is from an arithmetic operation between continued fractions. So do you recommend the other answers? – rubik Jul 09 '12 at 09:20
@rubik The answer is, as always, maybe. What is unknown about the cont. fraction unknown? Is it possible that the real number approximation could be a strange beast like \pi (which has, to my knowledge, an infinite period)? Does your answer have to be exact? The other answers are _great_, but only if they meet the criteria you are looking for - we need more context! It might make sense to pick one of these answers and ask a new follow-up question with a targeted focus. If you do so - edit your post so readers can follow to the new question please. – Hooked Jul 09 '12 at 13:24
I had already asked a question on Mathematics, but I received no useful answer. You can read it [here](http://math.stackexchange.com/questions/90406/how-to-detect-when-continued-fractions-period-terminates). I'll post a follow-up as soon as I will be able. – rubik Jul 10 '12 at 06:35

score 4 · Answer 2 · edited May 23 '17 at 12:00

Evgeny Kluev's answer provides a way to get an answer that might be right.

Definition

Let's assume you have some sequence D that is a repeating sequence. That is there is some sequence d of length L such that: D_i = d_{i mod L}, where t_i is the ith element of sequence t that is numbered from 0. We will say sequence d generates D.

Theorem

Given a sequence D which you know is generated by some finite sequence t. Given some d it is impossible to decide in finite time whether it generates D.

Proof

Since we are only allowed a finite time we can only access a finite number of elements of D. Let us suppose we access the first F elements of D. We chose the first F because if we are only allowed to access a finite number, the set containing the indices of the elements we've accessed is finite and hence has a maximum. Let that maximum be M. We can then let F = M+1, which is still a finite number.

Let L be the length of d and that D_i = d_{i mod L} for i < F. There are two possibilities for D_F it is either the same as d_{F mod L} or it is not. In the former case d seems to work, but in the latter case it does not. We cannot know which case is true until we access D_F. This will however require accessing F+1 elements, but we are limited to F element accesses.

Hence, for any F we won't have enough information to decide whether d generates D and therefore it is impossible to know in finite time whether d generates D.

Conclusions

It is possible to know in finite time that a sequence d does not generate D, but this doesn't help you. Since you want to find a sequence d that does generate D, but this involves amongst other things being able to prove that some sequence generates D.

Unless you have more information about D this problem is unsolvable. One bit of information that will make this problem decidable is some upper bound on the length of the shortest d that generates D. If you know the function generating D only has a known amount of finite state you can calculate this upper bound.

Hence, my conclusion is that you cannot solve this problem unless you change the specification a bit.

Nice proof and of course you're right. But unfortunately I cannot change the specification, so I'll choose the answer that can give the best approximation and/or the highest probability to be correct. — rubik, Jun 26 '12 at 16:06

score 2 · Answer 3 · answered Jun 26 '12 at 15:37

I have no idea about proper algorithms to apply here, but my understanding also is that you can never know for sure that you've detected a period if you have consumed only a finite number of terms. Anyway, here's what I've come up with, this is a very naive implementation, more to educate from the comments than to provide a good solution (I guess).

def guess_period(source, minlen=1, maxlen=100, trials=100):
    for n in range(minlen, maxlen+1):
        p = [j for i, j in zip(range(n), source)]
        if all([j for i, j in zip(range(n), source)] == p
               for k in range(trials)):
            return tuple(p)
    return None

This one, however, "forgets" the initial order and returns a tuple that is a cyclic permutation of the actual period:

In [101]: guess_period(gen)
Out[101]: (0, 1, 4, 8, 2, 1, 3, 3, 1, 1)

To compensate for this, you'll need to keep track of the offset.

Evgeny Kluev · Answer 4 · 2012-06-26T12:59:25.740

Since your sequence is not of the form X_n+1 = f(X_n), Floyd's or Brent's algorithms are not directly applicable to your case. But they may be extended to do the task:

Use Floyd's or Brent's algorithm to find some repeating element of the sequence.
Find next sequence element with the same value. Distance between these elements is a supposed period (k).
Remember next k elements of the sequence
Find the next occurrence of this k-element subsequence.
If distance between subsequences is greater than k, update k and continue with the step 3.
Repeat step 4 several times to verify the result. If maximum length of the repeating sub-sequence is known a-priori, use appropriate number of repetitions. Otherwise use as much repetitions as possible, because each repetition increases the result's correctness.

If the sequence cycling starts from the first element, ignore step 1 and start from step 2 (find next sequence element equal to the first element).

If the sequence cycling does not start from the first element, it is possible that Floyd's or Brent's algorithm finds some repeating element of the sequence that does not belong to the cycle. So it is reasonable to limit the number of iterations in steps 2 and 4, and if this limit is exceeded, continue with the step 1.

I'd add that for this type of sequence that it's impossible to know for certain whether the cycle you predict is correct. You could have some sequence where `s...sy` repeats, where `s` and `y` represent different subsequences. There is no way to know for sure (even if you know there is a repeating subsequence) whether you have the actual subsequence that repeats in finite time. — JPvdMerwe, Jun 26 '12 at 12:06
@JPvdMerwe: Well, I can add a check at the end of this. For example, if we found `s` as repeating, I can then check that the following `k` elements (where `k` is the length of `s`) are equal to `s`. If they are, `s` is probably the period, otherwise we re-start the search. — rubik, Jun 26 '12 at 13:42
@rubik: The point is still that it is only "probably" the answer. You can never know for certain unless there is an upper bound on `k`. The only thing you can say for an `s` is that it is **not** the answer. I've added, or will soon, an answer that explains my proof. — JPvdMerwe, Jun 26 '12 at 13:54