How would I find an infinite loop in an array of pointers?

Question

I have an array of pointers (this is algorithmic, so don't go into language specifics). Most of the time, this array points to locations outside of the array, but it degrades to a point where every pointer in the array points to another pointer in the array. Eventually, these pointers form an infinite loop.

So on the assumption that the entire array consists of pointers to another location in the array and you start at the beginning, how could you find the length of the loop with the highest efficiency in both time and space? I believe the best time efficiency would be O(n), since you have to loop over the array, and the best space efficiency would be O(1), though I have no idea how that would be achieved.

Index:  0  1  2  3  4  5  6
Value: *3 *2 *5 *4 *1 *1 *D

D is data that was being pointed to before the loop began. In this example, the cycle is 1, 2, 5 and it repeats infinitely, but indices 0, 3, and 4 are not part of the cycle.

Wouldn't you just go through the pointers and when you hit an element that is the same element as the element your beginning pointer was pointing to, end searching and count the number of elements you went through? — masotann, Apr 18 '13 at 21:00
Do these points change or are they fixed since the beginning? — Mitch Bukaner, Apr 18 '13 at 21:01
Is there always one 'loop'? Could there be zero? Could there be more than one? If more than one, do you need to find all, or just one? — hatchet - done with SOverflow, Apr 18 '13 at 21:05
No @Calpis, because what if the first two or three elements aren't part of the loop, but the next two are? I'll add an example to my question. — NobleUplift, Apr 18 '13 at 21:11
@MitchBukaner We would be able to scan the array without worrying about changes. — NobleUplift, Apr 18 '13 at 21:13
@hatchet I only need to find the first infinite loop found from iterating from the beginning of the array. — NobleUplift, Apr 18 '13 at 21:16
I can see a simple solution that's O(n) time complexity, but O(n) for space. Since it requires marking visited elements, a solution that's O(1) for space doesn't immediately occur to me. — hatchet - done with SOverflow, Apr 18 '13 at 21:21

rici · Accepted Answer · 2013-04-19T04:55:46.583

5

This is an instance of the cycle-detection problem. An elegant O(n) time O(1) space solution was discovered by Robert W. Floyd in 1960; it's commonly known as the "Tortoise and Hare" algorithm because it consists of traversing the sequence with two pointers, one moving twice as fast as the other.

The idea is simple: the cycle must have a loop with length k, for some k. At each iteration, the hare moves two steps and the tortoise moves one, so the distance between them is one greater than it was in the previous iteration. Every k iterations, therefore, they are a multiple of k steps apart from each other, and once they are both in the cycle (which will happen once the tortoise arrives), if they are a multiple of k steps apart, they both point at the same element.

If all you need to know is the length of the cycle, you wait for the hare and the tortoise to reach the same spot; then you step along the cycle, counting steps until you get back to the same spot again. In the worst case, the total number of steps will be the length of the tail plus twice the length of the cycle, which must be less than twice the number of elements.

Note: The second paragraph was edited to possibly make the idea "more obvious", whatever that might mean. A formal proof is easy and so is an implementation, so I provided neither.

edited Apr 19 '13 at 04:55

answered Apr 18 '13 at 22:15

rici

234,347
28
237
341

I actually don't think the proof for Tortoise and Hare is as trivial as the wiki page makes it seem (trivial in the scope of math/CS, yes. In the scope of an intro class, no). I also don't see why your paragraph proves that they must point to the same element after `k` steps (or at least you need to write out the expressions to show it). I proved it to myself with something like `dist = n + (k - (n % k)) = yk` for some integer `y`. And the next step in the algorithm requires a similar short proof. – rliu Apr 19 '13 at 03:51
I can prove it visually without the formal proof too (unroll the path the hare has traveled and then roll it over and over again until the tortoise and hare meet); but the explanation you give really makes it seem like they start in the cycle (which isn't true) or something... I honestly don't get anything from it. As for the joke, I actually took a set theory course where the professor _actually_ proved it off the top of his head every time. You'd ask him any random set-theoretic question that had a well-known solution and he'd recite it to you instantly. Really ridiculous stuff. – rliu Apr 19 '13 at 04:14
Ah sure. But they don't need a multiple of `k` steps to intersect once the tortoise has a reached the loop. In fact it will always take less than `k` steps for them to intersect. So what is the significance of your claim that they intersect after a multiple of `k` steps? – rliu Apr 19 '13 at 05:27
I can't tell if you are trolling at this point. You made a claim that if the tail length is a multiple of k then they'll intersect once they both reach the loop. True. Not sure how it's related. Then you said something about "one-way deterministic paths" and how they both reach the cycle. Cool. Then you said, loosely, "well, you know, the rest of obvious to me, YMMV". None of that is equivalent to the claim you just made. Your condescending attitude towards formal proofs for "obvious things" is pretty silly, imo, and I don't think beneficial to OP (or anyone?) at all. – rliu Apr 19 '13 at 05:40
The wiki link more than suffices and it has a code example, thank you @rici. – NobleUplift Apr 19 '13 at 09:46

score 0 · Answer 2 · edited May 23 '17 at 12:28

The array becomes, as you describe it, a graph (more properly a forrest) where each vertex has out-degree of exactly one. The components of such a graph can only consist of chains that each possibly end in a single loop. That is, each component is either shaped like an O or like a 6. (I am assuming no pointers are null, but this is easy to deal with. You end up with 1-shaped components with no cycles at all.)

You can trace all these components by "visiting" and keeping track of where you've been with a "visited" hash or flags array.

Here's an algorithm.

Edit It just DFS of a forrest simplified for the case of one child per node, which eliminates the need for a stack (or recursion) because backtracking is not needed.

Let A[0..N-1] be the array of pointers.
Let V[0..N-1] be an array of boolean "visited" flags, initially false.
Let C[0..N-1] be an array if integer counts, initially zero.
Let S[0..N-1] be an array of "step counts" for each component trace.
longest = 0  // length of longest cycle
for i in 0..N-1, increment C[j] if A[i] points to A[j]
for each k such that C[k] = 0 
  // No "in edges", so must be the start of a 6-shaped component
  s = 0
  while V[k] is false
    V[k] = true
    S[k] = s
    s = s + 1
    k index of the array location that A[k] points to
  end
  // Loop found. Length is s - S[k]
  longest = max(longest, s - S[k])
end
// Rest of loops must be of the O variety
while there exists V[k] false
  Let k be such that V[k] is false.
  s = 0
  while V[k] is false
    V[k] = true
    s = s + 1
    k index of the array location that A[k] points to
  end
  // Found loop of length s
  longest = max(longest, s)
end

Space and execution time are both proportional to size of the input array A. You can get rid of the S array if you're willing to trace 6-shaped components twice.

Addition I fully agree that if it's not necessary to find the cycle of maximum size, then the ancient "two pointer" algorithm for finding cycles in a linked list is superior, since it requires only constant space.

What is the Big O for time and space? From a brief glimpse, it looks like O(n^2) on time and O(n^2) on space, which is highly inefficient. — NobleUplift, Apr 19 '13 at 09:41
It is not. Nested loops don't automatically mean quadratic behavior. As I said in the article, "Space and execution time are both proportional to size of the input array A". I guess this is what you're calling O(n) though you didn't say what n is. The algorithm is just DFS for a forrest simplified to the case of exactly one child per node, which means it doesn't need a stack because it never needs to backtrack. — Gene, Apr 19 '13 at 13:23

score 0 · Answer 3 · answered Apr 18 '13 at 21:25

Make a directed graph of the elements in the array where a node points to another node if the element of the node points to the element of the node its pointing to and for each node. keep track of the indegree of the node(number of pointers pointing to it.) While making your graph, if there is a node with indegree == 2, then that node is part of an infinite cycle.

The above fails if the first element is included in the infinite cycle, so before the algorithm starts, add 1 indegree to the first element to resolve this.

How would I find an infinite loop in an array of pointers?

3 Answers3

Linked