10

I was trying to solve this leetcode problem https://leetcode.com/problems/find-the-duplicate-number/ using my own implementation of the tortoise and hare algorithm which resulted in an infinite loop when given the following array of integers:

[3,1,3,4,2]

Only after tracing through my algorithm was I able to see that the slow and fast runners never take on the two duplicate values at the same time. Here is my algorithm in pseudocode:

initialize fast and slow runners to 0

while(true)

   move fast runner two indices forward
   move slow runner one index forward

   if arr[fast] == arr[slow] and fast != slow
      return arr[fast] // this is the duplicate

Now, I'm sure someone who is skilled in discrete mathematics would have been able to intuitively know that this approach would not have lead to the correct solution without first having to trace through an example like I had to do.

What inferences or observations could I have made that would have lead me to see that this algorithm was not going to work? I'd like to know how one could intuitively identity a flaw in this logic through a series of logical statements. In other words, what's the explanation for why the two runners will never find the duplicates in this example? I feel like it may have something to do with counting, but I do not have a very strong background in discrete.

And to clarify, I have looked at the correct implementation so I do know what the correct way to solve it is. I just thought that this way would have worked too similar to applying it to linked lists, where you'd move the fast runner two nodes up and the slow runner one node up. Thank you for your help.

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
krabinowitz
  • 357
  • 3
  • 11
  • 1
    It seems you were hoping that the algorithm would explore every combination of array indices. There are 5 indexes for the hare, and 5 for the tortoise, so a total of 25 possible combinations. The problem is that the hare catches the tortoise after only 5 moves, so they will explore exactly 5 of the 25 possible combinations, specifically {1,2}, {2,4}, {3,1}, {4,3}, {0,0}. None of those is the combination that you were looking for: {0,2}. Why does the hare catch the tortoise after 5 moves? The tortoise moves 5 positions, and the hare moved 10 positions, and both are back to the start. – user3386109 Oct 27 '20 at 20:55
  • Great explanation. I guess I just didn't realize that only 5 combinations would ever get explored. – krabinowitz Oct 28 '20 at 00:15
  • The book "Elements of Programming" (free pdf: http://elementsofprogramming.com/) contains a chapter on developing and determining the correctness of this algorithm. – Justin Meiners Jun 13 '22 at 23:06

2 Answers2

8

Floyd's tortoise algorithm works when you're detecting a cycle in a linked list. It relies on the fact that if both pointers are moving at a different pace, the gap between them will keep on increasing to a limit, after which it'll be reset if a cycle exists.
In this case, the algorithm does find a cycle, since both pointers converge to the index 0 after some iterations. However, you're not looking to detect a cycle here; you're trying to find a duplicate. That's why this gets stuck in infinite recursion: it is meant to detect a cycle (which it correctly does), but not detect duplicates in its basic implementation.

To clarify, here's a sample linked list created on your sample array.

3 -> 1 -> 3 -> 4 -> 2
'--<----<----<----<-'

If you run Floyd's algorithm, you find that the cycle will get detected at index 0, since both pointers will converge there. It works by checking if fast and slow point to the same location and not if they have the same values of nodes (fast==slow isn't the same as fast.value==slow.value).

You are attempting to check duplicates by comparing the value on the nodes, and checking if the nodes don't point to the same location. That is actually the flaw, since Floyd's algorithm works to check if both pointers point to the same location in order to detect a cycle.
You can read this simple, informative proof to improve your intuition as to why the pointers will converge.

Abhinav Mathur
  • 7,791
  • 3
  • 10
  • 24
  • 1
    I sort of follow that. But why can't each position in the array be treated like a node in a linked list? – krabinowitz Oct 27 '20 at 19:42
  • Thank you for the added detail. My thought process was that if the loop ran long enough, eventually one pointer would be at the 3 at index 0, and the other pointer would be at the 3 at index 2. However that never happens which results in the infinite loop. If one pointer is moving faster than the other, shouldn't that have happened eventually? – krabinowitz Oct 27 '20 at 20:22
  • 1
    Look at the gap between both pointers. It starts with 0, then 1, then 2 and so on. When it reaches 5, however, it'll get reset to zero, since the list length is five and the pointers converge. And that's the point of a cycle - it repeats/loops after some condition, without guaranteeing that all possible pairs will be checked for value rather than location. – Abhinav Mathur Oct 27 '20 at 20:27
  • Ah interesting! That's what I can't wrap my head around. The realization that the pointers reset and converge when the gap between them is equal to the length of the array. For some reason that's not obvious to me. – krabinowitz Oct 27 '20 at 20:35
-2

That' not a bad idea. Here is an implementation in Python:

class Solution:
    def findDuplicate(self, nums):
        slow, fast = 0, 0
        while True:
            slow = nums[nums[slow]]
            fast = nums[fast]
            if slow == fast:
                break

        fast = 0
        while True:
            slow, fast = nums[slow], nums[fast]
            if slow == fast:
                break
        return slow

We can also use Binary Search:

class Solution:
    def findDuplicate(self, nums):
        lo, hi = 0, len(nums) - 1
        mid = lo + (hi - lo) // 2
        while hi - lo > 1:
            count = 0
            for num in nums:
                if mid < num <= hi:
                    count += 1
            if count > hi - mid:
                lo = mid
            else:
                hi = mid
            mid = lo + (hi - lo) // 2
        return hi

In C++:

// The following block might slightly improve the execution time;
// Can be removed;
static const auto __optimize__ = []() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
    std::cout.tie(nullptr);
    return 0;
}();


// Most of headers are already included;
// Can be removed;
#include <iostream>
#include <cstdint>
#include <vector>


static const struct Solution {
    static const int findDuplicate(
        const std::vector<int>& nums
    ) {
        int slow = 0;
        int fast = 0;

        while (true) {
            slow = nums[nums[slow]];
            fast = nums[fast];

            if (slow == fast) {
                break;
            }
        }

        fast = 0;

        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }

        return slow;
    }
};

Emma
  • 27,428
  • 11
  • 44
  • 69
  • 1
    Binary search only works on sorted input. It should be obvious that the input from the question is not sorted. And if it was there's an even simpler algorithm. – Mark Ransom Jul 26 '21 at 03:02