7

I'm trying to write a function to test whether a mutable list in Ocaml contains a cycle or not (that is, has a reference to itself and repeats continuously.

My list is defined as type 'a m_list = Nil | Cons of 'a * (('a m_list) ref).

So far, I have:

let is_cyclic list =
  let rec check xs = 
    match (!xs) with
     |Nil -> false
     |Cons(_,v) -> ((!v)==list)||check v in
  match list with
   |Nil -> false
   |Cons(_, x) -> check x ;;

but it's not quite right and I'm unsure how to proceed from here...thanks for any help!

anon
  • 93
  • 1
  • 4

3 Answers3

3

There is a cycle in the list as soon as two Cons cells (found at different depths in the list) are the same. Your example code only checks if the first Cons cell appears again down in the list. One way to check for cycles is to remember all the Cons cells you have visited going down the list, and to compare each new cell to all the previous ones.

I'm not going to write the entire function, but it may look like this:

let rec is_cyclic list already_visited =
  match list with
    Nil -> false
  | Cons(h, { contents = t }) -> 
    if List.memq list already_visited
    then (* list was traversed before *)
      ...
    else
      ...
Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • And if you are not already familiar with List.memq: 'a -> 'a list -> bool, a good exercise is to write it yourself as a recursive function using `==`. – Pascal Cuoq Mar 29 '11 at 05:54
  • 1
    It looks like a O(n^2) function. I'd prefer to use a hashtable for lookup, or the function given by Victor, instead. – Laurent Apr 12 '11 at 17:31
  • @Laurent Actually, I need a hashtable working with physical equality in my project. When you're done writing it, can you publish it somewhere? – Pascal Cuoq Apr 12 '11 at 17:33
  • @Pascal: I was talking about the algorithm, there are already two other answers in linear time. My point is that Victor's solution seems better. However, as we only need to hash addresses, I imagine casting to an integer would be a solution (ok, it's Obj.magic, but it looks safe). Moving to F# could be another solution in some cases... – Laurent Apr 12 '11 at 17:57
  • Wait, I forgot about the GC, let's remove the Obj.magic solution. :) – Laurent Apr 12 '11 at 18:11
  • 1
    @Laurent Actually, I think there is a good chance I eventually get == hashtables with a bit of support from the OCaml runtime. It's just not high enough on my TODO list at the moment (or the OCaml implementors'). – Pascal Cuoq Apr 12 '11 at 18:22
  • Seems to fail if type of 'a is selfless, i.e. int since List.memq h already_visited will always return true. Any work around? – Secret Aug 23 '13 at 07:19
  • @Secret Good find! The list `already_visited` was always intended to be a list of `Cons` cells, not of `'a`. One very good reason not to make it a list of `'a` is that the list `let x = "x" in Cons(x, Cons x, Nil))` is not cyclic and should not be diagnosed as such. Another is, as you point out, that we know that `Cons` cells are allocated and `==` behaves on them as we expect. That will teach me to write non-testable function skeletons. – Pascal Cuoq Aug 23 '13 at 07:36
2

Pascal Cuoq's answer is the best one, but for the sake of anecdotal obscurity, you can also perform this check with constant memory (but at a higher computational cost) by keeping two cursors and advancing one of them twice as fast as the other, and stopping as soon as they are equal:

let rec is_cyclic a b =    
  match a with 
    | Nil -> false 
    | Cons (_, { contents = a }) ->
      match b with 
        | Nil | Cons (_, { contents = Nil }) -> false
        | Cons (_, { contents = Cons (_, {contents = b}) } ->
          a == b || is_cyclic a b 

let is_cyclic l = is_cyclic l l  
Community
  • 1
  • 1
Victor Nicollet
  • 24,361
  • 4
  • 58
  • 89
0

If the list is long, you may want to use a hash table instead of a list for storing the visited cells and performing the lookups in near-constant time.

Let me expand and modify Pascal's code:

let rec is_cyclic list already_visited =
  match list with
    Nil -> false
  | Cons(h, { contents = t }) -> 
    V.mem already_visited h ||
    is_cyclic t (V.add already_visited h)

The V module comes from the following functor application:

module V = Visits.Make (struct type t = int end)

and Visits is defined as follows:

(* visits.ml *)
struct
  module Make (X : sig type t end) :
  sig
    type elt
    type t
    val create : int -> t
    val mem : t -> elt -> bool
    val add : t -> elt -> unit
  end with type elt = X.t =
  struct
    module H = Hashtbl.Make (
      struct
        type t = X.t
        let equal = ( == )
        let hash = Hashtbl.hash
      end
    )

    type elt = X.t
    type t = unit H.t
    let create len = H.create len
    let mem tbl x = H.mem tbl x
    let add tbl x = H.add tbl x ()
  end
end

The implementation above is perfectly safe and future-proof but is not polymorphic unlike the list-based solution.

It is possible to write a polymorphic version that uses the infamous Obj module, which should not be used without knowing a number of things that are not officially documented. The use of Obj in the code below makes assumptions on the implementation of the Hashtbl module, which are unlikely to break in the future but you are being warned.

That said, it is polymorphic and therefore easy to use:

(* visits.mli *)
type 'a t
val create : int -> 'a t
val mem : 'a t -> 'a -> bool
val add : 'a t -> 'a -> unit

(* visits.ml *)
module H = Hashtbl.Make (
  struct
    type t = Obj.t
        (* Warning: using Obj is not pure OCaml. It makes assumptions
           on the current implementation of Hashtbl,
           which is unlikely to change in incompatible ways
           anytime soon. *)

    let equal = ( == )
    let hash = Hashtbl.hash
  end
)

type 'a t = unit H.t
let create len = H.create len
let mem : 'a t -> 'a -> bool = fun tbl x -> H.mem tbl (Obj.repr x)
let add : 'a t -> 'a -> unit = fun tbl x -> H.add tbl (Obj.repr x) ()
Martin Jambon
  • 4,629
  • 2
  • 22
  • 28
  • Generally, the two things one expects from a hash function are 1) to be compatible with the equality in use, but also 2) to do a good job sending non-equal value to different hashes. The pair `Hashtbl.hash, (==)` is not very good —in general— at 2), because of the collisions between structurally identical, physically different values that may happen in large numbers. But it's better than a colleague who did not mind a few objects having to be rehashed and implemented his hash function as `fun x -> Hashtbl.hash ((Obj.magic x):int)` (which does not do what it tries to do, do you see why?) – Pascal Cuoq Apr 12 '11 at 20:15
  • I don't understand your objection. In the worst case (all values structurally equal), it is roughly equivalent to using List.memq/List.assq, such as with `let rec x = 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: x in x`. Perhaps a graph structure without any node identifier or label would be a realistic example of a worst case scenario. The point is that we cannot use pointer values for hashing because they are mutated by the GC. Now I don't understand how `fun x -> Hashtbl.hash ((Obj.magic x):int)` is any different than just `Hashtbl.hash`. Do you mean `fun x -> (Obj.magic x : int)` which clearly – Martin Jambon Apr 12 '11 at 23:48
  • Do you mean `fun x -> (Obj.magic x : int)` which clearly won't work? – Martin Jambon Apr 12 '11 at 23:55
  • "Now I don't understand how `fun x -> Hashtbl.hash ((Obj.magic x):int)` is any different than just `Hashtbl.hash`" This is the mistake that colleague made. He thought it would do something different. He ended up with `Hashtbl.hash`, which was OK except for the collision problem, but that was not what he had intended. – Pascal Cuoq Apr 13 '11 at 06:48
  • Following an insightful comment, I changed my function. – Pascal Cuoq Aug 23 '13 at 07:41