1

Suppose that I have a int ref list and I want to make a copy of it. One option is the following:

let copy_ref_list : int ref list -> int ref list =
  List.map (fun x -> ref !x)

However, this losing sharing within the nodes. For example,

let v =
  let x = ref 0 in
  copy_ref_list [x; x; x] (* creates three copies of [x] *)

Is there an efficient way to implement a function that would produce only one copy of x above, so that the result is the same as:

let copy_of_v =
  let x = ref 0 in
  [x; x; x]

It seems like I can use a memoized copy function with a Hashtbl, but does that give the right behavior on references?

let copy_ref_list (x : int ref list) : int ref list =
  let memo = Hashtbl.create 17 in
  let copy x =
    try Hashtbl.find memo x
    with Not_found ->
      let y = ref !x in
      let () = Hashtbl.add memo x y in
      y
  in
  List.map copy
Gregory
  • 1,205
  • 10
  • 17

2 Answers2

4

Your copy_ref_list function fails to copy the sharing structure. It doesn't check for sharing but only whether the referenced values are equal. The built-in comparison function for Hashtbl is based on =, and two references are equal if their contents are equal. Hence the copy will in general have too much sharing (rather than not enough).

(Also the last line of your function should read List.map copy x.)

Here's an example of the failure:

# let wlist = [ref 3; ref 3];;
val wlist : int ref list = [{contents = 3}; {contents = 3}]
# let wlist_copy = copy_ref_list wlist;;
val wlist_copy : int ref list = [{contents = 3}; {contents = 3}]
# List.hd wlist_copy := 4;;
- : unit = ()
# wlist_copy;;
- : int ref list = [{contents = 4}; {contents = 4}]

As you can see, the copy of wlist has shared values even though they aren't shared in the original list:

# List.hd wlist := 5;;
- : unit = ()
# wlist;;
- : int ref list = [{contents = 5}; {contents = 3}]

I think it will work if you use the functorial interface to Hashtbl and base your equality test on physical equality ==. The behavior of == when applied to mutable values behaves as you would expect, i.e., it will detect sharing.

Jeffrey Scofield
  • 65,646
  • 2
  • 72
  • 108
2

With a ref containing a single mutable value as shown, it isn't possible to use a Hashtbl to optimize the preservation of sharing. You can't hash the address of a value because GC may change that address.

So in this case a (less performant) scan must be used to preserve sharing. The following example uses a list of pairs, (r1, r2), where r1 is the original ref in the list and r2 is the new ref. This updates list is scanned using List.find_opt to find r1 refs that occur more than once in the original list, and use the corresponding r2 in the new list.

let copy_ref_list (ls : int ref list) : int ref list =
  let rec copy updates = function
    | [] -> []
    | r :: rs -> (
      match List.find_opt (fun (r1, _) -> r == r1) updates with
      | Some (_, r2) -> r2 :: copy updates rs
      | None ->
          let r2 = ref !r in
          let updates' = (r, r2) :: updates in
          r2 :: copy updates' rs )
  in
  copy [] ls

Another approach is to define the mutable value as a field in a record along with a unique id field, and use a Hashtbl with a hash on the unique id. A ref would not be used in this case. For example:

type 'a val_with_id = {mutable value: 'a; unique_id: int}

let copy_val_with_id_list (ls : 'a val_with_id list) : 'a val_with_id list = ...

This is a more intrusive solution since it requires maintaining this record and assigning a unique id each time a value is created. It also uses more memory. However, it may provide better copying performance for large lists since a hash lookup is used instead of a list scan per copied element.

For details on creating such a Hashtbl and maintaining a unique id see: https://stackoverflow.com/a/10290155/2607027

Here is a minimal example of using a Hashtbl and val_with_id, using code roughly similar to the code posted in the question.

let next_id =
  let id_sequence = ref (-1) in
  fun () -> incr id_sequence ; !id_sequence

type 'a val_with_id = {mutable value: 'a; unique_id: int}

let with_id v = {value= v; unique_id= next_id ()}

let copy_val_with_id_list (ls : 'a val_with_id list) : 'a val_with_id list =
  let memo = Hashtbl.create 17 in
  let copy r =
    try Hashtbl.find memo r.unique_id with
    | Not_found ->
        let r' = with_id r.value in
        Hashtbl.add memo r.unique_id r' ;
        r'
  in
  List.map copy ls
greybird
  • 36
  • 5