3

I have the following peano number written with GADTs:

type z = Z of z

type 'a s = Z | S of 'a

type _ t = Z : z t | S : 'n t -> 'n s t

module T = struct
  type nonrec 'a t = 'a t
end

type 'a nat = 'a t

type e = T : 'n nat -> e

The following function to decode a 'a nat (or 'a t) into the number it encoded, works:

let to_int : type n. n t -> int =
  let rec go : type n. int -> n t -> int =
   fun acc n -> match n with Z -> acc | S n -> go (acc + 1) n
  in
  fun x -> go 0 x

but if I try to rewrite it almost exactly the same this way:

let to_int2 (type a) (a: a nat) : int =
  let rec go (type a) (acc : int) (x : a nat) : int =
    match x with
    | Z -> acc
    | S v -> go (acc + 1) v
  in
  go 0 a

I get a scope error. What's the difference between the two functions?

138 |     | S v -> go (acc + 1) v
                                ^
Error: This expression has type $0 t but an expression was expected of type
         'a
       The type constructor $0 would escape its scope
David 天宇 Wong
  • 3,724
  • 4
  • 35
  • 47
  • What makes you think it's "almost exactly the same"? One is explicitly polymorphic, the other is not. That's quite a big difference. See https://v2.ocaml.org/manual/locallyabstract.html#p:polymorpic-locally-abstract – glennsl Aug 13 '22 at 10:11
  • I guess there's something fundamental that's happening here that I don't get, because I don't see the difference between the two implementations. But I know there's one (otherwise I wouldn't get the error) – David 天宇 Wong Aug 13 '22 at 10:15
  • oh I think I see it, `type x .` is different than `(type x)`. Weirdly in utop I do get the same function signature if I write `let thing (type a) (a:a) : unit = ()` and `let thing : 'a . 'a -> unit = fun a -> ()` – David 天宇 Wong Aug 13 '22 at 10:28
  • Might be because it only matters for polymorphic recursion, i.e. how the function sees itself. I'm not very familiar with how `utop` works. – glennsl Aug 13 '22 at 10:46
  • See also https://stackoverflow.com/questions/69144536/in-ocaml-what-is-the-difference-between-a-and-type-a-and-when-to-use-eac – glennsl Aug 13 '22 at 10:46
  • The accepted answer points to a post that explains that eventhough the two signatures are the same, they are typechecked differently! How can you know this without knowing deep detail of how the compiler works :( I fee betrayed – David 天宇 Wong Aug 13 '22 at 14:06
  • There is no need to know the compiler internals: the let-polymorphism restriction that forbids polymorphic recursion is part of the (fully formalized) core ML type system. Going further, the OCaml extension that allows for polymorphic recursion has its section in the OCaml manual. – octachron Aug 14 '22 at 07:58

1 Answers1

9

The root issue is polymorphic recursion, GADTs are a red herring here.

Without an explicit annotation, recursive functions are not polymorphic in their own definition. For instance, the following function has type int -> int

let rec id x =
   let _discard = lazy (id 0) in
   x;;

because id is not polymorphic in

   let _discard = lazy (id 0) in

and thus id 0 implies that the type of id is int -> 'a which leads to id having type int -> int. In order to define polymorphic recursive function, one need to add an explicit universally quantified annotation

let rec id : 'a. 'a -> 'a = fun x ->
  let _discard = lazy (id 0) in
  x

With this change, id recovers its expected 'a -> 'a type. This requirement does not change with GADTs. Simplifying your code

let rec to_int (type a) (x : a nat) : int =
    match x with
    | Z -> 0
    | S v ->  1 + to_int v

the annotation x: a nat implies that the function to_int only works with a nat, but you are applying to an incompatible type (and ones that lives in a too narrow scope but that is secondary).

Like in the non-GADT case, the solution is to add an explicit polymorphic annotation:

let rec to_int: 'a. 'a nat -> int = fun (type a) (x : a nat) ->
    match x with
    | Z -> 0
    | S v ->  1 + to_int v

Since the form 'a. 'a nat -> int = fun (type a) (x : a nat) -> is both a mouthful and quite often needed with recursive function on GADTs, there is a shortcut notation available:

let rec to_int: type a. a nat -> int = fun x ->
    match x with
    | Z -> 0
    | S v ->  1 + to_int v

For people not very familiar with GADTs, this form is the one to prefer whenever one write a GADT function. Indeed, not only this avoids the issue with polymorphic recursion, writing down the explicit type of a function before trying to implement it is generally a good idea with GADTs.

See also https://ocaml.org/manual/polymorphism.html#s:polymorphic-recursion , https://ocaml.org/manual/gadts-tutorial.html#s%3Agadts-recfun , and https://v2.ocaml.org/manual/locallyabstract.html#p:polymorpic-locally-abstract .

octachron
  • 17,178
  • 2
  • 16
  • 23