Ensuring that two (G)ADTs have the same underlying representation in (GHC) Haskell

Question

In Haskell, sometimes for performance people will use unsafeCoerce (or the safer coerce) to translate between types that have the same internal representation. The most common example of this that I know of is for lists of newtypes:

newtype Identity a = Identity a

f :: [Identity a] -> [a]
f = coerce

Now, I have two GADTs in a codebase I'm working on that look something like this pared down:

data Typ where
    PredT :: Typ
    ProcT :: [Typ] -> Typ
    IntT :: Typ
    ListT :: Typ -> Typ


data HKTyp v (f :: * -> * -> *) where
    HKPredT :: HKTyp v f
    HKProcT :: [HKTyp v f] -> HKTyp v f
    HKIntT :: HKTyp v f
    HKListT :: f v (HKTyp v f) -> HKTyp v f

I need these types to be different (rather than using the later as a generalization of the former), because the singletons library (or at least the template haskell functions) doesn't like higher-kinded data. Now, because I have to keep these types separate, I want to have some conversion functions between them:

newtype Id v a = Id a

promoteHK :: Typ -> HKTyp v Id
promoteHK PredT = HKPredT
promoteHK (ProcT ts) = HKProcT (fmap promoteHK ts)
promoteHK IntT = HKIntT
promoteHK (ListT x) = HKListT (Id $ promoteHK x)

demoteHK :: HKTyp v Id -> Typ
demoteHK HKPredT = PredT
demoteHK (HKProcT (Id ts)) = ProcT (fmap demoteHK ts)
demoteHK HKIntT = IntT
demoteHK (HKListT (Id x)) = HKListT x

These are pretty mechanical to write, but that's not the issue.

While I'm sure in many cases, GHC could inline and beta-reduce applications of demoteHK and promoteHK at compile time, thus not causing any runtime costs for doing these conversions, I really want to be able to write

f :: [Typ] -> [HKTyp v Id]
f = coerce

to avoid traversing the data structure, since these types are so similar, and hence (I assume) should have the same underlying representation in memory.

My question is two-fold.

Do these types in fact have the same memory representation in GHC?
Are there strong guarantees in GHC for how (G)ADTs are laid out in memory that lets you do things like this in general?

I'd be tempted to try `unsafeCoerce` between these two types and see if I can cause a crash. If I can, well, they have different runtime representation. Otherwise, no definitive conclusion can be reached. Not much, but better than nothing. I suspect that current GHC uses the same representation and the coercion is safe, but that the GHC devs do not want to guarantee that will always be the case forever (?) — chi, Feb 02 '21 at 14:33
@chi Looking at the documentation for #unsafeCoerce, that does seem to be the case. If this use-case was guaranteed by the GHC devs, I'd imagine it would be documented there. This seems like it's something that's pretty unlikely to change in the future (i.e. tagged unions use tags internally by order of constructors listed), so it might be worth making a GHC proposal for having this case included in the `Data.Coerce` system, so that `unsafeCoerce` doesn't have to be used. — Nathan BeDell, Feb 02 '21 at 14:40
To answer my question 1, testing this out on GHCI, these types do seem to have the same representation. — Nathan BeDell, Feb 02 '21 at 14:40
(This is a style comment, completely unrelated to your actual problem.) It's a little odd to carry the `v` around everywhere like that. Consider `data HKType' f = HkPredT' | HkProcT' [HkType' f] | HkIntT' | HkListT' (f (HkType' f))` instead; then where you used to write `HkType v f`, write `HkType' (f v)`. This can't always be done -- e.g. if `HkType` is used as an argument to something that expects that kind specifically -- but very often. — Daniel Wagner, Feb 02 '21 at 15:18
Perhaps your codebase versions are GADTs, but the `data` decls you show use GADT syntax but are not GADTs. That is, the constructor sigs all end `... -> Typ` resp `... -> HKTyp v f` (and there's no constraints). — AntC, Feb 02 '21 at 20:56
@AntC I was pondering whether to make that comment myself. Of course, in the end, every ADT is a GADT -- or else the G would be unjustified! — Daniel Wagner, Feb 04 '21 at 20:01
The answer to "Are there strong guarantees.." is almost certainly no. The question you must ask yourself, how strong of a guarantee do you want? or does it just have to work in practice? I've asked a similair [question](https://stackoverflow.com/questions/36054141/special-runtime-representation-of-type) myself (mine was more along the lines of, I've assumed this guarantee is true but it now seems to be violated). I think the situation has not changed much since then. Probably the core problem is that GADTs can express very complex constraints in their type, ... — user2407038, Feb 04 '21 at 20:12
... and there a general process for determining whether two GADTs are the same is much, much harder than the same problem for two ADTs. — user2407038, Feb 04 '21 at 20:12
It’s definitely not guaranteed, but I think they happen to be the same, provided `f` is instantiated to a type with the right representation, coerced values are in normal form, and the number of constructors fits in the pointer tag bits (≤7 on 64-bit, ≤3 on 32). Probably best to write the mechanical code now and await a better `Coercible`, but you could try using encodings that make coercion easier and/or less necessary, like Böhm–Berarducci: `newtype T = T (forall r. r -> ([r] -> r) -> r -> (r -> r) -> r); newtype H v f = H (forall r. r v f -> ([r v f] -> r v f) -> (f v r -> r v f) -> r v f)` — Jon Purdy, Feb 04 '21 at 22:42
@AntC You're right. And even in my code base they are not currently GADTs proper. I think I wrote the type with this syntax in anticipation of potentially needing them in the future, e.x. to add in higher-kinded types into the mix. — Nathan BeDell, Feb 06 '21 at 14:25
So ListT would become `ListT :: Typ (TypeK `KindArrow` TypeK). — Nathan BeDell, Feb 06 '21 at 14:27

DDub · Answer 1 · 2021-02-04T19:48:22.277

I haven't tested the performance of the following, and it may be convoluted enough that GHC, in fact, can't optimize it, but it will let us build a better tool. The idea is to use Generics.

The plan is to define a type class that coerces two types that have the same Generic structure along with a function that makes use of the class. Consider the following:

{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics

class GenericCoerce a b where
  genericCoerce' :: a x -> b x

genericCoerce :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerce = to . genericCoerce' . from

Of course, we still need to define what makes two Reps coercible, but just like your promoteHK and demoteHK definitions, this is somewhat mechanical:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE EmptyCase #-}

instance GenericCoerce V1 V1 where
  genericCoerce' = \case

instance GenericCoerce U1 U1 where
  genericCoerce' = id

instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :+: g) (f' :+: g') where
  genericCoerce' (L1 x) = L1 (genericCoerce' x)
  genericCoerce' (R1 x) = R1 (genericCoerce' x)

instance (GenericCoerce f f', GenericCoerce g g') => GenericCoerce (f :*: g) (f' :*: g') where
  genericCoerce' (x :*: y) = genericCoerce' x :*: genericCoerce' y

instance GenericCoerce cs1 cs2 => GenericCoerce (M1 t m cs1) (M1 t m' cs2) where
  genericCoerce' (M1 x) = M1 (genericCoerce' x)

instance (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => GenericCoerce (K1 t x) (K1 t y) where
  genericCoerce' (K1 x) = K1 (genericCoerce x)

This actually works for very basic cases! Consider a data type like:

data Foo = Bar | Baz
  deriving (Generic, Show)

We get the behavior we want:

> genericCoerce @Bool @Foo True
Baz

> genericCoerce @Foo @Bool Bar
False

However, the problem with this Generic way of doing coercion is that it doesn't exactly play nicely with the normal way of coercing data types. Specifically, the type rep of a given type and the type rep of that type wrapped in a newtype wrapper are not the same.

One possible solution is the use of (gasp) incoherent instances. This may be a line too far for you, but if you're okay with them, consider the following two additional instances:

-- instances to handle newtype constructor
instance {-# INCOHERENT #-} (Generic x, Rep x ~ D1 m x', GenericCoerce x' y) => GenericCoerce (C1 m2 (S1 m3 (Rec0 x))) y where
  genericCoerce' = genericCoerce' . unM1 . from . unK1 . unM1 . unM1

instance {-# INCOHERENT #-} (Generic y, Rep y ~ D1 m y', GenericCoerce x y') => GenericCoerce x (C1 m2 (S1 m3 (Rec0 y))) where
  genericCoerce' = M1 . M1 . K1 . to . M1 . genericCoerce'

These two instances specifically target the case where one of the types has a newtype wrapper. Incoherent instances are considered dangerous, and if you have a lot of nested types/newtypes in your coercion, perhaps something could go wrong. That said, with these two instance in play, you're good to go with the example you gave:

promoteHK :: Typ -> HKTyp v Id
promoteHK = genericCoerce

demoteHK :: HKTyp v Id -> Typ
demoteHK = genericCoerce

In action:

> promoteHK PredT
HKPredT

> promoteHK (ListT PredT)
HKListT (Id HKPredT)

> promoteHK (ListT (ListT (ListT PredT)))
HKListT (Id (HKListT (Id (HKListT (Id HKPredT)))))

> demoteHK (HKProcT [HKIntT, HKPredT])
ProcT [IntT,PredT]

So far, I haven't quite answered your question. You asked if two seemingly isomorphic types really do have the same memory representation in GHC and if there are any guarantees in GHC that let you do things like this in general (I assume by "things like this", you mean coercions between isomorphic data types).

As far as I'm aware, there are no guarantees from GHC, but genericCoerce has given us a slightly firmer ground to walk on. Excluding the incoherent instances hack, the original version of genericCoerce converts a data type with phantom type parameters into the same datatype just with different phantom parameters. Technically, I can provide no guarantee that GHC will store multiple instances of the same runtime data the same way, but it seems to me like a pretty easy assumption to make.

Once we add in the incoherent instances and the newtype wrapper shenanigans, we're standing on less solid grounds, but the fact that it all works is some consolation.

Indeed, now that we see that the core of genericCoerce really is a coercion (we're building the same data type up from what was just destructed in every case), and if we trust that the newtype-wrapper incoherent instances also function as a coercion, then we can write:

genericCoerceProbablySafe :: (Generic x, Generic y, GenericCoerce (Rep x) (Rep y)) => x -> y
genericCoerceProbablySafe = unsafeCoerce

We get better performance than genericCoerce and more type safety than unsafeCoerce, and we've reduced your questions down to: "Is the Generic Rep an accurate proxy for how GHC stores memory (up to newtype wrappers)?"

You say this addresses the "criteria of concise definitions". But the OP never listed that as a criterion! The only criteria listed are performance and correctness. The correct variant at the beginning is certainly going to be no better than the hand-written conversion in terms of performance (and almost certainly worse, given all the indirection involved). And sure, your `unsafeCoerce` implementation at the end will have good performance, but you just assume it's correct outright, and OP explicitly asked whether that assumption was right. — Daniel Wagner, Feb 04 '21 at 18:06
I misread. The OP say "I really want to be able to write `f = coerce`" after showing a long definition, and I didn't realize the next few lines were a continuation of that sentence. As for `unsafeCoerce`, I make two explicit assumptions: 1) The `GenericCoerce` constraint will limit the function to only being called on types that have the same `Generic` structure, and 2) the `Generic` rep mimics the memory rep. Thus, I'm reducing OP's question from "do we have any guarantees about memory of GADTs" to "Can we assume that the `Generic` rep mimics memory?" I thought that was an improvement. — DDub, Feb 04 '21 at 19:12
@DanielWagner I've updated the answer to try to be clearer about the conclusion. It doesn't definitively answer the OP's questions, but it does provide alternate ways to think about them. Thanks for pushing me toward clarity and improvement. — DDub, Feb 04 '21 at 19:51

Ensuring that two (G)ADTs have the same underlying representation in (GHC) Haskell

1 Answers1

Linked