data family instances: newtype, data

Question

With Haskell 98 decls, the whole datatype must be either newtype or data. But data families can have a mix of newtype instance, data instance. Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:

data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

I know I can't write the following, but why/what goes wrong?:

data family Bar a

newtype instance Bar (Maybe Int)  = MkBar1 (Maybe Int)
            -- MkBar1 :: (Maybe Int) -> Bar (Maybe Int), see below

newtype instance Bar [Char]  = MkBar2 [Char]

data instance Bar [Bool]  where
  MkBar3 :: Int -> Bool -> Bar [Bool]
  -- can't be a newtype because of the existential Int

-- we're OK up to here mixing newtypes and data/GADT

data instance Bar [a]  where
  MkBar4 :: Num a => a -> Bar [a]
  -- can't be a newtype because of the Num a =>

I can't write that because the instance head Bar [a] overlaps the two heads for MkBar2, MkBar3. Then I could fix that by moving those two constructor decls inside the where ... for Bar [a]. But then MkBar2 becomes a GADT (because it's result type is not Bar [a]), so can't be a newtype.

Then is being a newtype a property of the result type, rather than of the constructor? But consider the type inferred for newtype instance MkBar1 above. I can't write a top-level newtype with the same type

newtype Baz a  where
  MkBaz :: (Maybe Int) -> Baz (Maybe Int)

-- Error: A newtype constructor must have a return type of form T a1 ... an

Huh? MkBar1 is a newtype constructor whose type is not of that form.

If possible, please explain without diving into talk of roles: I've tried to understand them; it just makes my head hurt. Talk in terms of constructing and pattern-matching on those constructors.

What you’re saying doesn’t make sense. Different instances of a data family are still *different types*, so the compiler can still statically determine a value’s representation. The whole point of a `newtype` is to use the same runtime representation as some other type (that is, with no wrappers), but if there are other cases to the type, that clearly isn’t possible. — Alexis King, Sep 20 '18 at 04:28
"Different instances of a data family are still different types," Hmm? They're all of type `Bar a`. "The whole point of a newtype ..., but if there are other cases to the type, that clearly isn’t possible. " `Bar (Maybe Int)`, `Bar [Char]` are other cases of to the type, and `newtype` is possible -- until it gets tangled up with `Bar [a]`. — AntC, Sep 20 '18 at 05:32
If you have a type family `F`, then surely you would not say that `F Int` and `F Bool` must be the same type, hm? The point is that GHC tracks that difference statically. It does not statically track the difference between different constructors (which is often frustrating to novice Haskellers, who regularly ask questions like “how can I have a type that is `T` but restricted to exclude data constructor `C`,” to which the only answer is “define a new data type”). — Alexis King, Sep 20 '18 at 05:36
@AntC "They're all of type `Bar a`." is not precise. Try it yourself and see: make a data family with at least two instances, then write `foo :: Bar a` and try to define `foo` without using `undefined`. You'll find it's not possible! — Daniel Wagner, Sep 20 '18 at 13:12

score 1 · Answer 1 · answered Sep 20 '18 at 04:37

1

You can think of data families as stronger versions of (injective and open) type families. As you can see from this question I asked a while back, data families can almost be faked with injective type families.

Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

No. Being a newtype is definitively a property of the type constructor, not of the data constructor. A data family, quite like a type family, has two levels of type constructors:

the type constructor data/type family FamTyCon a
the type constructors for the instances data/type instance FamInstTyCon a = ...

Different instances of the same data/type family constructor are still fundamentally different types - they happen to be unified under one type constructor (and that type constructor is going to be injective and generative - see the linked question for more on that).

By the type family analogy, you wouldn't expect to be able to pattern match on something of type TyFam a, right? Because you could have type instance TyFam Int = Bool and type instance TyFam () = Int, and you wouldn't be able to statically know if you were looking at a Bool or an Int!

answered Sep 20 '18 at 04:37

Alec

31,829
7
67
114

"You can think of data families as ..." you're explaining a bunch of tricky ideas in terms of two more even trickier ideas. I've deliberately not read past your first sentence, because my experience with an earlier q was I ended up even more confused. I like thinking in terms of data because I can imagine it in store. Whereas Haskell has type erasure semantics, so where do I go to see the types? – AntC Sep 20 '18 at 05:48
1

I'm sorry my answer confused you. I figured that you might find an explanation in terms of type families helpful, given that those are usually more simply understood (the analogy is that type families are similar to functions on types). Unfortunately, data/type families don't have any data-level manifestation. They are purely type-level artifacts, meaning they pretty much disappear after type-checking. – Alec Sep 20 '18 at 07:42
"you might find an explanation in terms of type families helpful, given that those are usually more simply understood" @Alec, @AlexisKing type families are not injective (and Haskell's so-called injective families are only half-implemented). `F Int`, `F Bool` could easily be the same type. But `T Int`, `T Bool` cannot be, for type constructor `T`. Any data constructor within `instance DF [a]` must yield a 'result' whose type is apart from any data constructor within `instance DF (Maybe Int)`. – AntC Sep 20 '18 at 09:38
1

@AntC I am very aware that type families aren't injective, and that Haskell's injective type families are full of caveats. You can't have it both ways: either you ask for a correct answer (which will assume existing familiarity with other advanced Haskell) or you ask for a ground up explanation. I fear you are setting yourself up for disappointment by asking on SO. Also, note that a "type constructor" also includes type families. The non-overlapping property you point out is precisely injectivity of `DF`. – Alec Sep 20 '18 at 19:55
Then we'll agree to differ. "note that a "type constructor" also includes type families" I disagree; on the same grounds as I don't include functions as data constructors -- even though functions do construct values (which ipso facto have data constructors). In fact Haskell type families can't even be partially applied, they're a long way from being type constructors. – AntC Sep 21 '18 at 02:56
Yes, I think a "ground up explanation" would help me a lot more than a 'complexity down'. Or of course pointers to where I can find that. (I already read the GHC Users Guide and academic papers behind it, before asking the q.) In general by "asking on SO" what sort of 'explanation' should I expect if not in terms of simpler building-blocks? Plus 'more powerful ways to combine them', in the John Hughes sense. – AntC Sep 21 '18 at 06:47
I've now read this answer. This sentence is wrong, so I've downvoted: "Being a `newtype` is definitively a property of the type constructor, not of the data constructor." I've posted my understanding as an answer. I found no need to talk about type families/type functions. In fact I've quoted the literature saying these are "Function-free types". – AntC Sep 22 '18 at 02:26
1

@AntC being a `newtype` _is_ a property of being a type constructor - see [`isNewTyCon`](http://hackage.haskell.org/package/ghc-8.4.3/docs/TyCon.html#v:isNewTyCon). – Alec Sep 22 '18 at 02:34

AntC · Answer 2 · 2018-09-22T02:16:58.010

These are equivalent:

newtype          NT a b  = MkNT (Int, b)

data family DF a b
newtype instance DF a b  = MkDF (Int, b)

-- inferred MkNT :: (Int, b) -> NT a b
--          MkDF :: (Int, b) -> DF a b

Furthermore you can't declare any other instance/constructor within family DF, because their instance heads would overlap DF a b.

The error message in the q re standalone newtype Baz is misleading

-- Error: A newtype constructor must have a return type of form T a1 ... an

(and presumably dates from before there were data families). A newtype constructor must have a return type exactly as the instance head (modulo alpha renaming if it's using GADT syntax). For a standalone newtype, "instance head" means the newtype head.

For non-newtype data instances, the various constructors might have return types strictly more specific than the instance head (that makes them GADTs).

The type declared in a data/newtype instance head (in the literature variously called a 'type scheme', 'monotype' -- because no constraints, 'Function-free type' -- in the 2008 paper 'Type Checking with Open Type Functions') is the principal type that all the constructors' return types must unify with. (There might not be any constructors with exactly that return type.)

If your instance be a newtype, the rules are much tighter: there can be only one constructor, and its return type must be exactly the principal type. So to answer the original q

Does that mean that being newtype is a property of the data constructor rather than the datatype? ... but ... Then is being a newtype a property of the result type, rather than of the constructor?

No, not of the constructor; yes more like of the result: being a newtype is the property of a data family instance/its principal type. It's just that for standalone newtypes, there can in effect be only one instance, and its principal type is the most general type for that type constructor. Derivatively, we can say the newtype's principal type/return type uniquely identifies a data constructor. That's crucial for type safety, because values of that type share their representation with the type inside the newtype's data constructor, i.e. with no wrapper -- as @AlexisKing's comment points out. Then pattern matching doesn't need to go looking for the constructor: the match is irrefutable/the constructor is virtual.

It strikes me that in the interests of more precise typing/better documentation, you might want to declare a newtype as a data family with only one instance, whose head is more specific than what you must put for a standalone newtype.

data family instances: newtype, data

2 Answers2

Linked