I am trying to design an API for some database system in Haskell, and I would like to model the columns of this database in a way such that interactions between columns of different tables cannot get mixed up.
More precisely, imagine that you have a type to represent a table in a database, associated to some type:
type Table a = ...
and that you can extract the columns of the table, along with the type of the column:
type Column col = ...
Finally, there are various extractors. For example, if your table contains descriptions of frogs, a function would let you extract the column containing the weight of the frog:
extractCol :: Table Frog -> Column Weight
Here is the question: I would like to distinguish the origin of the columns so that users cannot do operations between tables. For example:
bullfrogTable = undefined :: Table Frog
toadTable = undefined :: Table Frog
bullfrogWeights = extractCol bullfrogTable
toadWeights = extractCol toadTable
-- Or some other columns from the toad table
toadWeights' = extractCol toadTable
-- This should compile
addWeights toadWeights' toadWeights
-- This should trigger a type error
addWeights bullfrogWeights toadWeights
I know how to achieve this in Scala (using path-dependent types, see [1]), and I have been thinking of 3 options in Haskell:
not using types, and just doing a check at runtime (the current solution)
the TypeInType extension to add a phantom type on the Table type itself, and pass this extra type to the columns. I am not keen on it, because the construction of such a type would be very complicated (tables are generated through complex DAG operations) and probably slow to compile in this context.
wrapping the operations using a
forall
construct similar to the ST monad, but in my case, I would like the extra tagging type to actually escape the construction.
I am happy to have a very limited valid scoping for the construction of the same columns (i.e. columns from table
and (id table)
not being mixable), and I mostly care about the DSL feel of the API rather than the safety.
[1] What is meant by Scala's path-dependent types?
My current solution
Here is what I ended up doing, using RankNTypes.
I still want to give users the ability to use columns how they see fit without having some strong type checks, and opt in if they want some stronger type guarantees: this is a DSL for data scientist who will not know the power of the Haskell side
Tables are still tagged by their content:
type Table a = ...
and columns are now tagged with some extra reference types, on top of the type of the data they contain:
type Column ref col = ...
Projections from tables to columns are either tagged or untagged. In practice, this is hidden behind a lens-like DSL.
extractCol :: Table Frog -> Column Frog Weight
data TaggedTable ref a = TaggedTable { _ttTable :: Table a }
extractColTagged :: Table ref Frog -> Column ref Weight
withTag :: Table a -> (forall ref. TaggedTable ref a -> b) -> b
withTag tb f = f (TaggedTable tb)
Now I can write some code as following:
let doubleToadWeights = withTag toadTable $ \ttoadTable ->
let toadWeights = extractColTagged ttoadTable in
addWeights toadWeights toadWeights
and this will not compile, as desired:
let doubleToadWeights =
toadTable `withTag` \ttoads ->
bullfrogTable `withTag` \tbullfrogs ->
let toadWeights = extractColTagged ttoads
bullfrogWeights = extractColTagged tbullfrogs
in addWeights toadWeights bullfrogWeights -- Type error
From a DSL perspective, I believe it is not as straightforward as what one could achieve with Scala, but the type error message is understandable, which is paramount for me.