As others have said, it's more like a discriminated union - a fairly obscure construct that C / C++ programmers are the most likely to have met, as a union paired with a tag field saying which member is live.
You can kind of simulate this in an OO language by having an abstract base class for Haskell's "type", with a concrete subclass for each of Haskell's "constructors". Note that your code fragment says that every NewData object has four fields; this is incorrect. Each constructor carries its own set of fields. You can do something like this:
data Stuff = Small Int | Big String Double Bool
Now if I write Small 5, this is a Stuff value with only 1 field inside it. (It takes up that amount of RAM.) But if I do Big "Foo" 7.3 True, this is also a value of type Stuff, but it contains 3 fields (and takes up that much RAM).
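To make the "different fields per constructor" point concrete, here is a minimal sketch of pattern matching on Stuff. The function name describe is made up for illustration; it is not from your code.

```haskell
data Stuff = Small Int | Big String Double Bool

-- describe is a hypothetical helper. Each equation matches one constructor
-- and binds exactly the fields that constructor carries.
describe :: Stuff -> String
describe (Small n)   = "Small with " ++ show n
describe (Big s d b) = "Big with " ++ s ++ ", " ++ show d ++ ", " ++ show b
```

The constructor is what tells describe which case it is looking at, which is why it has to be stored as part of the value.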
Notice that the constructor name itself is part of the data. That's why you can do something like
data Colour = Red | Green | Blue
Now there are three constructors, each with zero fields. The constructor itself is the data. Now, C# lets you do
enum Colour {Red, Green, Blue}
But that's really just saying
Colour = int;
const int Red = 0;
const int Green = 1;
const int Blue = 2;
Note, in particular, that with a cast you may say
Colour temp = (Colour)52;
By contrast, in Haskell a variable of type Colour can only contain Red, Green or Blue, and these are not in any way integers. You can define a function to convert them to integers if you like, but that's not how the compiler stores them.
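A conversion function of the kind just mentioned might look like the sketch below. The name colourToInt and the particular numbers are arbitrary choices for illustration.

```haskell
data Colour = Red | Green | Blue
  deriving (Show, Eq)

-- colourToInt is a hypothetical conversion; the mapping is whatever you
-- decide it should be, not something built into the type.
colourToInt :: Colour -> Int
colourToInt Red   = 0
colourToInt Green = 1
colourToInt Blue  = 2

-- Going the other way has to admit that most integers are not colours.
intToColour :: Int -> Maybe Colour
intToColour 0 = Just Red
intToColour 1 = Just Green
intToColour 2 = Just Blue
intToColour _ = Nothing
```

If you just want the obvious numbering, adding Enum to the deriving clause gives you fromEnum for free.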
Your comment about getters and setters illustrates the pitfalls of this approach; in Haskell, we don't usually worry about getters and setters. Simply defining a type is sufficient to create values of that type and to access their contents. It's sort of vaguely like a C# struct with all fields marked public readonly. (When we do worry about getters, we usually call them "projection functions"...)
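If you do want named projection functions, record syntax generates them for you. A small sketch follows; the Person type and its fields are invented for this example.

```haskell
-- Person is a hypothetical type. Record syntax defines the constructor and,
-- as a side effect, the projection functions name and age.
data Person = Person
  { name :: String
  , age  :: Int
  } deriving Show

alice :: Person
alice = Person { name = "Alice", age = 30 }

-- There is no setter: "updating" a field builds a new value and leaves
-- the original untouched.
older :: Person
older = alice { age = 31 }
```

Here name and age are ordinary functions (name :: Person -> String), so name alice reads the field much as a getter would.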
In OO, you use classes for encapsulation. In Haskell, you do this with modules. Inside a module, everything has access to everything (much like a class can access every part of itself). You use an export list to say what parts of the module are public to the outside world. In particular, you can make a type name public, while completely hiding its internal structure. Then the only way to create or manipulate values of that type is through the functions you expose from the module.
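Here is a rough sketch of hiding a type's internals behind an export list. The module name Temperature and the functions mkTemperature and toCelsius are all hypothetical, chosen only to show the shape.

```haskell
-- The export list names the type Temperature but not its constructor Kelvin,
-- so code outside this module cannot build or take apart values directly.
module Temperature (Temperature, mkTemperature, toCelsius) where

data Temperature = Kelvin Double

-- The only way in: refuses values below absolute zero.
mkTemperature :: Double -> Maybe Temperature
mkTemperature k
  | k >= 0    = Just (Kelvin k)
  | otherwise = Nothing

-- The only way out.
toCelsius :: Temperature -> Double
toCelsius (Kelvin k) = k - 273.15
```

Writing Temperature(..) in the export list instead would export the constructor too, and the encapsulation would be gone.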
You asked about newtype?
OK, the newtype keyword defines a new type name which is represented identically to an old type, but the type checker thinks it's something new and different. For example, an Int is just a normal number. But if I do
newtype UserID = ID Int
now UserID is a brand new type, completely unrelated to anything. But under the covers, it's really just another name for good old Int. What this means is that you can't use UserID where an Int is required - and you can't use Int where a UserID is required. So you can't mix up a user ID with some other random number just because they're both integers.
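A short sketch of what that looks like in practice. The functions greet and unwrap are hypothetical names for illustration.

```haskell
newtype UserID = ID Int

-- greet is a hypothetical function that only accepts a UserID.
greet :: UserID -> String
greet (ID n) = "user #" ++ show n

-- greet (42 :: Int)   -- rejected by the type checker: an Int is not a UserID
-- greet (ID 42)       -- fine: the Int is wrapped explicitly

-- Getting the Int back out is just as explicit.
unwrap :: UserID -> Int
unwrap (ID n) = n
```

The wrapping and unwrapping exist only for the type checker; they cost nothing at run time.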
You can do exactly the same thing with data:
data UserID = ID Int
However, now we have a useless UserID structure that just contains a pointer to an integer. If we use newtype then a UserID is an integer, not a structure pointing to an integer. From the programmer's point of view, both definitions are equivalent; but under the hood, newtype is more efficient.
(Minor nit-pick: Actually, to make them identical you need to say
data UserID = ID !Int
which means that the integer field is "strict". Don't worry about this yet.)