+++
title = "`data`"
weight = 1
+++

A `data` definition defines a brand new type, which is different from
every primitive type and every other type defined using a `data`
definition, even if they look structurally similar. The new type defined
by a `data` definition is a "sum of products", or a "union of products".

```
topDefn  ::= data typeId {tyVarId } = {summand | }[ derive ]
summand  ::= conId {type }
summand  ::= conId { { fieldDef ; }}
derive   ::= deriving ( { classId , })
fieldDef ::= fieldId :: type
```

The *typeId* is the name of this new type. If the *tyVarId*'s exist,
they are type parameters, thereby making this new type polymorphic. In
each *summand*, the *conId* is called a "constructor". You can think of
constructors as unique *tags* that identify each summand. Each *conId* is
followed by a specification for the fields involved in that summand
(*i.e.,* the fields are the "product" within the summand). In the first
way of specifying a summand, the fields are just identified by position,
hence we only specify the types of the fields. In the second way of
specifying a summand, the fields are named, hence we specify the field
names (*fieldId*'s) and their types.


The same constructor name may occur in more than one type. The same
field name can occur in more than one type. The same field name can
occur in more than one summand within the same type, but the type of the
field must be the same in each summand.


The optional *derive* clause is used as a shorthand to make this new
type an instance of the *classId*'s, instead of using a separate,
full-blown `instance` declaration. This can only be done for certain
predefined *classId*'s: `Bits`, `Eq`, and `Bounded`. The compiler
automatically derives the operations corresponding to those classes
(such as `pack` and `unpack` for the `Bits` class). Type classes,
instances, and `deriving` are described in more detail in sections
[2.1](fixme), [4.5](fixme) and [4.6](fixme).
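
As a minimal sketch of the `deriving` shorthand, the hypothetical `Color` type below (not part of the original text) derives all three classes and then uses one operation from each. The width `2` assumes that the derived `Bits` instance encodes the three constructors in the smallest tag that fits.

```hs
-- Hypothetical three-way enumeration, introduced only for illustration.
data Color = Red | Green | Blue
             deriving (Eq, Bits, Bounded)

-- Eq: the derived instance provides (==).
isRed :: Color -> Bool
isRed c = c == Red

-- Bits: the derived instance provides pack (and unpack).
-- Assumption: three constructors are encoded in a 2-bit tag.
encodeColor :: Color -> Bit 2
encodeColor c = pack c

-- Bounded: the derived instance provides minBound and maxBound.
firstColor :: Color
firstColor = minBound
```

Without the `deriving` clause, each of these functions would need a hand-written `instance` declaration for the corresponding class.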

To construct a value of a type $T$ defined by a `data` definition, one
simply applies one of its constructors to the appropriate number of
arguments (see section [5.3](fixme)); the values of those arguments become
the components/fields of the data structure.


To extract a component/field from such a value, one uses pattern
matching (see section [6](fixme)).


Example:

```hs
data Bool = False | True
```


This is a "trivial" case of a `data` definition. The type is not
polymorphic (no type parameters); there are two summands with
constructors `False` and `True`, and neither constructor has any fields.
It is a 2-way sum of empty products. A value of type `Bool` is either
the value `False` or the value `True`. Definitions like these correspond
to an "enum" definition in C.


Example:

```hs
data Operand = Register (Bit 5)
             | Literal (Bit 22)
             | Indexed (Bit 5) (Bit 5)
```


Here, the first two summands have one field each; the third has two
fields. The fields are positional (no field names). The field of a
`Register` value must have type `Bit 5`. A value of type `Operand` is
either a `Register` containing a 5-bit value, or a `Literal` containing
a 22-bit value, or an `Indexed` containing two 5-bit values.
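
The construction and pattern matching described earlier can be sketched with this type. The value and function names below (`regOperand`, `firstRegister`, and so on) are hypothetical and chosen only for illustration; the `data` definition is repeated so that the fragment is self-contained.

```hs
data Operand = Register (Bit 5)
             | Literal (Bit 22)
             | Indexed (Bit 5) (Bit 5)

-- Construction: apply a constructor to one argument per field.
regOperand :: Operand
regOperand = Register 3

litOperand :: Operand
litOperand = Literal 1000

idxOperand :: Operand
idxOperand = Indexed 1 2

-- Extraction: pattern matching selects the summand and binds its fields.
-- Returns the first 5-bit field if the operand has one.
firstRegister :: Operand -> Maybe (Bit 5)
firstRegister (Register r)   = Just r
firstRegister (Literal _)    = Nothing
firstRegister (Indexed r1 _) = Just r1
```

Each clause of `firstRegister` handles exactly one summand, so the constructor acts as the tag that decides which fields are available.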


Example:

```hs
data Maybe a = Nothing | Just a
               deriving (Eq, Bits)
```

This is a very useful and commonly used type. Consider a function that,
given a key, looks it up in a table and returns the value associated with
that key. Such a function can return either `Nothing`, if the table does
not contain an entry for the given key, or `Just` $v$, if the table
contains $v$ associated with the key. The type is polymorphic (type
parameter "`a`") because it may be used with lookup functions for
integer tables, string tables, IP address tables, etc., *i.e.,* we do
not want to over-specify the type of the value $v$ that it may carry.
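
A lookup function of the kind described above might be sketched as follows. The `Table` type and the names `lookupKey` and `lookupOr` are hypothetical, defined here only to keep the fragment self-contained; the sketch uses the predefined `Maybe` type rather than redefining it.

```hs
-- Hypothetical association table: either empty, or one entry plus the rest.
data Table k v = Empty | Entry k v (Table k v)

-- Return Just the value stored under the key, or Nothing if the key is absent.
lookupKey :: (Eq k) => k -> Table k v -> Maybe v
lookupKey _   Empty            = Nothing
lookupKey key (Entry k v rest) = if key == k then Just v
                                 else lookupKey key rest

-- Consume the result by pattern matching, supplying a default for Nothing.
lookupOr :: (Eq k) => v -> k -> Table k v -> v
lookupOr dflt key table =
    case lookupKey key table of
        Nothing -> dflt
        Just v  -> v
```

Because `Maybe` is polymorphic in `a`, the same `Nothing`/`Just` convention works whatever value type `v` the table holds.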


Example:

```hs
data Instruction = Immediate { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
                 | Jump { op::Op; target::UInt26; }
```


An `Instruction` is either an `Immediate` or a `Jump`. In the former
case, it contains a field called `op` containing a value of type `Op`, a
field called `rs` containing a value of type `Reg`, a field called `rt`
containing a value of type `CPUReg`, and a field called `imm` containing
a value of type `UInt16`. In the latter case, it contains a field called
`op` containing a value of type `Op`, and a field called `target`
containing a value of type `UInt26`.

> **NOTE:**
>
> Error messages involving data type definitions sometimes show traces of
> how they are handled internally. Data type definitions are translated
> into a data type where each constructor has exactly one argument. The
> types above translate to:
>
> ```hs
>  data Bool = False PrimUnit | True PrimUnit
>
>  data Operand = Register (Bit 5)
>               | Literal (Bit 22)
>               | Indexed Operand_$Indexed
>  struct Operand_$Indexed = { _1 :: Bit 5; _2 :: Bit 5 }
>
>  data Maybe a = Nothing PrimUnit | Just a
>
>  data Instruction = Immediate Instruction_$Immediate
>                   | Jump Instruction_$Jump
>
>  struct Instruction_$Immediate = { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
>  struct Instruction_$Jump = { op::Op; target::UInt26; }
> ```