+++
title = "`data`"
weight = 1
+++

A `data` definition defines a brand new type, which is different from every primitive type and every other type defined using a `data` definition, even if they look structurally similar. The new type defined by a `data` definition is a "sum of products", or a "union of products".

```
topDefn  ::= data typeId { tyVarId } = { summand | } [ derive ]
summand  ::= conId { type }
summand  ::= conId { { fieldDef ; } }
derive   ::= deriving ( { classId , } )
fieldDef ::= fieldId :: type
```

The *typeId* is the name of the new type. If any *tyVarId*s are present, they are type parameters, which makes the new type polymorphic.

In each *summand*, the *conId* is called a "constructor". You can think of constructors as unique *tags* that identify each summand. Each *conId* is followed by a specification of the fields in that summand (*i.e.,* the fields are the "product" within the summand). In the first form of *summand*, the fields are identified only by position, so only their types are given. In the second form, the fields are named, so both the field names (*fieldId*s) and their types are given.

The same constructor name may occur in more than one type. The same field name may occur in more than one type. The same field name may also occur in more than one summand within the same type, but the field must have the same type in each of those summands.

The optional *derive* clause is a shorthand for making the new type an instance of the listed *classId*s. It replaces a separate, full-blown `instance` declaration. This shorthand is available only for certain predefined *classId*s: `Bits`, `Eq`, and `Bounded`. The compiler automatically derives the operations of those classes (such as `pack` and `unpack` for the `Bits` class). Type classes, instances, and `deriving` are described in more detail in sections [2.1](fixme), [4.5](fixme), and [4.6](fixme).
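The following minimal sketch shows the *derive* clause in use. The enumeration `Color`, the package name, and the helpers `isRed`, `toBits`, `fromBits`, and `firstColor` are hypothetical and exist only for illustration. The `Bit 2` width is an assumption: it is the fewest bits that can distinguish three field-less constructors.

```hs
package ColorExample where

-- Hypothetical enumeration, used only to illustrate the derive clause.
data Color = Red | Green | Blue
    deriving (Eq, Bits, Bounded)

-- Eq is derived, so (==) is available on Color values.
isRed :: Color -> Bool
isRed c = c == Red

-- Bits is derived, so pack/unpack convert to and from a bit vector.
-- Assumption: three field-less constructors are encoded in 2 bits.
toBits :: Color -> Bit 2
toBits = pack

fromBits :: Bit 2 -> Color
fromBits = unpack

-- Bounded is derived, so minBound names the smallest Color value.
firstColor :: Color
firstColor = minBound
```

Deriving avoids hand-writing `instance` declarations. A hand-written `Bits` instance would be needed only if a specific encoding were required.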
To construct a value of some type $T$ defined by a `data` definition, apply one of its constructors to the appropriate number of arguments (see section [5.3](fixme)). The values of those arguments become the components (fields) of the data structure. To extract a component from such a value, use pattern matching (see section [6](fixme)).

Example:

```hs
data Bool = False | True
```

This is a "trivial" case of a `data` definition. The type is not polymorphic (it has no type parameters). It has two summands, with constructors `False` and `True`, and neither constructor has any fields. It is a 2-way sum of empty products. A value of type `Bool` is either the value `False` or the value `True`. Definitions like this correspond to an `enum` definition in C.

Example:

```hs
data Operand = Register (Bit 5)
             | Literal (Bit 22)
             | Indexed (Bit 5) (Bit 5)
```

Here, the first two summands have one field each, and the third has two. The fields are positional (they have no names). The field of a `Register` value must have type `Bit 5`. A value of type `Operand` is either a `Register` containing a 5-bit value, a `Literal` containing a 22-bit value, or an `Indexed` containing two 5-bit values.

Example:

```hs
data Maybe a = Nothing | Just a
    deriving (Eq, Bits)
```

This is a very useful and commonly used type. Consider a function that, given a key, looks up a table and returns the value associated with that key. Such a function can return either `Nothing`, if the table has no entry for the given key, or `Just` $v$, if the table associates the value $v$ with the key. The type is polymorphic (type parameter `a`) because it may be used with lookup functions for integer tables, string tables, IP address tables, etc. In other words, we do not want to over-specify the type of the value $v$.
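The following minimal sketch builds `Operand` values by applying constructors to arguments. It then takes them apart by pattern matching, both with one function clause per summand and with a `case` expression. The package name and the helpers `regCount` and `firstReg` are hypothetical. `firstReg` returns 0 for a `Literal`, and that choice is arbitrary. `Maybe` is deliberately not used here, so the sketch does not depend on how the Prelude defines it.

```hs
package OperandExample where

data Operand = Register (Bit 5)
             | Literal (Bit 22)
             | Indexed (Bit 5) (Bit 5)

-- Construct values by applying a constructor to its fields.
r3 :: Operand
r3 = Register 3

ix :: Operand
ix = Indexed 1 2

-- Hypothetical helper: how many register numbers an operand carries.
-- Each clause matches one summand; _ ignores a field's value.
regCount :: Operand -> Integer
regCount (Register _)  = 1
regCount (Literal _)   = 0
regCount (Indexed _ _) = 2

-- Hypothetical helper: extract the first register field with case.
-- A Literal carries no register, so 0 is returned (arbitrary choice).
firstReg :: Operand -> Bit 5
firstReg x =
    case x of
        Register r  -> r
        Literal _   -> 0
        Indexed b _ -> b
```

Each constructor pattern binds exactly as many fields as that summand has, so `Indexed b _` names the first of its two 5-bit values.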
Example:

```hs
data Instruction = Immediate { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
                 | Jump { op::Op; target::UInt26; }
```

An `Instruction` is either an `Immediate` or a `Jump`. In the former case, it contains a field called `op` of type `Op`, a field called `rs` of type `Reg`, a field called `rt` of type `CPUReg`, and a field called `imm` of type `UInt16`. In the latter case, it contains a field called `op` of type `Op` and a field called `target` of type `UInt26`.

> **NOTE:**
>
> Error messages involving data type definitions sometimes show traces of
> how they are handled internally. Data type definitions are translated
> into a data type where each constructor has exactly one argument. The
> types above translate to:
>
> ```hs
> data Bool = False PrimUnit | True PrimUnit
>
> data Operand = Register (Bit 5)
>              | Literal (Bit 22)
>              | Indexed Operand_$Indexed
> struct Operand_$Indexed = { _1 :: Bit 5; _2 :: Bit 5 }
>
> data Maybe a = Nothing PrimUnit | Just a
>
> data Instruction = Immediate Instruction_$Immediate
>                  | Jump Instruction_$Jump
>
> struct Instruction_$Immediate = { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
> struct Instruction_$Jump = { op::Op; target::UInt26; }
> ```
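This minimal sketch shows named fields in use, including a field name shared by two summands. It does not reuse `Instruction`, because the field types `Op`, `Reg`, `CPUReg`, `UInt16`, and `UInt26` are not defined in this section. Instead it uses a hypothetical cut-down type `Instr` with assumed field widths. The sketch also assumes that values with named fields are constructed with `Con { field = expr; ... }` and matched with the same form in patterns. All names here (`Instr`, `Imm`, `Jmp`, `opc`, `rd`, `imm`, `target`, `opcodeOf`) are illustrative.

```hs
package InstrExample where

-- Hypothetical cut-down instruction type with named fields.
-- Field widths are illustrative assumptions.
data Instr = Imm { opc :: Bit 6; rd :: Bit 5; imm :: Bit 16; }
           | Jmp { opc :: Bit 6; target :: Bit 26; }

-- Construct values by naming each field of the chosen summand.
addImm :: Instr
addImm = Imm { opc = 8; rd = 2; imm = 100 }

jumpTo :: Instr
jumpTo = Jmp { opc = 2; target = 1024 }

-- opc occurs in both summands with the same type (Bit 6),
-- so one function can return it whichever summand it receives.
opcodeOf :: Instr -> Bit 6
opcodeOf i =
    case i of
        Imm { opc = o; rd = _; imm = _ } -> o
        Jmp { opc = o; target = _ }      -> o
```

If `opc` had different types in the two summands, the definition of `Instr` would be rejected, so a function like `opcodeOf` always has a single well-defined result type.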