7 - Types#

Primitive Types#

Almost all programming languages provide a set of primitive data types, which are data types not defined in terms of other types. Some primitive data types are reflections of the hardware, while others require only a little non-hardware support for their implementation.

Integers#

Integers are almost always an exact reflection of the hardware, so the mapping is trivial. Languages will often have several integer types. Java has 4 sizes for signed integers: byte, short, int, and long.

Floating Points#

Languages for scientific use support at least 2 floating-point types (e.g. float and double). Floating points are usually designed exactly as the hardware, but not always. See IEEE Floating-Point Standard 754 for more information.

Complex#

Some languages support a complex data type (e.g. C99, Fortran, Python). Each value consists of two floats, representing the real and imaginary parts. Python depicts this type as follows: (7 + 3i).

Decimal#

The decimal data type is built for business applications (e.g. COBOL, C#). A decimal data type stores a fixed number of decimal digits in coded form (BCD). The advantage of this is accuracy, and the disadvantage is wasted memory and limited range.

Boolean#

Booleans, despite only being 1 bit, are often represented as a byte.

Character#

Characters are stored as numeric codings, with the most commonly used coding being ASCII. Another common codding is 16-bit Unicode (UCS-2), which includes characters from most natural languages and was originally used in Java, but is now supported by many other languages. There is also 32-bit Unicode (UCS-4 or UTF-32), which was originally supported by Fortran in 2003.

Character String Types#

Typical Operatrions#

  • Assignment/copying

  • Comparison (==, >, etc.)

  • Catenation

  • Substring reference

  • Pattern matching

Character String Type in Certain Languages#

In C and C++, strings are not primitive and use char arrays and a library of functions that provide string operations.

In SNOBOL4 (a string manipulation language), strings are primitive and support many operations, including elaborate pattern matching.

In Fortran and Python, strings are a primitive type with assignment and several operations.

In Java, C#, Ruby, and Swift, strings are primitive via the String class.

Perl, JavaScript, Ruby, and PHP provide built-in pattern matching using regular expressions (regex).

Character String Length Operations#

Static length is used in COBOL and Java’s String class.

Limited Dynamic Length is used in C and C++. In these languages, a special character is used to indicate the end of a string’s characters, rather than maintaining the length.

Dynamic Length is used in SNOBOL4, Perl, and JavaScript.

Enumeration Types#

All possible values, which are named constants, are provided in the definition of enumeration types.

For example, in C#:

enum days {mon, tue, wed, thu, fri, sat, sun};

Some design issues to consider with enums:

  • Is an enumeration type constant allowed to appear in more than one type definition, and if so, how is the type of an occurence of that constant checked?

  • Are enumeration types coerced to integer?

  • Are any other type coerced to an enumeration type?

Evaluation of Enumeration Types#

Enums are an aid to readability, since you don’t have to code options as a number.

They are also an aid to reliability, since the compiler can check:

  • Operations (e.g. don’t allow enums to be added to create nonsense values)

  • No enumeration variable can be assigned a value outside its defined range.

  • C#, F#, Swift, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types.

Array Types#

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

Array Design Issues#

  • What types are legal for subscripts?

  • Are subscripting expressions in element refernces range checked?

  • When are subscript ranges bound?

  • When does allocation take place?

  • Are ragged and rectangular multidimensional arrays allowed?

  • What is the maximum number of subscripts?

  • Can array objects be initialized?

  • Are any kind of slices supported?

Array Indexing#

Indexing (or subscripting) is a mapping from indices to elements. Most languages used brackets for indexing, but Fortran and Ada uses parentheses. Languages use integers as the index type.

C, C++, Perl, and Fortran do not specify range checking. However, Java, ML, and C# specify range checking.

Subscript Binding and Array Categories#

Static subscript ranges are statically bound and storage allocation is static (before run-time). The advantage of this is efficiency.

Fixed stack-dynamic subscript ranges are statically bound, but the allocation is done at declaration elaboration time. The advantage of this is space efficiency.

Fixed heap-dynamic is similar to fixed stack-dynamic. Storage binding is dynamic but fixed after allocation (i.e. binding is done when requested and storage is allocated from heap, not stack).

Heap-dynamic binding of supscript ranges and storage allocation is dynamic and can change any number of times. The advantage is flexibility (arrays can grow and or shrink during program execution).

C and C++ arrays that include the static modifier are static, while those without the modifier are fixed stack-dynamic. C and C++ also provide fixed heap-dynamic arrays.

Perl, JavaScript, Python, and Ruby support heap-dynamic arrays.

Array Initialization#

Some languages allow initialization at the time of storage allocation.

  • C, C++, Java, Swift, and C# (C# example)

int list[] = [4, 5, 7, 83];
  • Character strings in C and C++

char name[] = "freddie";
  • Array of strings in C and C++:

char *names[] = ["Bob", "Jake", "Joe"];
  • Java initialization of String objects

String[] names = ["Bob", "Jake", "Joe"];

Rectangular and Jagged Arrays#

A rectangular array is a multi-dimensional array in which all of the rows have the same number of elements and all of the columns have the same number of elements.

A jagged matrix has rows with varying number of elements, which is possible when multi-dimensional arrays actually appear as arrays of arrays.

Slices#

A slice is some substructure of an array. Slices are only useful in arrays that have array operations.

Python example:

vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
  • vector[3:6] is a three-element array

  • mat[0][0:2] is the first and second element of the first row of mat

Ruby supports slices with the slice method: list.slice(2,2) returns the third and fourth elements of list.

Implementation of Arrays#

An access function maps subscript expressions to an address in the array.

In a single-dimensional array:

address(list[k]) = address(list[lower_bound]) + ((k-lower_bound) * element_size)
address(list[k]) = address(list[0]) + k*element_size

Accessing Multi-Dimensional Arrays#

There are 2 common methods for accessing two-dimensional arrays:

  • Row major order (by rows), which is used in most languages

  • Column major order (by columns), which is used in Fortran

Associative Arrays#

An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. Associative arrays, also known as dictionaries, are a built-in type in Perl, Python, Ruby, and Swift.

Associative Arrays in Perl#

Names begin with %, literals are delimited by parantheses:

%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, ...);

Subscripting is done by using brackets and braces:

$hi_temps{"Wed"} = 83;

Elements can be removed with delete:

delete $hi_temps{"Tue"};