Type systems: coercion, casts, and conversions
February 28th, 2012
Programming language designers and users spend a lot of time talking about casts. The core idea of a cast is to convert between two types - either statically, or dynamically. Reading through a number of sources recently, I’ve been noticing that the term “cast” is massively overloaded. This blog post is an attempt to break down the various uses I’ve seen into their component parts.
Before jumping into the discussion of type casts, conversions, and coercions, let me remind you that every value has at least two types associated with it. The dynamic type is the actual type of the value at execution time. The static type (there may be several) is an approximation of that type available at compile time. Generally, the static type of a value must be accurate, but need not be precise. (For example, it’s perfectly legal and common to refer to a value by a base-type pointer.)
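To make the static/dynamic distinction concrete, here is a minimal C++ sketch. The Shape and Circle types and the helper functions are hypothetical names invented for illustration; they are not from any library.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// The parameter's static type is always const Shape&.
// The virtual call dispatches on the dynamic type of the argument.
std::string describe(const Shape& s) { return s.name(); }

// typeid applied to a polymorphic object reports its dynamic type.
bool is_circle(const Shape& s) { return typeid(s) == typeid(Circle); }

void static_vs_dynamic_example() {
    Circle c;
    const Shape& ref = c;  // static type: const Shape&, dynamic type: Circle
    assert(describe(ref) == "Circle");
    assert(is_circle(ref));

    Shape plain;           // static and dynamic type agree here
    assert(!is_circle(plain));
}
```

The static type Shape is accurate for both objects, but it is only precise for the second one.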
A type conversion is a programmatic way to convert a value from one type to another. Depending on the language involved, this may involve copying the contents, or applying arbitrary user-defined conversion logic. The core part though is that the value is changing, not merely the type associated with that value.
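As a rough illustration of a conversion in this sense, the sketch below produces a new value in both cases: once through a built-in numeric conversion and once through user-defined logic. Celsius and Fahrenheit are hypothetical types made up for the example.

```cpp
#include <cassert>

// Hypothetical unit types, used only for illustration.
struct Fahrenheit { double degrees; };

struct Celsius {
    double degrees;
    // User-defined conversion: computes a brand new Fahrenheit value.
    explicit operator Fahrenheit() const {
        return Fahrenheit{degrees * 9.0 / 5.0 + 32.0};
    }
};

// Built-in conversion: the result is a different value (the fraction is dropped).
int truncate_to_int(double d) { return static_cast<int>(d); }

void conversion_example() {
    assert(truncate_to_int(3.75) == 3);

    Celsius boiling{100.0};
    Fahrenheit f = static_cast<Fahrenheit>(boiling);
    assert(f.degrees == 212.0);
}
```

In both cases the result is a different object with different contents, not the same bits viewed through a new type.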
A type cast is the replacement of one static type with another without changing the actual value. Generally, the type being cast to is assumed to be an accurate (if potentially less precise) representation of the actual type of the value. A type cast may be checked or unchecked depending on the semantics of the language. For checked casts, if a type cast fails the program executes defined error behavior such as aborting, or throwing an exception. For unchecked casts, if the type cast fails the program is left in an undefined state and no further guarantees are given.
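A small C++ sketch of checked versus unchecked casts, under the assumption of a hypothetical Animal/Dog/Cat hierarchy. The checked version reports failure in a defined way; the unchecked version is only exercised here in the case where it is valid.

```cpp
#include <cassert>

// Hypothetical class hierarchy, used only for illustration.
struct Animal { virtual ~Animal() = default; };
struct Dog : Animal { int tricks = 3; };
struct Cat : Animal {};

// Checked cast: yields nullptr when the dynamic type is not Dog.
const Dog* as_dog_checked(const Animal* a) {
    return dynamic_cast<const Dog*>(a);
}

// Unchecked cast: the caller promises that a really points at a Dog.
// If that promise is false, the behaviour is undefined.
const Dog* as_dog_unchecked(const Animal* a) {
    return static_cast<const Dog*>(a);
}

void cast_example() {
    Dog dog;
    Cat cat;

    const Animal* a = &dog;  // upcast: less precise, still accurate
    assert(as_dog_checked(a) == &dog);        // same object, new static type
    assert(as_dog_checked(&cat) == nullptr);  // defined failure
    assert(as_dog_unchecked(a)->tricks == 3); // valid only because a is a Dog
}
```

Note that the successful casts return the address of the very same object: only the static type changed.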
A type coercion is the forceful reinterpretation of memory as a value of another type. Arguably, such coercions are also unchecked type casts, but the semantics are slightly different. A type coercion relies on the structure of the two types and not on their nominal relationship. To put it another way, a type coercion is expected to violate the type system. Some languages do provide minimal checking for type coercion, but generally, this is a use-at-your-own-(extreme)-risk feature. If a value is coerced to an incompatible type - whatever that might mean - behavior is generally ill-defined.
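The following sketch contrasts a coercion with a conversion of the same float. It assumes a 32-bit IEEE-754 float, which the static_asserts check, and it copies the bytes with std::memcpy rather than dereferencing a reinterpret_cast pointer, since the latter would run afoul of C++ aliasing rules. The function names are my own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, checked at compile time.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE-754 floats");

// Coercion: reinterpret the bytes of a float as an unsigned integer.
// Only the layout of the two types matters, not any relationship between them.
std::uint32_t bits_of(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Conversion, for contrast: computes the nearest integer value toward zero.
std::uint32_t converted(float f) { return static_cast<std::uint32_t>(f); }

void coercion_example() {
    assert(bits_of(1.0f) == 0x3F800000u);  // IEEE-754 encoding of 1.0f
    assert(converted(1.0f) == 1u);         // numeric value 1
}
```

The same input yields two unrelated results, which is exactly the difference between reinterpreting memory and converting a value.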
Each of the above can be either explicit or implicit. Explicit casts, conversions, and coercions require explicit annotation from the programmer. They do not happen silently, and must appear directly in code. Implicit casts, conversions, and coercions are inserted by the compiler based on the language semantics. C and C++ implicit numeric conversions are a well-known example of the latter. As a matter of personal opinion, I think implicit casts, conversions, and coercions are a horrible mistake in any language design.
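To illustrate the implicit/explicit split, here is a short C++ sketch. The Meters type is hypothetical; marking its constructor explicit is one way a C++ programmer can opt out of implicit conversions for their own types.

```cpp
#include <cassert>

// Hypothetical wrapper type, used only for illustration.
struct Meters {
    explicit Meters(double v) : value(v) {}  // no implicit double -> Meters
    double value;
};

double length_of(Meters m) { return m.value; }

// Implicit conversion: the language accepts this without any annotation,
// and the fractional part is dropped.
int implicit_truncation(double d) {
    int i = d;
    return i;
}

// Explicit conversion: same result, but the intent is visible in the code.
int explicit_truncation(double d) { return static_cast<int>(d); }

void implicit_vs_explicit_example() {
    assert(implicit_truncation(2.9) == 2);
    assert(explicit_truncation(2.9) == 2);

    // length_of(2.5) would not compile, because the constructor is explicit.
    assert(length_of(Meters{2.5}) == 2.5);
}
```

Both truncating functions behave identically at run time; the difference is whether a reader can see the conversion in the source.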
To put all of this terminology in the context of a language you may know, let’s consider the “casts” available in C++. C-style casts in C++ are generally type coercions, though they may act like type conversions if a conversion operator is available. static_cast is a mixture of type casting and conversion; it will only convert between types which are known statically to be compatible or convertible via a user-defined conversion operator. dynamic_cast is a checked type cast. Its semantics are well defined even if the dynamic type of the value doesn’t match the cast: a pointer cast yields a null pointer, and a reference cast throws std::bad_cast. const_cast is a restricted form of type coercion which only applies to type qualifiers (const, volatile). reinterpret_cast is a full-power unchecked type coercion.
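A compact sketch of the named C++ casts side by side. Base, Derived, Other, and the helper functions are hypothetical names for illustration; each helper is written so that the cast it uses has well-defined behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

// Hypothetical class hierarchy, used only for illustration.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int payload = 42; };
struct Other : Base {};

// dynamic_cast on a reference: throws std::bad_cast on a mismatch.
bool refers_to_derived(Base& b) {
    try {
        (void)dynamic_cast<Derived&>(b);
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}

// const_cast: only strips the qualifier. Writing through the result is
// valid here only when the referenced object was not declared const.
void bump(const int& n) { ++const_cast<int&>(n); }

// reinterpret_cast: pointer -> integer -> pointer round trip.
Derived* round_trip(Derived* p) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<Derived*>(raw);
}

void cpp_casts_example() {
    Derived d;
    Other o;
    assert(refers_to_derived(d));
    assert(!refers_to_derived(o));

    int counter = 1;  // not const, so bump() is well defined
    bump(counter);
    assert(counter == 2);

    assert(round_trip(&d) == &d);
    assert(static_cast<int>(2.5) == 2);  // static_cast acting as a conversion
}
```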
If you’re interested in type systems, you may find my earlier blog post interesting.