Saturday, May 25, 2013

Sets and the Type of 1

I've been pondering the subject of types and values for some time and decided to materialize my thoughts in a blog post. Too keep things simple and easily readable I will reason about types from a set theory point of view, not using traditional type theory.

What is a Type?

Let's start by investigating what a type is. According to Wikipedia a type system can be defined as "a tractable syntactic framework for classifying phrases according to the kinds of values they compute". So, the type of a phrase (whatever that is) is simply the set of the values that can be computed from this phrase. In this post I will use the terms type and set interchangeably as I consider them to be the same thing (mathematically this might not be correct).

The Type of 1

So, what is the type of the value 1? In Scala (and Java, C/C++, C# etc.) the type of 1 is a 32-bit signed integer. This is strange as the value 1 is a member of many sets (in fact an infinite number of them), so why is this set chosen? Probably because of historical reasons and because it's a convenient type to do calculations with on most computers, but mathematically this type doesn't have any special meaning.

In Haskell the type of 1 is (Num t) => t. Indeed confusing, the type is defined in terms of a typeclass. This set includes a lot of values, including user defined ones. Really, Haskell must resort to typeclasses to define the type of 1?

I would argue that there are only two reasonable types for the value 1: the set containing all values and the set containing only the value 1.  The set containing all values is really not useful in static type checking, so that leaves the set which only element is the value 1, {1} in set notation (also called a singleton set). The same reasoning would apply to all values. I find it remarkable that pretty much all commonly used programming languages gets this simple type wrong. How can one expect a type system to be useful if it can't even infer the correct type of the most basic expressions?

The Type of a Function

A (total) function is a relation that maps each element of one set, let's call it the input set, to exactly one element of another (or the same) set, let's call it the output set. This relation can be modelled as a set of pairs a -> b where a is a member of the input set and b is a member of the output set. The additional restriction on the relation set is that there must be exactly one pair a -> * in the set (where * represents any member of the output set).

So, let's apply this model to a simple function in Scala:

val foo = (b : Boolean) => if (b) 1 else "a"

What is the only reasonable type of foo? Well, the input set of foo is Boolean which is equal to {true, false}. The output set of foo is {1, "a"}. If the input value is true, foo will output the value 1 and if the input value is false, foo will output value "a". So we can write the type of foo as {true -> 1, false -> "a"}. In Scala the type of foo is Boolean => Any, which is a pretty useless type as Any is the set of all values (i.e. the type doesn't tell us anything about the value). In Haskell the corresponding definition won't even type check. Seriously, these are state of the art programming languages, is this the best we can do?

Subtyping

Well, what if we have:

val addOne : Int => Int
val x : {1} = 1
addOne(x)   // Type error?

No, this should not be a type error. The first thing to notice is that the set Int contains the value 1 and thus {1} is a subset of Int. This means that it's perfectly safe to apply the function addOne on the input set {1} as we're guaranteed that the relation set addOne contains a mapping for each element in the input set. In type theory one would say that {1} is a subtype of Int.


Final Words

I've only skimmed the surface of how to apply the concept of sets and values to programming languages. The concept can be extended to ADT's, pattern matching, type classes etc. I think there is a lot to gain from thinking about types this way because there is something fundamentally broken in the way types are implemented in most programming languages. Do you agree?

Monday, March 11, 2013

A More Efficient Option

Updated: 2013-03-12

Scala's Option type is a big improvement in type safety over Java's null checking and NullPointerException's. Unfortunately when wrapping a value in Some there is a quite big overhead: creation of one additional object containing a reference to the value. It would be more efficient if we could represent Some as the unboxed pointer value and encode None as the null value, but at the same time preserving the type safety as we get with the Option type (for example Kotlin uses this approach in it's builtin support for nullable types). Well, we can do exactly this with value classes introduced in Scala 2.10:

final case class Option[+T](value: Any = null) extends AnyVal with Product with Serializable {
  private def unsafeGet = value.asInstanceOf[T]
  def isEmpty: Boolean = value == null
  ...
}

The reason that the class parameter is of type Any and not T is that it's not allowed to create a value of type Nothing. We still want to be able to create a None value of type Option[Nothing] though so we delay the unsafe cast to T (using the unsafeGet method) until the value is actually required and we've checked that it's actually there using the isEmpty method.

The code is on GitHub if someone wants to try it out. It's pretty much a dropin replacement for the standard Scala Option type.

Memory Usage Comparisons

Below is a comparison in memory usage between the scala.Option type and the unboxed AnyVal option type when allocating 1M objects each containing one optional value:

scala.Some: 33 MB
scala.None: 19 MB
scala.Some : Any: 34 MB
scala.None : Any: 19 MB
AnyVal Some: 19 MB
AnyVal None: 19 MB
AnyVal Some : Any: 34 MB
AnyVal None : Any: 34 MB

As one would expect, memory usage for Some is almost halved and for None the memory usage is unchanged. The downside of using the AnyVal option is that it uses more memory when a None is upcasted to a super type because then the compiler will box the reference. I would assume this is a quite rare case though.