OO is not the One True Paradigm, but Haskell still sucks at it
I just read osfameron’s Readable Perl talk. It’s a pretty typical language advocation talk, nothing special, but it reminded me of Perl. Those who have not been reading my blog since 2005 may not know that I used to be a Perl nut. I was even on the Perl 6 design team, attended several design mettings with Larry Wall (and a couple other notable geniuses), had the Perl 6 syntax memorized, …. Quite insane I was.
I have been pondering recently about the cognitive mismatch between OO and functional approaches. The two have been fused in languages, see O’Caml, but I argue that code in such a fused language will mostly end up looking like one or the other, not a beautiful balance as we would like.
My thesis is that the two support different models of the universe. The functional approach supports a “mathematical” view, where we know a lot about the structure of our data; the object approach supports an “empirical” view, where we know a lot about the data itself. Let’s use something I was playing with today as an example: the SK calculus.
The obvious functional approach is to define a data type representing the cases of the AST:
data AST = S | K | AST :@ AST
(The :@ is the name of an infix constructor; i.e. it is just a node with two subtrees)
Then to implement a function that, say, reduces the top level of a tree, we can analyze the data by pattern matching:
reduce (S :@ x :@ y :@ z) = Just $ x :@ z :@ (y :@ z) reduce (K :@ x :@ y) = Just $ x reduce _ = Nothing
Here we know a lot about what kind of argument reduce will get. Whatever it gets, we know that it will be either an S, a K, or an application. We then define what it means to reduce data of this form.
Now I could half-bake a standard OO retort showing how much incredibly better functional programming is by how awkward this would be in a typical OO language (and it would be). But that’s trying to apply the functional world-view, that we know a lot about the structure of our data, to an OO setting. I think a good OO design for this same problem would take quite a different form. I see it something like this:
interface Combinator {
int arguments();
AST apply(AST[] args);
}
class S implements Combinator {
override int arguments() { return 3; }
override AST apply(AST[] args) {
return new Apply(new Apply(args[0], args[2]), new Apply(args[1], args[2]);
}
}
class K implements Combinator {
override int arguments() { return 2; }
override AST apply(AST[] args) {
return args[0];
}
}
...
While I gave a working Haskell program above in four lines, my “good” OO solution (in whatever language this is… looks like C#) has much more than that and is not even complete. I have left out the definition of Apply and the function which actually does the reduction. But I’m not bashing OO here (but please do understand if I bash OO syntax, which as the years go by seems to get more and more verbose). Instead it’s just that this problem is very well-suited to functional programming.
But this program has very different properties from the Haskell version. In particular, it is easy to add a new combinator, a new object, that does something different. Whereas in the Haskell program, adding a new primitive combinator would change the assumptions of every function that worked with combinators. Conversely, adding data manipulation functions which depend on particulars, namely whether something is an S or a K (or whatever else), involves touching every object to add that method. Whereas in Haskell, we can just add a new function. Astute readers will recognize this as the famous “expression problem”.
This trade-off starts to affect the way we proceed. If we were to implement identical functionality in the two programs, our approaches will diverge greatly. For example, today I added a function to invert an application. In Haskell, I just enumerated by cases: this is how you invert an S-reduction, this is how you invert a K-reduction, etc.
In the OO program I wouldn’t add a visitor though, that would be stupid. Instead I would create a new node type for variables, apply the transformation to a number of variables equal to the number of expected arguments, and analyze the positions of the variables in the result. I would end up with a function that can invert any combinator. That is, the natural next step in the OO example is to write more generic code than in the functional example.
Anyway, there’s what I consider a nice side-by-side comparison of the two approaches. Maybe by analyzing these two examples and where they led, you can start to see what I’m saying about the two cognitive models.
And this brings me back to Perl. The slides I read mentioned Moose, an object model module for Perl. It is rich: supports inheritance, role composition, privacy, coersion, and many other tools. I think an OO system needs to have these tools: the OO world view counts on data having many facets, capabilities, constituents, and concerns itself with what is possible when you know about certain ones. An OO programmer must be an expert at creating data with many facets, capabilities, and constituents. This is contrasted to functional programming where everything is known about the data, and the focus is on manipulating it.
Haskell has no support for these tools. Its type system, although powerful, is too static. Haskell provides too many guarantees for OO to work; it wants to know too much about what’s going on. Similarly I don’t consider C++, Java, C# to be “powerful” OO languages. In fact, I might go so far as to say a language with static types and OO is not properly embracing the philosophy. You’re creating rich, smart data. If your superclass guaranteed that a particular method was const, i.e. does not change the object it was called on, how are you supposed to smartly cache the results of that function? Well, with the mutable hack. But that’s a hack, essentially telling the type system that it should go take a hike because you know what you’re doing.
Perl and Smalltalk “get it”, in my opinion. They have simple models, but provide tools for looking inside and changing data on the fly. They are about creating and manipulating smart data, leaving the guarantees up to the naming of the methods and their intended use, because trying to force guarantees would restrict their ability to be smart. If you want guarantees, use Haskell; it will almost prove your code is correct for you. But Haskell has no smart data, that’s the only way it can provide guarantees. You can’t have case analysis of data structures and objects that can reverse-engineer your function at the same time.
That’s it: when you’re in OO, don’t design functionally. Keep your guarantees wide open, and focus on making smart objects.

is straightforward
.
is straightforward
.
is also
has an obvious solution, same as
. There is no unique largest solution, but there is a unique largest for each x which can be found in
time. It’s unclear what the library should do in this scenario.
is called the
. It turns out that there is a unique largest solution for this, and here is an algorithm that finds it:
.
. That is, iteratively remove x’s from r that don’t have corresponding y’s. Then define the result
.
such that
.
.
Thus there exists
such that P(x,y), by the definition of
.
and
, then
.
. There is an
with
. The only way that could have happened is if there were no y in rn-1 with P(x,y). But there is a y in R’ with P(x,y), so
, contradicting n’s minimality.
, maybe better.
. Each existential before the universal adds an order of magnitude, and I think each one after does too, but I haven’t thought it through.
time (16,807 checks for a 7-card hand, yikes), when my hand-written algorithm could do it in