The Accumulator Generator: Java and Lisp

A couple of years back, Paul Graham created a stir with an entertaining and thought provoking essay called “Revenge of the Nerds”. In it, he put forward a simple bit of code called an Accumulator Generator to demonstrate the superiority of Lisp over other programming languages with special attention to Java. Taking this as something of a challenge, I decided to dig a little deeper. Perhaps, considering Java from the point of view of a Lisp hacker might help us heavy-browed cave-dwelling Java programming Neanderthals think outside of the object-oriented box, subclass of shape though it may be.

No Asbestos Suit Required

My purpose here is not at all to be a Java apologist or language flame-warrior. Rather, it is try to learn something by comparing and experimenting with different means of abstraction and different language mechanisms. The accumulator generator problem is small and easy to state, but lets you have a close look at some of the building blocks that various languages put at the disposal of the programmer. And the properties of your building blocks have interesting relationships with the kinds of things you can build. Besides, given the difference in pedigree between Java and Lisp, the similarities are, in some ways, more striking than the differences.

The Problem

The problem was to “write a function that generates accumulators-- a function that takes a number n, and returns a function that takes another number i and returns n incremented by i.” In Common Lisp, the solution looks like this:

(defun foo (n)
  (lambda (i) (incf n i)))

To clarify the syntax for us non-Lispers, it might help to consider a simple square function.

(defun square (x)
  (* x x))

This says, “define a function named square that takes one argument called x and returns the result of multiplying x by x”. The function can be applied like this.

(square 4)
16

Back to the accumulator generator. In a functional programming language like Lisp you can manipulate functions in interesting ways. The lambda function, for example, lets you create an anonymous function. The expression

(lambda (i) (incf n i))
translates to “create a function of one argument called i which increments the value of n by i and returns the result”. The function that gets created is the accumulator. The function foo is the accumulator generator. It generates an accumulator with the given initial value for n.

One of the points here is that the function foo is a function whose return value is another function. Another point relates to the variable n. Notice that to work properly the value of n must be preserved between invocations of the accumulator. Packaging together a function and a set of long-lived variables like this is called a closure.

Java Accumulator

The Java version given in the article is this homely snippet of code:

public interface Inttoint {
  public int call(int i);
}

public static Inttoint foo(final int n) {
  return new Inttoint() {
    int s = n;
    public int call(int i) {
      s = s + i;
      return s;
    }
  };
}

We're limited here to accumulating integers and, well, it doesn't look very nice, does it? Let's see if we can't do any better.

In the example above, the anonymous inner class is used to approximate Lisp's lambda and to allow the function foo to do something like returning a function. But trying to match Lisp in form as well as function is unnecessary. A more sensible Java implementation would be something like IntAccumulator. It delivers the same functionality as the code above while managing to look a lot more reasonable. One might object that IntAccumulator is an accumulator, not an accumulator generator. True enough. But, what is a constructor if not a built in generator? The new keyword generates instances of the accumulator nicely and the constructor sets the initial value. So, there's no need to muck about with the interface or the anonymous inner class at all. Notice, however, that we're still stuck with accumulating only integers.

Good Enough?

If practicality were the main consideration, IntAccumulator would almost certainly suffice for all your accumulating needs. Better still, thanks to type promotion, DoubleAccumulator will accumulate all primitive numeric data types respectably. Reasonable people would be content with that, but why stop at mere practicality? We're geeks after all. Let's instead see about this type limitation. Paul Graham says that in Java, “...writing a properly polymorphic version [...] is somewhere between damned awkward and impossible.”

There is a good reason to look further. A lot of the expressive power of a programming language comes from the tools it gives the programmer for building abstraction. Abstraction can be thought of as the ability to consolidate common elements, so you don't have to repeat yourself. In accordance with good software engineering practice, we'd like to be able to separate out the elements common to all accumulators from the variable elements like what type of thing is being accumulated.

In our first attempt (poly1), similar the example above, all we've managed to abstract out is the idea that an Accumulator is something that can be incremented by something else – not so brilliant. Next we'll try a slightly different tack (poly2). Let's create an interface for the thing being accumulated. This allows us to have a generic Accumulator, but now the pain is shifted to creating a bothersome wrapper around our Accumulable data types. Still not entirely satisfactory. Speaking of generic, this seems like a case where the spiffy new Java 1.5 (aka 5.0) Generics should help. So, let's try to make a GenericAccumulator. At least we're able to be specific about what a given accumulator can accumulate. Note that we can't do without the Accumulable interface (like this) even in the generic code. Things are indeed starting to look “damned awkward”. What's holding us back here?

The Burden of Proof

The main culprit is static typing. Imagine trying to increment April 15th by 3 Euros. Adding a currency to a date is nonsense, unless you literally believe that time is money. In any programming language, we need to give the correct types of arguments whenever we call a function, otherwise an error occurs. Statically typed languages impose an additional burden. The types of arguments we give to an operation must be provably correct at compile time. Now we've come upon a surprising fact. Static typing prevents us from generalizing unless its strictly true that we have a common operation, as in the Accumulable interface. This explains why generics don't help much, since Java Generics are (mostly) just a mechanism for enforcing type safety on collections of objects. So, it seems we've discovered a downside to static typing.

Dynamic vs. Static

Note that the Lisp doesn't have a magic solution to the issue of what operations apply to what types. Dynamically typed languages merely defer the issue until runtime. Lisp does exactly what you'd expect when confronted with arguments that it can't make sense of.

(defun foo (n) (lambda (i) (incf n i)))
FOO
(setq a (foo 100))
#< COMPILED-FUNCTION: #xCFA7C0 >
(funcall a 1)
101
(funcall a 33.3333)
134.333296
(setq b (foo "pizza"))
#< COMPILED-FUNCTION: #xCFA7C0 >
(funcall b "beer")
;;; An error occurred in function +:
;;; Error: Cannot call function '+' with these operands: pizza and beer

Like Lisp, Python is a dynamically typed language. Python accumulates various data types admirably, but still has to object when asked to increment a string by an integer.

>>> class foo:
...   def __init__(self, n):
...     self.n = n
...   def __call__(self, i):
...     self.n += i
...     return self.n
...
>>> a = foo(100)
>>> a(1)
101
>>> a(33.3333)
134.33330000000001
>>> b = foo("apple")
>>> b("pie")
'applepie'

>>> b(34)
Traceback (most recent call last):
File "", line 1, in ?
File "", line 5, in __call__
TypeError: cannot concatenate 'str' and 'int' objects

Dynamically typed languages trust the programmer to call functions with the right types of arguments. They also allow you to code against implementations that don't exist yet. For example, we may later extend the incf function in the Lisp example to accumulate pizza and beer. In Java you need to explicitely write an interface to allow for this kind of extensibility, whereas dynamically typed languages just leave some options open by default. This ability to write “speculative” code is an important point in favor of dynamically typed languages.

Static typing, on the other hand, offers safety – protection from run-time errors. It may be preferable to accept the confines of type safety in exchange for stronger compile-time error checking. It's interesting to note that some in the Java community, like Bruce Eckel, have questioned the benefit of static typing in light of pervasive unit testing. A quick search will demonstrate that type theory is an active area of research in computer science. Strong static type checking is sometimes linked to formal methods, which makes sense considering that a type system is “fundamentally a verification tool that suffices to ensure invariants on execution behavior.” according to CMU Professor Robert Harper. (see his talks The Practice of Type Theory in Programming Languages and Programming Languages: The Essence of Computer Science)

Method Dispatch

A closely related issue is method dispatch. Another way to look at the problem is that we must be able to find an implementation of the increment operation that applies to the types of arguments we are given. This can happen either at compile time or at run time, or even a little of both. In the OO world, our primary weapons for method dispatch are overloading and polymorphism. Polymorphism works great for varying the implementation of a method with a common signature based on one parameter: the implicit this parameter. Overloading works for multiple parameters, but only at compile time.

Together, polymorphism and overloading are usually sufficient, but they have their limitations. For example, look again at the polymorphic Accumulator. The method incrementBy(Object obj) promises more that it can possibly deliver. You can't really expect it to do a reasonable job of incrementing by any known or unknown type of object. That's an open ended problem. Polymorphism works great if the type of one argument is varying. But here we are potentially varying the types of both the increment and the value. Overloading would require us to define the types we're willing to operate on in advance -- not what you want in an extensible base class. You can see another example of the same problem being delt with by looking at the proliferation of print methods on java.io.PrintWriter. Notice that all of PrintWriter's numerous methods are implemented in terms of the much more compact Writer class.

In general, the problem of dynamically dispatching calls given an operation and an arbitrary set of parameters is not easy to solve in an efficient and extensible way. Greater generality has a cost in terms of efficiency and complexity. There's a lot more to be said on the subject of method dispatch, but I'll just mention a couple other mechanisms which are lacking in the Java world. One is quite well known - operator overloading. The other, multiple dispatch, is more esoteric. The discussion of this topic in Structure and Interpretation of Computer Programs (where they use the term generic operations) is quite good.

Philosophy 101

At the root, the most important issue here is static typing vs. dynamic typing. The fuss about a function returning a function is something of a red herring, as is the closure. Java has perfectly workable alternatives to these Lisp mechanisms. Java is missing some of the more advanced mechanisms for controlling method dispatch, but creating a dispatch mechanism that works simply, efficiently, and with maximal generality is a hard problem regardless. Essentially all the extra code in the Java version is there to make the type system happy.

The main reason we're running into problems in the Java case is that we're fighting the language. The language is not meant to provide wide latitude with types, rather, its meant to be restrictive. Restriction can be a good thing in the interest of preventing runtime errors or in the contexts of security and defensive programming. It depends on how much you trust yourself, and more importantly, how much you trust the other guy. Part of the philosophy behind Java is a consistent preference for security and robustness over other considerations. Perhaps it's not surprising that, to a Lisp hacker, this restrictiveness feels like a straight-jacket or training wheels.

So, we've worked our way down to philosophy. I've often found that in order to understand how a particular piece of technology works, it's indispensable to know something about the philosophy that informed its designers. On both the Lisp and Java side, this seems to be another case in point.


written in 2004 and 2005 by J. Christopher Bare
christopherbare at cbare dot org