rbandrews

There was never a time when I was suddenly not sick, but I think I am now mostly not sick. It happened gradually. Each day it was less and less effort to talk until now I am pretty much better. I expected that one day I would cough up something nasty, if I could only cough hard enough, and be instantly better. Didn't work that way.

Anyway, programming. I was thinking the other day about college, and Java. In college, I was a diehard Java weenie. After C++ for five (or six, I forget) years, anything would be good, and Java was designed specifically to ease the pain of C++: no manual memory management, and a huge API full of the stuff C++ always ignored (like graphics, and networking, and threads). I missed some things (pointers and the handy tricks you can do with them, passing objects by value, and global functions), but I was by then fully brainwashed into the OO cult, and I figured that that was probably better. Everything should be an object! You can decompose your program into hundreds of little reusable programlets!

But this isn't what I was thinking about, this is just background. I was thinking about a class I took called Comparative Languages. Comparative Languages was an optional course, designed to come in your junior year. The alternate option was Numerical Methods, which scared away a lot of people because it involves a lot of mathematics. Comparative Languages is a tour of different kinds of programming languages. A good portion of it also covers programming language concepts, like variables, control structures, grammars, and scoping.

To put this in context, Virginia Tech starts all CS majors off with a course in C++ (or Java, now), followed by another, followed by another (which also focuses on OO design theory), followed by one on data structures and another on operating systems, which both use C++ heavily even if they don't teach you more of it. After that you get into theory courses and technical electives, but for your first two years, C++ and computer science are pretty much synonymous.

Java fits in there somewhere now, but I'm not clear on it since it postdates my time there, so I'll just pretend it doesn't. The two languages are close enough to treat as one for purposes of this anyway.

So we have a roomful of students who have been taught that C++ is (implicitly) the only language worth knowing, and have studied it intently (or at least enough for a C+) for two years. If they've read anything outside school, like Usenet or job ads, they've heard that any language outside the C++/C#/Java family tree is useless ivory tower tree-hugging hippie crap. They know how to program. It's easy. You decide what objects your problem decomposes into, draw diagrams of it, and then write those classes. Done.

Of course, it's easy to fall into that trap when you never have to maintain code past turning it in.

So these students go in and they learn Prolog, and Scheme, and a little bit of theory. The theory in particular throws them for a loop, or at least it did my class. See, a big thing that you're taught in Comparative Languages is something called "lexical scoping", which I'm sure I've harped about before.

It goes like this. You have a function, called "f". Inside it is declared a variable "x", and also another function "g" that references "x". Looks like this:

function f:
   var x
   function g:
      print(x)
   end
end

And they go on to explain how you can pass "g" around as a parameter or return it, which raises something called the "funarg problem", and how dynamic scoping works but isn't too useful, and how lexical scoping works and thunks and closures and everything. And everyone is completely lost because you can't do this in C++. You can't declare a function inside a function, so there are no closures, and lexical scoping never gets a chance to do anything interesting. Neither the funarg problem nor its solution exists in any language the average CS undergrad is taught (outside that course).
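For what it's worth, here is roughly that pseudocode as something you can actually run. This is only a sketch in JavaScript (my choice for illustration, not what the course used), and the names "callElsewhere" and the string values are made up. It returns "g" out of "f" and then calls it from a place that has its own, different "x":

```javascript
function f() {
    var x = "hello from f";
    function g() {
        console.log(x);   // "x" is looked up where g was written, inside f
        return x;
    }
    return g;             // g escapes from f: this is the funarg situation
}

// A caller with its own variable named x (hypothetical, for contrast).
function callElsewhere(fn) {
    var x = "hello from the caller";
    return fn();
}

var escaped = f();                  // f has already returned by now
var seen = callElsewhere(escaped);  // under lexical scoping, g still sees f's x
```

Under dynamic scoping, "g" would have picked up the caller's "x" instead. A closure is what keeps f's "x" alive after "f" has returned.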

So no problem, teach Scheme, lambda expressions, they'll understand then, right? Well, no. Our study of Scheme was limited to fiddling with recursion (I think we wrote a function to flatten a list). Not even tail-call elimination (I didn't learn that until studying SICP after I graduated). No lambda, none. So when the theory was taught, it was just some bizarre thing that we had to learn for the test, with no practical use given, and no example where it would even work.

So here is one. I've thought about this a while, and I think I have an example of how closures are useful to the average business application. It's easy: using closures helps keep all related code in one place.

This is sort of the opposite of the stereotype, right? Normally, the static language prejudice is that using closures breaks up your functions, you can't tell what's where or what will happen next, right? You take a function as a parameter, it has code in it and who knows where it's defined?

Well, the important thing to realize is that that code has its own scope, and it's not yours. You call a function parameter, it doesn't matter where it's from because it can't affect your function in any way other than what you pass it and how you use its return. It's off in its own scope, separate. Which makes sense; it's what lexical scoping means: you can tell what's going on by looking at the text of the program ("lexical" here just means "according to the text as written").
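A small sketch of that separation, again in JavaScript and with made-up names ("makeCounter" and "caller" are just for illustration). Both scopes have a variable called "total", and neither can touch the other's:

```javascript
// Defined far away from whoever ends up calling it.
function makeCounter() {
    var total = 0;               // lives in makeCounter's scope
    return function (n) {
        total = total + n;       // updates makeCounter's total, nobody else's
        return total;
    };
}

function caller(callback) {
    var total = 100;             // same name, different scope
    callback(5);
    callback(7);
    return total;                // untouched by the callback
}

var counter = makeCounter();
var callerTotal = caller(counter);   // the caller's own total
var counterTotal = counter(0);       // what the callback accumulated
```

The only things that cross the boundary are the arguments (5 and 7) and the return values, which is the whole point.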

Got a little sidetracked there. Example time! You are doing that thing that business apps have done since time immemorial: reading rows from a database and doing something to them. Problem is, you have to support lots of different databases, and not all of them have handy JDBC drivers. Other problem is, you have to do this roughly every four lines in your Gigantic Enterprise Application, so it better not be too verbose.

So let's see, you obviously want to take a database connection (we'll assume you're handed that, it's just an object) and run a query on it. That gives you back some kind of result object you can call to get back row objects, and you can ask those for fields. You want a loop that goes through all the rows and does whatever it is you do in the body of the loop. Of course, afterward you have to clean up the resultset, or else things bog down, and you also have to do that if anything goes wrong in the loop body. So it looks like this:

function customersInRegion(connection, region){
    var results=null;
    try{
	results=connection.run("select * from customers where region="+region);
	var row=results.nextRow();
	while(row){
	    addCustomer(row.get("customer_name")); // This is the business logic
	    row=results.nextRow();
	}
    }catch(error){
	log.fatal(error);
    }finally{
	if(results)results.cleanup(); // results is still null if run() threw
    }
}

Just so you know, this isn't hypothetical, this exact function is in our code at work roughly eleventy thousand hojillion times. I've pondered for months, and there is absolutely no way to get rid of it, or any of the couple dozen incorrect variations on it that bite us every now and then.

So let's think about the levels of badness here. Obviously, if the structure of this ever changes, if you need to do something like insert a "start transaction" before every query, you have to go change all of these. What we want is to split this. We want one function with everything that never changes, the while(row) bit, and we want one function that always changes, the addCustomer(row.get()) bit. So now we have:

function processRow(row){
    addCustomer(row.get("customer_name")); // This is the business logic
}

function customersInRegion(connection, region){
    var results=null;
    try{
	results=connection.run("select * from customers where region="+region);
	var row=results.nextRow();
	while(row){
	    processRow(row);
	    row=results.nextRow();
	}
    }catch(error){
	log.fatal(error);
    }finally{
	if(results)results.cleanup(); // results is still null if run() threw
    }
}

This is a good way to look at where to add closures, by the way. Find every loop in your program, especially every repeated loop, and try to break it into a general pattern like "loop over every map hex in range of the tank".
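To make the tank example concrete, here is one way it might look. Everything here is hypothetical: the helper name "forEachHexInRange", and the assumption that hexes are stored as axial coordinates {q, r}. A real game map could well do it differently. The fiddly part (which hexes count as in range) lives in one function, and the caller only supplies the loop body:

```javascript
// Hypothetical helper: visits every hex within `range` steps of `center`.
// Assumes axial hex coordinates {q, r}.
function forEachHexInRange(center, range, visit) {
    for (var dq = -range; dq <= range; dq++) {
        var lo = Math.max(-range, -dq - range);
        var hi = Math.min(range, -dq + range);
        for (var dr = lo; dr <= hi; dr++) {
            visit({ q: center.q + dq, r: center.r + dr });
        }
    }
}

var tank = { q: 3, r: -1 };
var threatened = [];   // a local that the loop body closes over

forEachHexInRange(tank, 2, function (hex) {
    threatened.push(hex.q + "," + hex.r);   // this is the "business logic"
});
```

If the range rule ever changes, it changes in "forEachHexInRange" and nowhere else.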

So we're getting closer now. This still has a few problems. One is the query: obviously, we should not be constructing it inside the function, but that's an easy fix. The other is that these two functions are separate, but still tied together. You can't have one customersInRegion function serve multiple processRow functions. So, we fix both of these problems the same way; we add parameters:

function processRow(row){
    addCustomer(row.get("customer_name")); // This is the business logic
}

function iterateOverRows(connection, query, processFunction){
    var results=null;
    try{
	results=connection.run(query);
	var row=results.nextRow();
	while(row){
	    processFunction(row);
	    row=results.nextRow();
	}
    }catch(error){
	log.fatal(error);
    }finally{
	if(results)results.cleanup(); // results is still null if run() threw
    }
}

iterateOverRows(conn, 
		"select * from customers where region="+region, 
		processRow);

We're now passing the query in whole, and the loop body in whole. We can define the iterateOverRows function in a library somewhere and pass it different processRow functions from our own code. The code to deal with customers is in one file, the code to deal with databases is in another. You want to support a different database, write a new iterateOverRows function for it. Want to add logging? It's all in one place. Want to only add customers not named "Fred"? It's all in one place. Separation of concerns, this is called.
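Here is a sketch of the whole thing end to end, runnable without a database. The "fakeConnection" stand-in, the "log" stub, and "customersNotNamedFred" are all made up for illustration, and I've added a null check before cleanup that the listing above may or may not have. The query string is built by concatenation just as above; real code would want parameters. The part to look at is the last function: the loop body is written right where it's used, and it reads and writes a local variable of the function around it.

```javascript
// Hypothetical stand-in for a real connection: hands back canned rows.
function fakeConnection(names) {
    return {
        cleanedUp: false,
        run: function (query) {
            var self = this, i = 0;
            return {
                nextRow: function () {
                    if (i >= names.length) return null;
                    var name = names[i++];
                    return { get: function (field) { return name; } };
                },
                cleanup: function () { self.cleanedUp = true; }
            };
        }
    };
}

var log = { fatal: function (error) { console.error(error); } }; // stub logger

// The part that never changes, written once.
function iterateOverRows(connection, query, processFunction) {
    var results = null;
    try {
        results = connection.run(query);
        var row = results.nextRow();
        while (row) {
            processFunction(row);
            row = results.nextRow();
        }
    } catch (error) {
        log.fatal(error);
    } finally {
        if (results) results.cleanup();
    }
}

// The part that always changes, kept next to the data it works on.
function customersNotNamedFred(connection, region) {
    var customers = [];
    iterateOverRows(connection,
        "select * from customers where region=" + region,
        function (row) {
            var name = row.get("customer_name");
            if (name.indexOf("Fred") === 0) return;  // skip the Freds
            customers.push(name);                    // closes over the local
        });
    return customers;
}

var conn = fakeConnection(["Alice", "Fred Smith", "Bob"]);
var found = customersNotNamedFred(conn, 7);
```

Swapping "fakeConnection" for a real one, or adding a "start transaction" inside "iterateOverRows", doesn't touch "customersNotNamedFred" at all.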

And the Java programmers are shaking their heads and fidgeting, because they think they can do this, even though I said a few paragraphs ago it was dead impossible. Inner classes, they say, and inheritance. So let's see how that goes. Or doesn't.

Everything in Java is a class. It's the first thing you learn about Java, and it never stops being true. So our hypothetical Java program looks like this:

abstract class DatabaseReader{
    void iterateOverRows(Connection conn, String query){
	ResultSet results=null;
	try{
	    results=conn.runQuery(query);
	    Row row=results.nextRow();
	    while(row!=null){
		processRow(row);
		row=results.nextRow();
	    }
	}catch(Throwable t){
	    Logger.fatal(t);
	}finally{
	    if(results!=null)results.cleanup(); // null if runQuery threw
	}
    }

    abstract void processRow(Row row);
}

class CustomerCounter extends DatabaseReader{
    List customers=new LinkedList();

    void processRow(Row row){
	this.customers.add(row.get("customer_name"));
    }
}

Little verbose there, but that's not news. We have a base class that has the iteration stuff in it, and a derived class that processes each row. Okay, all well and good, except... We only get one base class, so it's a little rude for the database library to grab it on every class that needs to read a database. Also, we might want to run more than one query in each class, which is tough since we only get one processRow override per class. Hm. We'd need to have processRow take in a dispatch parameter which says where it was called from (some method in CustomerCounter that calls iterateOverRows), and this method has to be passed blind through DatabaseReader which couldn't care less about it, but which must now mention it in three separate places. Moreover, we have to write a big switch statement in CustomerCounter.processRow.

I'm not trumping this up. Go look at the Javadoc for ActionEvent.getActionCommand().

So straight inheritance won't work. Now let's try the other trick up the Java weenie's sleeve: inner classes. You make a little class inside your class that you can pass around, and has a handle back to your class. Looks like this:

class DatabaseReader{
    void iterateOverRows(Connection conn, String query,RowProcessor proc){
	ResultSet results=null;
	try{
	    results=conn.runQuery(query);
	    Row row=results.nextRow();
	    while(row!=null){
		proc.processRow(row);
		row=results.nextRow();
	    }
	}catch(Throwable t){
	    Logger.fatal(t);
	}finally{
	    if(results!=null)results.cleanup(); // null if runQuery threw
	}
    }
}

abstract class RowProcessor{
    abstract void processRow(Row row);
}

class CustomerCounter extends DatabaseReader{
    List customers=new LinkedList();

    private class CustomerProc extends RowProcessor{
	void processRow(Row row){
	    customers.add(row.get("customer_name"));
	}
    }
}

And that works just as long as customers is a member. Make it a local variable and it fails, unless it's final. So we can't do this:

void main(String[] args){
    List customers=new LinkedList();
    RowProcessor proc=new RowProcessor(){
	    void processRow(Row row){
		if(row.get("customer_name").startsWith("Fred"))return;
		customers.add(row);
	    }};
}

It's legal (it's called an anonymous inner class, and looks a lot like a lambda expression) but it's not a closure: it only sees local variables that are declared "final" (roughly Java's version of "const": the variable can never be reassigned). It can't update a local in the method around it, so then it's pretty useless.
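For contrast, here is the thing the anonymous inner class can't do, sketched in JavaScript. The names "forEachRow" and "countNonFreds" and the row shape are made up for illustration. The loop body reassigns two locals of the enclosing function, which is exactly what "final" rules out:

```javascript
// Hypothetical iteration helper over an in-memory list of rows.
function forEachRow(rows, processFunction) {
    for (var i = 0; i < rows.length; i++) {
        processFunction(rows[i]);
    }
}

function countNonFreds(rows) {
    var count = 0;       // plain locals, not members, not "final"
    var longest = "";
    forEachRow(rows, function (row) {
        if (row.customer_name.indexOf("Fred") === 0) return;
        count = count + 1;                       // reassigns the outer local
        if (row.customer_name.length > longest.length) {
            longest = row.customer_name;         // and this one too
        }
    });
    return { count: count, longest: longest };
}

var summary = countNonFreds([
    { customer_name: "Alice" },
    { customer_name: "Fred Smith" },
    { customer_name: "Barbara" }
]);
```

In the Java version you'd have to promote "count" and "longest" to members, or smuggle them in through a one-element array, just to get the same effect.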

This problem is all over Swing, the Java GUI system. It's all over JDBC in the form of tons of exposed interface crap that people have to deal with. It's all over most Java code, in one form or another. Closures let you separate code that changes from code that doesn't. They let you keep things neat and tidy. They offer a way to compress repeated code structure, like loops offer a way to compress repeated statements or arrays offer a way to compress repeated data. And no popular language today has them, and most programmers are actually fighting adding them to those languages.

Isn't it infuriating?