Thursday, August 23, 2007

James Gosling, it turns out, thought harder about type inference than I did

So for a while, I'd been wondering about the type system in Java, thinking that perhaps I'd found a shortcoming -- if you have a more general type (say, "Thing"), and a more specific type (say, "Chair", which extends "Thing") -- then why isn't List<Chair> a subtype of List<Thing>, meaning that you could pass a List of Chairs into a method that takes a List of Things?

The mighty Toby R, in a conversation with me and Strick, shed some light on the situation and led me to understand why this is not the case. The particular use-case is: when you're in the method that takes List<Thing>, you can add Things to that list. And the Things you add could well be Basketballs. So when the method returns, the caller is still expecting to have a list of Chairs, not realizing that there are now Basketballs in the list... so Java cleverly disallows this case.
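
To make that concrete, here's a minimal sketch (Thing/Chair/Basketball are of course made-up names for illustration):

import java.util.ArrayList;
import java.util.List;

class Thing {}
class Chair extends Thing {}
class Basketball extends Thing {}

public class CovarianceDemo {
    // If List<Chair> were a List<Thing>, this method could quietly
    // slip a Basketball into the caller's list of Chairs.
    static void addSomething(List<Thing> things) {
        things.add(new Basketball());
    }

    // A bounded wildcard lets you read Things out of a list of any
    // subtype of Thing, but the compiler forbids adding to it.
    static void lookAtThings(List<? extends Thing> things) {
        for (Thing t : things) {
            System.out.println(t);
        }
        // things.add(new Basketball());  // does not compile
    }

    public static void main(String[] args) {
        List<Chair> chairs = new ArrayList<Chair>();
        chairs.add(new Chair());

        // addSomething(chairs);  // does not compile: List<Chair> is not a List<Thing>
        lookAtThings(chairs);     // fine
    }
}

The bounded wildcard is Java's compromise: you get covariance for reading, but the compiler won't let the method add anything (other than null) to the list.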

(also, somehow I missed an anonymous comment, probably from my mother, well-respected for her work on type theory, which explained that exact case)

Now in a purely functional language, where you don't go around getting references to objects and modifying them, I'd like to posit that this wouldn't be a problem -- but perhaps there are other problematic situations? ...

Wednesday, August 22, 2007

typing about typing about types

As we make the push to handle Java 1.5 features in GWT (it's about time!) I'm having flashbacks to undergraduate Compilers. Perhaps this is unsurprising, considering what I'm working on. Part of the difficulty in the problem is that whether you're talking about types in the code that constitutes the compiler itself, types in the code that the user is going to put in, or types in the output code... we use the same word. Oh my. (much worse, I suppose, is the code you get with the Appel book, where half of the words on the screen are helpfully "ty" or "typ". Jebus.)

But! All of this means that by the next GWT release, you'll likely have some code that I touched in your hands. Woo.

Sunday, August 12, 2007

this blog post: for you, $50.

I recently met a fellow who's working on a doctorate in literature, but his previous background is in library science. I wasn't sure what the interesting problems in library science might be, so I wandered over to the Wikipedia article and started falling through the links.

A few links out, I ran into the Serials Crisis article. Apparently (and Wikipedia articles close to the "Library Science" one are never wrong), the costs of subscribing to scholarly journals keep going up -- libraries only have so much money for subscriptions, but there are ever more academics and subfields, and thus more journals. And if a given library cancels its subscription to a particular journal, that publisher's fixed costs are still fixed, so prices increase for the remaining subscribers.

The traditional academic journal system already seemed pretty shaky to me, especially in light of the Web; the publishers' exasperating websites (Springer, ACM Portal, IEEE's site...) seem like their sole purpose is to keep enterprising students from reading an article. And given that most of the science behind the articles is publicly funded in the first place, the articles themselves seem like they should be public as well.

I wouldn't mind seeing companies like Springer just go away; universities seem totally capable of hosting journals -- especially over the web! There may be some compelling reason for the current system, and I'll try to find out what it is... but in the short term, tools like Google Scholar could go a little further out of their way to help us find the full text of an article!

Also:
http://en.wikipedia.org/wiki/Open_access
http://en.wikipedia.org/wiki/Open_access_journal
http://en.wikipedia.org/wiki/Open_access_publishing
The Serials Crisis: A White Paper for the UNC-Chapel Hill Scholarly Communications Convocation
The Crisis in Scholarly Publishing

Friday, August 10, 2007

You get spoiled by languages with first-class functions

Earlier today, I needed to take a list of strings and grab all of them that end with ".html". Very simple task. My first thought was, of course, something like:

[fn for fn in fns if fn.endswith(".html")]

Or even:
filter(lambda x: x.endswith(".html"), fns)

Or the equivalent Lisp. Y'know, with "loop" and "collect". (Common Lisp has, as they say, the Cadillac of loop syntax)

Or something to that effect. But then I remembered that I was writing in Java, and I ended up writing a for-loop. There's no clean, idiomatic way to say that in Java, is there? Do you have to build up the list procedurally? ...
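
For the record, the Java 5 version I ended up with was roughly this shape (the class and method names here are just illustrative):

import java.util.ArrayList;
import java.util.List;

public class HtmlFilter {
    // Build the result list by hand -- about as close to filter()
    // as plain Java 5 gets.
    static List<String> htmlOnly(List<String> fns) {
        List<String> result = new ArrayList<String>();
        for (String fn : fns) {
            if (fn.endsWith(".html")) {
                result.add(fn);
            }
        }
        return result;
    }
}

Libraries like Apache Commons Collections do offer a Predicate-based CollectionUtils.select, but an anonymous inner class is an awful lot of ceremony for one endsWith call.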

Sunday, August 05, 2007

"If I knew the answer ahead of time, I wouldn't be writing this program!"

On the plane back from San Francisco to Atlanta, I read Thomas and Hunt's Pragmatic Unit Testing. It's a very quick read, about 120 pages plus appendices. And while it's very light, it has what seems like a lot of helpful advice for good design and development practice. To be honest, I've never done much unit testing in the past, thinking "oh, that's what those corporate software engineer guys do -- pssh." But! It turns out that these days, I am a corporate software engineer, and anyway one should always be looking out for new ways to hack more effectively.

There's a bunch of important principles to take away from Pragmatic Unit Testing. The one that stuck with me the most, though, is that when designing and writing code, you've got to think: "how am I going to test this?". The significance of this is not just "oh man, I'm going to have to write a unit test for my method"; it gets back to that generally-understood but oft-ignored idea that every logical unit of your code really wants to be its own method, a modular thing that you can use separately -- because you're going to have to use that same calculation again somewhere else. And Don't Repeat Yourself.

For example: if you have a method that calculates how to do something and then does it (say with a call to another library), maybe you want to make that two methods: the calculation and then a call to that calculation coupled with the library call. This will be easier to test -- you're not really interested in testing a third-party library, just your own calculations -- and as a happy side-effect, your code is now cleaner and more reusable!
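
Purely as an invented illustration (this retry/backoff scenario is mine, not the book's), the split might look like this:

public class RetryPolicy {
    // Pure calculation: trivially unit-testable, no library involved.
    static long backoffMillis(int attempt, long baseMillis) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        return baseMillis * (1L << attempt);  // exponential backoff
    }

    // Thin wrapper: couples the calculation to the actual side effect.
    static void sleepBeforeRetry(int attempt) throws InterruptedException {
        Thread.sleep(backoffMillis(attempt, 100));
    }
}

A test for this now boils down to a few assertEquals calls against backoffMillis, with no Thread.sleep (and no mocking of anyone else's library) anywhere in sight.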

Similarly, separating out the backend code from the GUI (two of the things that get my hackles up the most in this life are the terms "business logic" and "MVC") lets you properly test the Stuff That Does Stuff on its own. One example in the book hit particularly close to home -- a small GUI application where all the calculations and I/O happened mixed in with the Swing code.

That example, and in fact the whole book, brought on flashbacks to a project I'd worked on recently. One of my friends and I (and he's one of the sharpest guys I know) inherited a fairly involved program and ended up sinking months into trying to fix it up. This system suffered from pretty much every pitfall in the book: random silently-caught exceptions, real work happening mixed in with the GUI code, needlessly long, opaque, nigh-untestable (let alone "tested") methods, repetition all over the place. Worse! It was built by a guy who'd supposedly specialized in software engineering -- and he did pretty much everything that Thomas and Hunt warn against! Not a pleasant situation. I'm sure you can relate.

Of course, I knew at the time that this was atrocious code. But now maybe I'll be a bit more principled in my development, working with a lean towards easy testability. Unit testing will probably be a good discipline to get into.

On the other hand, I've been reading about (and writing a bit of) Haskell. All this murky business of setting up and tearing down state, "proving" to yourself that each function does what you think it does "for the boundary cases" -- it all relies on the idea that you're going to be able to predict where you're going to make the bugs (by heuristics, habits, and mnemonics). And if you're smart enough to predict where the bugs are going to pop up, it seems like there's something better you could do to keep them from being introduced at all. In ML for the Working Programmer, L.C. Paulson suggests that a mark of the professional in the future will be writing functionally (in ML). The modularity practices seem like the Right Thing, useful for the functional programmer as well as the OO programmer, but if your code could be formally verified, how much more confident would you be that it was correct for the general case? Simple testing is nothing like a proper proof.

But who ever writes proof-carrying code?

Saturday, August 04, 2007

further readings: one day, I'll have something coherent to say about type systems

The first finder of any error in my books receives $2.56; significant suggestions are also worth $0.32 each. If you are really a careful reader, you may be able to recoup more than the cost of the books this way.

However, people who have read the book Eats, Shoots & Leaves should not expect a reward for criticizing the ways in which I use commas. Punctuation is extremely important to me, but I insist on doing it my own way.

-- Don "The Lion" Knuth, The Art of Computer Programming homepage.

*laughs* We love you, Don.

Also: there's so much to read in this life. I've recently picked up books on proper C++ technique, security, and unit testing, and I'll probably give those some priority on my ever-growing Queue. Of course, there are all those books on statistical NLP that need to get read in the near future if I'm going to be of any help to anyone.

I've been in on a few conversations recently about C++ and pitfalls and bugs that can arise when using it. The more I think about these, particularly random language-specific casting rules and several different competing ways to represent strings, the more I think that Haskell is going to be a good idea. Or possibly SML.