Tuesday, April 13, 2010

my dialect and subjunctive verbs

I got corrected on subjunctive verb usage today, which I found odd enough to blog about.

My utterance was something like: "If it was..." -- and I got interrupted with "if it were". My immediate response: "My dialect doesn't do that."

Rudeness aside, what would you say, and under what circumstances? What are the odds, do you think, that my interlocutor has the "if I be..." construction?

Saturday, March 27, 2010

from Dan Dennett: "Preachers who are not Believers"

Here are some excerpts from a recent article by Dan Dennett and Linda LaScola, "Preachers who are not Believers", which I saw on the International Cognition and Culture Institute blog.

They managed to find some actual pastors who don't believe much of what their congregations probably expect them to believe, and interviewed them. They turn out to be really thoughtful, compassionate people, with jobs that must be pretty difficult at times. Awesome paper.

From Jack, age 50, a Southern Baptist minister of 15 years:

“OK, this God created me. It’s a perfect God that knows everything; can do anything. And somehow it got messed up, and it’s my fault. So he had to send his son to die for me to fix it. And he does. And now I’m supposed to beat myself to death the rest of my life over it. It makes no sense to me. Don’t you think a God could come up with a better plan than that?”

“What kind of personality; what kind of being is this that had to create these other beings to worship and tell him how wonderful he is? That makes no sense, if this God is all-knowing and all-wise and all-wonderful. I can’t comprehend that that’s what kind of person God is.”

“Every church I’ve been in preached that the Jonah in the Whale story is literally true. And I’ve never believed that. You mean to tell me a human was in the belly of that whale? For three days? And then the whale spit him out on the shoreline? And, of course, their convenient logic is, ‘Well, God can do anything.’”

“Well, I think most Christians have to be in a state of denial to read the Bible and believe it. Because there are so many contradicting stories. You’re encouraged to be violent on one page, and you’re encouraged to give sacrificial love on another page. You’re encouraged to bash a baby’s head on one page, and there’s other pages that say, you know, give your brother your fair share of everything you have if they ask for it.”

“But if God was going to reveal himself to us, don’t you think it would be in a way that we wouldn’t question? ...I mean, if I was wanting to have...people teach about the Bible...I would probably make sure they knew I existed. ...I mean, I wouldn’t send them mysterious notes, encrypted in a way that it took a linguist to figure out.”

Sunday, February 21, 2010

Morphology: derivation and inflection

Morphology is the study of the structure of words, and the processes by which they're produced. Within morphology, we talk about (among other things) derivation and inflection. It took a little bit of reading for me to understand the difference between the two, so hopefully I can explain it to you.

Derivation is when new words, typically of a different part of speech, are produced from existing words. In English, we have quite a few affixes to change a word's category, and interestingly, they're not very regular. To make something black, you blacken it, but to make something hollow, you hollow it. In Calvin and Hobbes, Calvin uses "verb" as a verb that means "turn something into a verb". To make something into a product, you can (recently) productize it. What word can you use to make something blue? Have you ever bluified something?

Inflection is rather simpler. It takes a base form of a word and encodes some extra meaning in it -- what extra meaning varies by language, but it's often things like plurality or gender. Typically the language requires that words be inflected properly.

Languages differ pretty broadly in how much information a given inflected word carries. For example, a verb in Spanish carries more bits than one in English, so in Spanish ("Hablo castellano.") you often don't have to specify the subject of a verb, because its inflection makes it clear who you're talking about [0]. Some languages encode quite a lot of information in one verb: maybe its object, the whole tense (so no need for auxiliary verbs like "would" or "haber"), the genders of all the participants, maybe even how the speaker came to know the information in question [1].

I have a whole lot of linguistics to learn. It's interesting being around a department where a lot of the people are linguists by background, when I've only put so much time and attention into it. So you'll get more posts like this, rest assured.



[0] A language with this feature, where you don't have to specify pronouns, is called a pro-drop language. Some languages can drop more pronouns than Spanish!

[1] This feature, grammatical evidentiality, is extremely awesome, and we need to adopt it in English.



References:
http://www.indiana.edu/~hlw/Derivation/intro.html
http://en.wikipedia.org/wiki/Inflectional_morphology
http://en.wikipedia.org/wiki/Evidentiality
http://en.wikipedia.org/wiki/Inflection#Inflection_vs._derivation
http://en.wikipedia.org/wiki/Pro-drop_language

Sunday, February 07, 2010

drawing trees with LaTeX

So, whatever you think about producing documents with LaTeX (personally, I'm pretty ambivalent, for reasons that I may go into later), if you want to draw parse trees with it, there's a nice package to do that: qtree.

Installing packages from CTAN manually looks hard.

If you're on Ubuntu/Debian, though, you just need to install these two packages: texlive-humanities, texlive-pictures.

qtree itself is in texlive-humanities, but it depends on a package, pict2e, that's only in texlive-pictures, so you have to install them both or it won't work.

And then you can draw some trees just by specifying the bracketing of the phrases (see the qtree docs for exactly how).
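As a small illustration of that bracketing, here's a hypothetical Python helper (not part of qtree) that turns a nested tuple into a qtree `\Tree` command. It assumes the usual qtree conventions: an internal node is written `[.Label child child ]`, with a space before each closing bracket, and leaves are plain words. It does no escaping, so labels containing LaTeX special characters would need extra care.

```python
def to_qtree(tree):
    """Render a tree as qtree bracket syntax.

    A leaf is a plain string; an internal node is a tuple whose first
    element is the node label and whose remaining elements are children.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    inner = " ".join(to_qtree(child) for child in children)
    # qtree wants a space before every closing bracket.
    return "[.{} {} ]".format(label, inner)


def qtree_command(tree):
    """Wrap the bracketing in the \\Tree command qtree expects."""
    return r"\Tree " + to_qtree(tree)


sentence = ("S", ("NP", "Calvin"), ("VP", ("V", "verbs"), ("NP", "words")))
print(qtree_command(sentence))
# \Tree [.S [.NP Calvin ] [.VP [.V verbs ] [.NP words ] ] ]
```

The printed line can go straight into a document that loads the package with `\usepackage{qtree}`.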

Friday, January 08, 2010

review: a new Model M from Unicomp!

My favorite keyboard is the Model M. They're big, loud, heavy, and made of equal parts joy and engineering. Typing on one makes the familiar clattering racket that everybody loves.

Lindsey just gave me a new one! My 1988 version (IBM part #1391401) is still fine, of course. But now I can bring one to the lab.

The new keyboard is beautiful; they're making them with USB now, and they come in black! It's not quite as heavy as my '80s vintage keyboard (no big metal plate inside), and while the keys themselves are easily removable, this model doesn't have separate keycaps. But it's just as clicky as you remember, and the feel is perfect. This design is apparently the same as some of the latter-day IBM versions.

The company manufacturing the M now, Unicomp, is great, and they totally deserve your business.

The first keyboard they shipped us actually had some problems with it -- a few of the keys were sticking! So I called up the company and got Jim on the phone almost immediately. He suggested that I pull the offending keys off and then pop them back in place (usually good M maintenance advice). After some fidgeting, we determined that I wasn't going to be able to fix it myself, so he had a replacement sent out the very next day!

So fantastic. And now I have two Ms.

(here's another Unicomp review; the blogger and everybody in the comments over there seem to have had a great customer service experience too.)

Sunday, January 03, 2010

Guns, Germs, and Steel on invention

For my holiday break reading, I just finished Guns, Germs, and Steel, by Jared Diamond. I heartily recommend it! It's about the broad patterns in human history: lots of it is about the development of agriculture ("food production", he usually calls it), how and why it happened where it did, and the historical ramifications as societies develop and come into contact/conflict with other societies.

There's a lot about germs, too. The diseases that a society carries and develops resistance to are extremely important when it runs into another group. A people can be totally wiped out when faced with a disease it's not accustomed to.

But I wanted to share with you a bit that particularly resonated with me, as a technology-producing person:

Thus, the commonsense view of invention that served as our starting point reverses the usual roles of invention and need. It also overstates the importance of rare geniuses, such as Watt and Edison. That "heroic theory of invention," as it is termed, is encouraged by patent law, because an applicant for a patent must prove the novelty of the invention submitted. Inventors thereby have a financial incentive to denigrate or ignore previous work. From a patent lawyer's perspective, the ideal invention is one that arises without any precursors, like Athene springing fully formed from the forehead of Zeus.

In reality, even for the most famous and apparently decisive modern inventions, neglected precursors lurked behind the bald claim that "X invented Y." For instance, we are regularly told, "James Watt invented the steam engine in 1769," supposedly inspired by watching steam rise from a tea-kettle's spout. Unfortunately for this splendid fiction, Watt actually got the idea for his particular steam engine while repairing a model of Thomas Newcomen's steam engine, which Newcomen had invented 57 years earlier and of which over a hundred had been manufactured in England by the time of Watt's repair work. Newcomen's engine, in turn, followed the steam engine that the Englishman Thomas Savery patented in 1698, which followed the steam engine that the Frenchman Denis Papin designed (but did not build) around 1680, which in turn had precursors in the ideas of the Dutch scientist Christiaan Huygens and others. All this is not to deny that Watt greatly improved Newcomen's engine (by incorporating a separate steam condenser and a double-acting cylinder), just as Newcomen had greatly improved Savery's.

Saturday, January 02, 2010

Foundation Beyond Belief launches!



The new website for the Foundation Beyond Belief is up! The mission is: "To demonstrate humanism at its best by supporting efforts to improve this world and this life; to challenge humanists to embody the highest principles of humanism, including mutual care and responsibility; and to help and encourage humanist parents to raise confident children with open minds and compassionate hearts."

Foundation Beyond Belief is a non-profit, charitable foundation that wants to encourage compassion and charitable giving among [secular] humanists. It's also working on providing support and education for non-theistic parents.

However you might feel about churches, one thing that they're good at is charity and volunteer projects. You're big-hearted and well-meaning -- but do you have somebody reminding you to volunteer for Habitat For Humanity and donate to feed the homeless every week? Apparently in the US, religious people give more to nonprofits than non-religious people do (according to this guide from Mint, via FriendlyAtheist).

That's what FBB is for. With FBB, you can make one-time donations, or sign up for monthly giving, and you choose how your donation is distributed! Contributions are tax deductible, and go 100% to the organizations benefited! (you can also choose to donate to FBB itself, which of course has operating costs)

There's an online community, etc! Pretty exciting!

Wednesday, December 30, 2009

here come some explanatory examples

Having recovered from all that fruitcake and holiday cheer, I've started digging through the code I wrote over the past semester, looking for things that could be straightforward examples.

So far, I've got:
  • calculating the entropy of a discrete random variable
  • a cute implementation of finite-state automata with matrix multiplication
  • calculating the probability of a Markov process going to a particular state, again with matrix multiplication
  • a simple CYK-style chart parser for probabilistic grammars (computing inside probabilities, outside probabilities, and the most probable parse)
  • a parse evaluator that gives precision and recall for parse trees
  • probabilistic part-of-speech taggers that take bigrams into account, both by brute force (trying every combination of tags for the words) and by using the Viterbi algorithm
  • some pretty clean code for hidden Markov models in general

I've already checked these in over on narorumo. They're all in Python, but some depend on nltk or numpy.
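For a flavor of the first item, here's a minimal, independent sketch of the entropy of a discrete random variable (not the code in narorumo): H(X) = -sum over x of p(x) log p(x), in bits by default. Zero-probability outcomes are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def entropy(probabilities, base=2):
    """Entropy of a discrete distribution, in bits when base=2.

    `probabilities` lists p(x) for each outcome and should sum to 1.
    Outcomes with p(x) = 0 are skipped, since 0 * log 0 is taken as 0.
    """
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))   # fair coin: 1.0
print(entropy([0.25] * 4))   # fair four-sided die: 2.0
print(entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
```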

They'll be increasingly clean and documented over the next week or so. I hope these are helpful to somebody!
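The two matrix-multiplication items can be sketched together in plain Python (no numpy). This is an illustration, not the narorumo code. For an automaton, each input symbol gets a 0/1 transition matrix. Multiplying a start-state row vector through the matrices for each symbol, then by an accepting-state column vector, counts the accepting paths, so a word is accepted when that count is positive. For a Markov chain, the probability of being in state j after k steps, starting from state i, is entry (i, j) of the transition matrix raised to the k-th power. The example automaton and the two-state "weather" chain are made up for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# --- Finite-state automaton: one 0/1 transition matrix per input symbol ---

def accepts(word, start, transitions, final):
    """start: 1 x n row of start states; transitions[symbol][i][j] = 1 if
    reading `symbol` can move state i to state j; final: n x 1 column of
    accepting states. Entries count paths, so nondeterminism works too."""
    current = start
    for symbol in word:
        current = mat_mul(current, transitions[symbol])
    return mat_mul(current, final)[0][0] > 0


# Accepts strings over {a, b} with an even number of a's (state 0 = even).
EVEN_AS = {"a": [[0, 1], [1, 0]],
           "b": [[1, 0], [0, 1]]}
START = [[1, 0]]
FINAL = [[1], [0]]


# --- Markov chain: P(X_k = j | X_0 = i) is entry (i, j) of P^k ---

def prob_in_state(transition, start_state, end_state, steps):
    n = len(transition)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    for _ in range(steps):
        result = mat_mul(result, transition)
    return result[start_state][end_state]


# Hypothetical two-state weather chain: 0 = sunny, 1 = rainy.
WEATHER = [[0.9, 0.1],
           [0.5, 0.5]]

print(accepts("abba", START, EVEN_AS, FINAL))  # True
print(accepts("ab", START, EVEN_AS, FINAL))    # False
print(prob_in_state(WEATHER, 0, 1, 2))         # rain in two days, given sun today: about 0.14
```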

Monday, November 16, 2009

explanatory power of working examples

The NLP algorithms I've been studying since I started back at school aren't particularly complex. But they're often described with really dense notation: maybe your field does this too! Here's a description, for example, of how to calculate an "outside probability" -- it's the (joint) probability that a particular nonterminal symbol covers a certain chunk of text, and the words outside the span of that nonterminal. This is from Fei Xia's lecture slides (and I think these are pretty good).
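For reference, here is the usual recursive definition of outside probabilities for a PCFG in Chomsky normal form, written in one common notation (the letters on the slides may differ). The sentence is w_1 … w_m, N^1 is the start symbol, N^j_{pq} means N^j spans w_p … w_q, and α_g(p, q) is the inside probability that N^g generates w_p … w_q. The first sum covers N^j as the left child of a parent N^f, whose right child N^g covers the words after it. The second sum covers N^j as the right child, with N^g covering the words before it.

```latex
\begin{aligned}
\beta_j(p,q) &= P(w_1 \ldots w_{p-1},\, N^j_{pq},\, w_{q+1} \ldots w_m) \\
\beta_1(1,m) &= 1, \qquad \beta_j(1,m) = 0 \text{ for } j \neq 1 \\
\beta_j(p,q) &= \sum_{f,g} \sum_{e=q+1}^{m} \beta_f(p,e)\, P(N^f \to N^j N^g)\, \alpha_g(q+1,e) \\
&\quad + \sum_{f,g} \sum_{e=1}^{p-1} \beta_f(e,q)\, P(N^f \to N^g N^j)\, \alpha_g(e,p-1)
\end{aligned}
```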



Maybe what I need is more practice picking apart dense notation, but in all honesty I have trouble keeping track of what the different letters mean. Maybe a nice dynamic programming implementation springs to mind for people smarter than me, but I have to stare at it (and the surrounding slides) for quite a while!

I think I'd be making a pretty good contribution to the world if I took the algorithms I'm learning and wrote down the most straightforward pseudocode and prose versions I can, with a running Python implementation and descriptive variable names. Surely many people out there would find code easier to digest!

Somebody's already done precisely this with the Viterbi Algorithm wikipedia page, and I'm very grateful to that somebody.
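In that spirit, here's a minimal Viterbi sketch with descriptive names. It assumes an HMM given as plain dictionaries of start, transition, and emission probabilities, and it multiplies raw probabilities. That's fine for short toy inputs, but real sentences would call for log probabilities to avoid underflow. The healthy/fever model at the bottom is just a toy example.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Most probable hidden-state sequence for `observations` under an HMM.

    start_prob[s]    = P(first state is s)
    trans_prob[s][t] = P(next state is t | current state is s)
    emit_prob[s][o]  = P(observation o | state s)
    Returns (probability of the best path, list of states on that path).
    """
    # best_prob[t][s]: probability of the best path ending in state s at time t.
    best_prob = [{s: start_prob[s] * emit_prob[s].get(observations[0], 0.0)
                  for s in states}]
    backpointer = [{s: None for s in states}]

    for t in range(1, len(observations)):
        best_prob.append({})
        backpointer.append({})
        for s in states:
            # Choose the predecessor that gives the most probable path into s.
            prev, prob = max(
                ((r, best_prob[t - 1][r] * trans_prob[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best_prob[t][s] = prob * emit_prob[s].get(observations[t], 0.0)
            backpointer[t][s] = prev

    # Start from the best final state and follow the backpointers home.
    last = max(states, key=lambda s: best_prob[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    path.reverse()
    return best_prob[-1][last], path


states = ("Healthy", "Fever")
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
print(viterbi(["normal", "cold", "dizzy"], states, start, trans, emit))
# probability about 0.01512, path ['Healthy', 'Healthy', 'Fever']
```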

Wednesday, November 04, 2009

Lenovo: you have to buy Windows, As Per Policy

I got a pretty quick response from the Lenovo sales people -- complete with verbiage at the bottom emphasizing how the email was confidential and legally privileged, and any retransmission, dissemination, or other public use is strictly prohibited. They should have put the EULA for the email at the top, before I scrolled down! I might not have agreed to read the email! Geez, or worse, what if somebody accidentally read it over my shoulder in a coffee shop!

Anyway, they said:
We do not have option to sell any unit without operating system as per policy.

So I guess I won't buy a ThinkPad. I'm just not willing to pay The Microsoft Tax when I'm not going to use Windows.

Python generator expressions

I just found out about this: Python has a really concise way to make new generators.

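It looks like a list comprehension, but with parentheses instead of square brackets, and when it's the only argument to a function call, you can leave off the extra parentheses. Before I knew about this feature, the code I was reading looked pretty mysterious.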
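To make the difference concrete, here's a small example with an arbitrary word list. The list comprehension builds every value up front. The generator expression produces values only as they're asked for, and it can be consumed only once.

```python
words = ["colorless", "green", "ideas", "sleep", "furiously"]

as_list = [len(w) for w in words]  # list comprehension: builds the whole list now
as_gen = (len(w) for w in words)   # generator expression: computes lengths on demand

print(as_list)       # [9, 5, 5, 5, 9]
print(next(as_gen))  # 9  (only the first length has been computed so far)
print(sum(as_gen))   # 24 (consumes the remaining four lengths)
print(sum(as_gen))   # 0  (a generator can only be iterated once)

# As the sole argument to a call, the extra parentheses can be dropped:
print(max(len(w) for w in words))  # 9
```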

There are some nice examples of cases where you might want to use this sort of thing in the relevant PEP (PEP 289). Especially pleasant uses from the PEP include passing a generator expression to the dictionary constructor, like so:
d = dict( (k, func(k)) for k in keylist)

... and, useful for me personally, getting the set of words in a file, all in one go:
s = set(word for line in f for word in line.split())
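Both snippets above use names (`func`, `keylist`, `f`) defined elsewhere. Here's a self-contained version: a hypothetical `word_length` function stands in for `func`, and an `io.StringIO` stands in for an open file.

```python
import io


def word_length(word):
    """Hypothetical stand-in for `func`."""
    return len(word)


keylist = ["cat", "sat", "mat"]
d = dict((k, word_length(k)) for k in keylist)
print(d)  # {'cat': 3, 'sat': 3, 'mat': 3}

f = io.StringIO("the cat sat\non the mat\n")  # behaves like an open text file
s = set(word for line in f for word in line.split())
print(sorted(s))  # ['cat', 'mat', 'on', 'sat', 'the']
```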

Good to know!

Sunday, October 25, 2009

trying to buy a ThinkPad without paying for Windows

I got that Dell Mini 12 some time ago, and honestly it's been a pain. It's a good-looking machine, and the keyboard and screen are nice, but the Poulsbo chipset just has terrible Linux support. Like every third time I get an update, something breaks, and I haven't been able to make it suspend/resume reliably in months -- oh, and X just broke again. What I really want is a little ThinkPad.

So I just sent this email to Lenovo:
Hey, good evening,

I'd love to purchase a ThinkPad X200. I haven't found the option on your website, though, for how I can buy one without Windows? Could you point me to that link?

I'm simply not going to use Windows; I would install Linux on it as soon as I get the machine anyway, and I don't want to pay for software I won't use.

So if you can sell me a ThinkPad with no Windows, that would be fantastic, and I'll be really happy and gladly give you money and say nice things about your company.

Thanks very much!

--
-- Alex Rudnick
We'll see what they say! I might just buy the laptop anyway, not agree to the Windows EULA, and then go through the hullabaloo to refund it.

Sunday, October 18, 2009

constraint programming in Python

You may be familiar with constraint programming, an approach where, instead of describing how to solve a problem, you describe what a possible solution looks like, and let a generalized solver find possible solutions. This is the sort of thing you might do with Prolog, Oz, or any number of libraries for your favorite programming language.
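To make the idea concrete without either library, here's a deliberately naive generate-and-test sketch in plain Python. You state the variables, their domains, and the constraints a solution must satisfy. A generic `solve` function (a name made up for this example) then enumerates every assignment and keeps the ones that pass. Real constraint solvers, including the libraries described here, prune the search instead of trying every combination.

```python
from itertools import product


def solve(domains, constraints):
    """Yield every assignment that satisfies all constraints.

    domains:     dict mapping variable name -> iterable of allowed values
    constraints: list of predicates, each taking the full assignment dict
    This is brute-force generate-and-test, with no propagation or pruning.
    """
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(constraint(assignment) for constraint in constraints):
            yield assignment


# Describe what a solution looks like, not how to find it.
domains = {"a": range(1, 4), "b": range(1, 4), "c": range(1, 4)}
constraints = [
    lambda v: len(set(v.values())) == len(v),  # all different
    lambda v: v["a"] < v["b"],
    lambda v: v["b"] + v["c"] == 5,
]

for solution in solve(domains, constraints):
    print(solution)
# {'a': 1, 'b': 2, 'c': 3}
# {'a': 1, 'b': 3, 'c': 2}
```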

If your favorite programming language is Python, there are at least two different libraries for this approach! Unfortunately, they're both called "python-constraint"; this led to some confusion on my part. Here they are:

logilab-constraint. This is packaged in Debian/Ubuntu as "python-constraint". It's put out by the French company LogiLab, who contribute a bunch of Free Software useful for doing AI-flavored things. Their HMM library is pretty slick too.

python-constraint is a package by Gustavo Niemeyer, and it's got this really nice tutorial.

I mention these because my new research group is using this latter one to build a dependency grammar system based on Ralph Debusmann's XDG.

And more about that, as we get to it :)

Monday, October 12, 2009

normal distributions and R

When I'm using R to do statistical things (such as homework), I feel somewhat torn -- it's got so many nice functions that come built in, but the language itself is slightly clunky, and integrating code that I've written in R with bigger projects seems like it would be kind of a pain. That's a general problem with picking any special-purpose language, though -- I might make similar complaints about Matlab/Octave or even Prolog...

I note, though, that I haven't jumped ship to NumPy yet.

pnorm and qnorm


I just wanted to mention these fantastically easy-to-use functions that come built right in: pnorm and qnorm.

pnorm is what you use if you have a z-score and you want the probability that a value drawn from the distribution comes out less than that score. This is equivalent to looking up probability values in the "z" tables in the back of your stats book. pnorm(0) gives you 0.5, since half of the standard normal distribution's probability mass lies below 0.

qnorm does the inverse -- you give it a probability and it gives you back the z-score below which that much of the probability mass lies. So if you give it 0.5, it gives you back 0.

Both of these functions can take more parameters: for example, you can specify your distribution's mean and standard deviation (the mean and sd arguments), so you don't have to convert to z-scores. Type "?qnorm" for the docs!
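For comparison, Python's standard library (3.8 and later) has rough equivalents in `statistics.NormalDist`: `cdf` plays the role of pnorm, and `inv_cdf` the role of qnorm. The mean of 100 and standard deviation of 15 below are just an example.

```python
from statistics import NormalDist  # Python 3.8+

standard = NormalDist()        # mean 0, standard deviation 1, like pnorm's defaults
print(standard.cdf(0))         # 0.5, like pnorm(0)
print(standard.inv_cdf(0.5))   # 0.0, like qnorm(0.5)
print(standard.cdf(1.96))      # about 0.975, like pnorm(1.96)

# A non-standard distribution, like pnorm(130, mean = 100, sd = 15):
example = NormalDist(mu=100, sigma=15)
print(example.cdf(130))        # about 0.977: 130 is two standard deviations above the mean
```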

Tuesday, August 18, 2009

ICFP programming contest 2009!

Not too long ago, Lindsey Kuper and I stepped into the ring once again to compete in the ICFP Programming Contest!

She's writing up the full story over on Geek Buffet.

The quick recap, however! The problem had to deal with orbital mechanics -- we were to control a simulated satellite as it orbits a simulated earth. First, you just had to transfer orbits, then meet other satellites, and it got increasingly complex from there.

Thankfully, the physics simulator for each kind of problem was included. All you had to do to use it was implement the contest-specific VM! (Easy, right?) Fortunately, the specification for the VM was super-clear, and we got it working surprisingly quickly.

By the end of the contest, we could handle the first two problem types, and that was competitive enough for 120th place worldwide, by the morning when they turned off the leaderboard. Good show, us!

For the full scoop, go read Lindsey's account.

Here's the code! We used Scheme and Python, and R for the visualizations.

Tuesday, August 11, 2009

change of scenery and reviewing conference submissions

So you may not have heard yet, but last month, I left Google Atlanta, packed up my cats and my computers, and headed to Indiana University. Lindsey Kuper helped quite a bit, both in motivation to do this and in the actual moving process. So I'm a PhD student now, very exciting! I'll be working on natural language processing and machine learning -- something I'd wanted to focus on for quite a while.

But in my last two weeks at the Goog, I had this really interesting opportunity, presented by my colleague Katharina Probst. She's a reviewer for the Conference on Information and Knowledge Management (is this the same as being on the program committee?), and asked if I wanted to help. It seemed like good practice for my upcoming stint as an academic.

So we (I, with her guidance and sanity-checking) had this pile of eight papers to get through. Some were fairly mundane, like learning classifiers to determine if a message is a flame, or summarizing a group of documents; one was particularly targeted at finding references to rabbinic literature in other rabbinic literature (they don't use ACM-style citations, typically). And so on. A few were written very clearly and had well-motivated discussions on why the problem is interesting and important, and others... not so much. This is why we have peer review.

But it was an interesting experience, and I'm glad I took it. My tech lead, Miguel, graciously let me use my "20%" time to Advance Science and just review papers for a day. I wanted to do a good job of reviewing, so I read the papers really closely, took a lot of notes, and wrote a few paragraphs in response for each one. Katharina at least seemed to think that it was good feedback, so that was reassuring. (Although I would have liked more feedback on my feedback.)

Once the other reviewers' ratings came in, I was fairly pleased to see that my ratings weren't far off from theirs. If I missed some amazing gem of wisdom, then at least it was apparently hard to find -- the papers I liked the best were accepted. I was more concerned, honestly, that I would mistake some stale old idea for a clever new one. But again, that's why we have many eyes on these things.

Alright! So now I just have to produce some stuff for other people to review. To the lab!

Saturday, August 08, 2009

goto: a utility for bash

Do you ever find yourself, while using bash, wanting to get to a particular file that you know is way down in a big tree of files -- maybe a big Java source tree, or some other nested file structure? You know you want Foo.java, but is it in src/com/foocorp/package/a/b/c, or maybe somewhere else?

Announcing goto, a bash utility which solves just that problem! Now you can just say:

you@computer:~/project$ goto Foo.java
you@computer:~/project/src/com/foocorp/whatever/path/to$ vim Foo.java


... instead of whatever business you were going to do with locate or find or your IDE, or just manually sifting for it.
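The core lookup is easy to sketch. Here's a minimal Python version of finding which directories contain a given file name. It's an illustration, not goto's actual implementation (see the README for that). One wrinkle: a child process can't change its parent shell's working directory, so a tool like this has to be hooked into bash itself, for example as a shell function that `cd`s to the directory the search finds. Skipping hidden directories like `.git` is a choice made for this sketch.

```python
import os


def find_dirs_containing(filename, root="."):
    """Return every directory under `root` that directly contains `filename`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (.git, .svn, ...) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        if filename in filenames:
            matches.append(dirpath)
    return matches


# Example: list the candidate directories for Foo.java under the current directory.
print(find_dirs_containing("Foo.java"))
```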

Here's the README. Comments, complaints, and patches welcome! (and if you find this useful, I'd be really pleased if you'd let me know!)

Friday, August 07, 2009

More Ubuntu: getting the NetworkManager Applet back

Ubuntu comes with this really great network selection applet that sits on your gnome panel. It's called the NetworkManager applet, and it looks and acts more or less like the analogous dropdown menu on Mac OS X. Unfortunately, if you remove it from your panel, it's unintuitive how to get it back.

Maybe you accidentally removed it, and then, after some research, tried "nm-applet" from the command line, messed with the Network Monitor and the Network Connections preference page, and even tried editing nm-system-settings.conf.

What you really want to do is right click on your panel, click Add to Panel, and select Notification Area. Why the nice NetworkManager Applet and battery status applet both live in the Notification Area remains a mystery.

Like so:

Thursday, July 09, 2009

Why your Ubuntu Netbook Remix windows are maximized all the time

I was curious why my windows were all immediately getting maximized on my new kinda-netbook Dell Mini 12, running UNR (with the "Classic Desktop" mode). It's the sort of behavior you can imagine wanting -- with a small screen, maybe you do want to maximize everything? But I wanted to turn it off.

This thread has the answer!

UNR comes with a program called "Maximus", which was in my Startup Applications. To make the behavior go away, just go to System > Preferences > Startup Applications and remove Maximus from that list. Problem solved!

Seems odd that this is a process running in the background, instead of, say, a window manager setting.

Monday, June 01, 2009

back from Google I/O, got a new laptop!

I can speak for myself, at least, in saying that it felt pretty darn good to be on the GWT team for Google I/O. The Google Wave demo at the keynote got a standing ovation, and they made a pretty big deal about how they used GWT and it was a wonderful tool for their work. Pretty fantastic. A standing ovation at a tech conference. Totally electric.

Anyway! I got back, and my Dell Mini 12 with Ubuntu preinstalled had arrived! It's really pretty. The keyboard is a little more cramped than I'd expected (especially with the punctuation keys) but it's got a nice clicky feel. I'm not super-impressed with the Ubuntu version Dell shipped -- it's a specialized distribution of Hardy (last year's version) with some Dell and Yahoo-specific goodies. But the built-in camera and suspend/resume work beautifully! And no Microsoft tax! Not bad, really!

I'm probably about to install a fresh new (9.04) Ubuntu Netbook Remix on it.

Update! There's a bit of a funky driver issue with running the stock Ubuntu instead of the one Dell provides, but it's very surmountable. Here's how to install Ubuntu "Jaunty" on your Dell Mini 12.

Wednesday, May 20, 2009

Scala article on the Scala site!

I helped Lex Spoon and Toby Reyelts write an article about running Scala on App Engine.

Here's the article! As of right now, it's linked from the front page on the Scala site.

In the article, we mention my Scala/GWT sudoku solver, which is now available on Google Code. It's also live on App Engine right now!

Tuesday, April 21, 2009

Further steps: Scala/GWT/App Engine/Eclipse

I've been excited about Scala recently, for a number of reasons -- it's a modern language with lots of great features that you might expect if you've been looking at Haskell or ML. It's got pattern matching, a concise syntax, type inference, and first-class functions. The really killer thing about Scala, though, is how well it integrates with Java.

You can very easily have a mixed Scala and Java project, and make calls back and forth; Scala and Java packages sit in the same package hierarchy and compile to the same bytecode -- it all just works, pretty much seamlessly.

So, clearly, the right thing to do is hook Scala up to GWT RPC and run it on App Engine. Let's do that. This is as awesome as it's going to get, at least until the GWT compiler supports Scala and we can do our client side in Scala too.

I assume you've already installed both the Google Plugin for Eclipse and the Scala IDE for Eclipse. I'm using Eclipse 3.4.

Make a new "Web Application" project (with File > New, or just click the blue "g" icon). Click the boxes for both "Google Web Toolkit" and "App Engine" -- our client side will be in GWT, and the server on App Engine.

Right now, our new project has "Java" nature, but not the "Scala" nature. Add Scala by right-clicking the project and choosing Scala > Add Scala Nature. We're also going to make sure we have a copy of the Scala runtime library once we deploy to the server. Find out where scala-library.jar sits (expand out the "Scala Library" container in your project) and copy it into the project's war/WEB-INF/lib.

Now, down to business. There's a RemoteServiceServlet currently implemented in Java, and we're going to replace it with Scala code -- this is the thing that gets called on the server side when the client makes an RPC request. Find GreetingServiceImpl.java and delete it.

To replace it with a class implemented in Scala, right-click your "gwtscalademo.server" package and select New > Other > Scala Wizards > Scala Class. The class we deleted was a servlet and referenced by the web.xml, so give the Scala class the same name: GreetingServiceImpl.

Here's my version:
package gwtscalademo.server

import gwtscalademo.client.GreetingService
import com.google.gwt.user.server.rpc.RemoteServiceServlet

class GreetingServiceImpl
    extends RemoteServiceServlet
    with GreetingService {

  // Called on the server when the GWT client makes the RPC request.
  def greetServer(input: String): String =
    "Hello from Scala, " + input + "!"
}

Note that while in Java we'd have said "implements", in Scala we say "with". As I understand it, this means that Java interfaces get mapped onto Scala traits. Hip!

Now you should be able to run the app locally -- right-click your project and select Run As > Web Application. Once you've learned Scala and made it do something interesting (you're on your own there), you can deploy it to App Engine!

Woo!

Wednesday, February 11, 2009

HOWTO not get "/usr/bin/env: bad interpreter"

If you get this:

bash: ./my-script.py: /usr/bin/env: bad interpreter: Permission denied

The partition your script lives on may be mounted with the "user" option set. "user" implies "noexec" (see the manpage for "mount"), which is going to keep you from running executables. And while running a binary executable from this kind of partition fails more clearly, trying to run a script with a shebang gives you this more confusing error message.

To fix! Add "exec" after your "user" flag in /etc/fstab (again, see "man mount").
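To check whether a given path sits on a noexec mount without reading /etc/fstab, Python's `os.statvfs` reports the mount flags. This is a minimal sketch. `os.ST_NOEXEC` is only defined on some platforms (Linux among them), so the code falls back to reporting False where it's missing, and `os.statvfs` itself is Unix-only.

```python
import os


def is_on_noexec_mount(path):
    """Return True if the filesystem holding `path` is mounted noexec."""
    flags = os.statvfs(path).f_flag
    # ST_NOEXEC only exists on some platforms; 0 makes the test below False.
    noexec_bit = getattr(os, "ST_NOEXEC", 0)
    return bool(flags & noexec_bit)


print(is_on_noexec_mount("."))
```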

Saturday, January 17, 2009

Lisp insights from learning Logo

So I was just writing a simple recursive procedure in Logo. I wanted to go down a list, cons up a new list, and return the empty list when I get to the end.

My first pass, once I found out how to return something from a function (Logo procedures aren't expressions -- you have to explicitly return), had a line like this:

if (stacks = []) (output [])

Running the procedure gave me the helpful error message:
output didn't output to if in astep
[if (stacks = [] ) (output [] )]


Pardon?

When I remembered that Logo's "if" wants a list of instructions, it clicked with my new nugget of knowledge that literal lists are quoted in Logo. Then I understood what was going on: this "if" is neither a special form (like in Scheme, or built-in syntax in most languages) nor lazily evaluated (like Haskell). It's a regular function, but one that expects quoted code to evaluate on demand at runtime. That's why the parenthesized (output []) ran immediately, as an argument to "if", instead of waiting to be chosen. Rubyists and Smalltalkers: this is something like how an if block works for you, yes?

Here's an analogy in Python, for the non-Lispers out there:

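def myIf(tf, truecode, falsecode):
  # Both branches arrive as unevaluated strings of code;
  # only the one selected by the condition ever gets evaluated.
  code = {True: truecode, False: falsecode}
  return eval(code[bool(tf)])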


>>> myIf(0 == 1, "'it was ' + 'true'", "'it was not true'.upper()")
'IT WAS NOT TRUE'


>>> myIf(0 == 0, "'it was ' + 'true'", "'it was not true'.upper()")
'it was true'
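Passing strings to `eval` stands in for Logo's quoted instruction lists. A closer analogy to Smalltalk's blocks is to pass each branch as a zero-argument function (a thunk), so only the chosen one ever runs. Here's a minimal sketch; the name `my_if` is just for illustration.

```python
def my_if(condition, then_branch, else_branch):
    """Call and return only the branch the condition selects.

    Each branch is a zero-argument function, so the other one never runs.
    """
    return then_branch() if condition else else_branch()


print(my_if(0 == 1, lambda: "it was " + "true", lambda: "it was not true".upper()))
# IT WAS NOT TRUE
print(my_if(0 == 0, lambda: "it was " + "true", lambda: 1 / 0))
# it was true  (the else branch would raise ZeroDivisionError, but it never runs)
```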

Also, apparently Logo in general has dynamic scope -- which shouldn't surprise me, since lexical scope is relatively new in Lisp. And Berkeley Logo in particular has macros. Maybe my little compiler project is going to be harder than I expected. Honestly, I was imagining I'd be done if I could just parse it, do some simple transformations on the tree, and spit out JavaScript.

Sunday, January 11, 2009

my code has a side effect: learning!

So I'm working through the exercises in Real World Haskell. Here's what I've got so far. As of this writing, I'm in the middle of chapter 4. RWH totally deserves all the buzz it's been getting; it's very approachable and well-written, and my Haskell is improving rapidly. The book is still free online, but I bought the printed copy, in large part because Bryan O'Sullivan gave such a great talk at OSCON.

My earlier Haskell project had stalled out due to my not being very good at the language yet. What I want to do is build a compiler for Logo that produces JavaScript, with the turtle graphics on a canvas tag. Logo is an acceptable Lisp! I'd love to see it get more use.

Interestingly, though:

Sunday, November 23, 2008

Fixing your Ubuntu suspend/resume problems

After some recent update, my Ubuntu box (currently "Intrepid Ibex") started having a problem where the networking wouldn't come back up when waking from suspend mode. My machine uses the "forcedeth" ethernet driver (quite a name \m/ ).

The fix! As mentioned tersely over here, and slightly more clearly over here, all I had to do was create a file /etc/pm/config.d/01-modules that contains this line:

SUSPEND_MODULES="forcedeth"

Apparently you can name the file something else, as long as it's in the same directory, but this one worked for me. And of course, if you're having trouble with a different driver, change "forcedeth" to the right module name.

It's 2008 -- why is power management still tricky on Linux? Users (like your non-technical family members) should never have to do this sort of thing.

Sunday, November 16, 2008

I could have just emailed them a photo of my shelf

I seem to have just spent 20 minutes flipping through Amazon's book recommendations, finding books that I already own and checking the box that says so. This, of course, made other books that I already own pop up, which I then clicked. And it was kind of fun. Thoughts that ran through my head included "I have the first edition but not this latest one, does that count?" and "Lindsey has that one -- that's almost like I have it, right?"

Excellent design, Amazon dudes and dudettes. I don't begrudge you that training data at all.

Thursday, October 09, 2008

Debian on the OLPC

I should go to bed, but I'm having so much fun setting up Debian on my XO laptop.

It was super-easy getting it installed (instructions here); you use the same command for updating to a new version of the standard OLPC software and for installing Debian. Or Edubuntu, apparently.

For extra disk space, I blew away the default Sugar/Fedora install -- it's pretty easy to get back to a fresh state, just in case I want to set that up again later.

In 472 megs, I have the base Debian, X.org, subversion, vim, fluxbox, mzscheme, and the Glasgow Haskell Compiler, whose install pulled in gcc and so was big.

Sunday, September 14, 2008

sudokudlx: Automatic Sudoku Solving with Dancing Links

In case I hadn't shown you yet, here's my Python implementation of a Dancing Links sudoku solver, on appengine!

http://sudokudlx.appspot.com/

Also, I've written a bit about how to use Dancing Links (DLX) to solve sudoku puzzles. Soon, I'll write up how to implement DLX.
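Until that write-up exists, here's a rough sketch of the idea in Python. It is Knuth's Algorithm X for exact cover, but it swaps the doubly linked lists that give Dancing Links its name for dicts of sets, a common Python shortcut. So treat it as an illustration of the technique, not the code running on appengine. Each candidate placement (row, column, digit) satisfies four constraints (that cell is filled, that digit appears once in the row, once in the column, once in the box), and a solved puzzle is a set of placements covering every constraint exactly once.

```python
from itertools import product


def build_columns(X, Y):
    """Invert the row table: map each constraint to the set of rows that satisfy it."""
    columns = {j: set() for j in X}
    for i, row in Y.items():
        for j in row:
            columns[j].add(i)
    return columns


def select(X, Y, r):
    """Choose row r: drop every constraint it covers and every row that clashes with it."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def deselect(X, Y, r, removed):
    """Undo select() in reverse order (the counterpart of DLX's relinking step)."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


def algorithm_x(X, Y, partial=None):
    """Yield every exact cover, each as a list of row keys."""
    if partial is None:
        partial = []
    if not X:
        yield list(partial)
        return
    # Knuth's heuristic: branch on the constraint with the fewest candidate rows.
    c = min(X, key=lambda j: len(X[j]))
    for r in list(X[c]):
        partial.append(r)
        removed = select(X, Y, r)
        yield from algorithm_x(X, Y, partial)
        deselect(X, Y, r, removed)
        partial.pop()


def solve_sudoku(grid):
    """grid: 9x9 list of lists, 0 for blanks. Returns a solved copy, or None."""
    constraints = (
        [("cell", rc) for rc in product(range(9), range(9))]
        + [("row", rn) for rn in product(range(9), range(1, 10))]
        + [("col", cn) for cn in product(range(9), range(1, 10))]
        + [("box", bn) for bn in product(range(9), range(1, 10))]
    )
    Y = {}
    for r, c, n in product(range(9), range(9), range(1, 10)):
        b = (r // 3) * 3 + c // 3
        Y[(r, c, n)] = [("cell", (r, c)), ("row", (r, n)), ("col", (c, n)), ("box", (b, n))]
    X = build_columns(constraints, Y)
    for r, c in product(range(9), range(9)):
        if grid[r][c]:
            select(X, Y, (r, c, grid[r][c]))  # givens are forced choices
    for solution in algorithm_x(X, Y):
        solved = [row[:] for row in grid]
        for r, c, n in solution:
            solved[r][c] = n
        return solved
    return None
```

The real Dancing Links version does the same select/deselect dance, but by unhooking and rehooking list nodes in place, which is where its speed comes from.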

Sunday, September 07, 2008

plats is Swedish for "locale".

My first pass for solving the problem from a few days ago is now live on appengine.

Check it out: http://plats.appspot.com

Just include a script in your page and you get a variable or a meta tag (for GWT apps) to tell you which locale you should be in. Details here.

Thursday, September 04, 2008

figuring out the user's language preferences, client-side

Your browser sends an HTTP header field called "Accept-Language" (see section 14.4 of RFC 2616) when you make a request to a web server. It tells the server which languages you prefer, and in what order, so the server can send you localized content with graceful fallbacks. You can muck with these, for example, in your Firefox settings under Preferences > Content > Languages.

So I'd like to do this on the client side. I had the bright idea that I would just build an XMLHttpRequest object and ask it what headers it was about to send (XHRs do send this field), but GWT's HTTP objects don't let you get at request headers that you haven't explicitly set. The very good reason for this is that XMLHttpRequest objects don't let you query their headers. Oh noes.

It may be that the way to do this is with a bit of server-side cleverness, but that's not the answer I want. Unless somebody happens to have a way to snag these settings from JavaScript...
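For the server-side route, the parsing part at least is small. Here's a sketch in Python, assuming the raw header value is already in hand; `parse_accept_language` is a made-up helper, not a function from any framework. Each comma-separated entry is a language tag with an optional `q` weight (default 1, and `q=0` means "not acceptable"), and the result lists tags from most to least preferred, breaking ties by the order the browser sent them.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # weight used when no q= parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            prefs.append((-q, position, tag.strip()))
    # Sort by descending weight, then by the order the browser listed them.
    return [tag for _, _, tag in sorted(prefs)]


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))  # ['da', 'en-gb', 'en']
```

A script like plats could then hand that list to the page; the hard part remains getting the header value to the client at all.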

Friday, August 29, 2008

GWT 1.5 release!

Wooooo, new version of GWT! This is huge. We've been working really hard on this for quite a while.

http://code.google.com/webtoolkit

What's new? Check check check it!

Monday, August 18, 2008

Firefox 3: you can drag tabs between windows!

That's a pretty neat feature: drag a tab from the tab bar on one Firefox window into another one. You can also drag them into your bookmarks toolbar.

I'd like a way to grab hold of a tab and drag it out into space, forming a new window, but this might be hard to implement -- some systems would make a link on the Desktop. Failing that, an "open this tab in new window" command would be nice.

I could put in a feature request, I suppose -- or even a patch! How hard could that be to implement?

Tuesday, July 08, 2008

Lisp snippets on a Tuesday night

The ICFP contest is coming up, and Lindsey Kuper and I have been building our Scheme muscles. I was looking for an implementation of hash tables (or just some quick way to do a map), and I ran across the SRFIs -- Scheme Requests for Implementation. It's a semi-standard library of useful stuff for Scheme, and it's bundled with recent PLT Schemes! You can just say: (require srfi/n) to load up the nth SRFI; they're numbered. Particularly, I've been playing with the extended libraries for lists, strings, and hash tables.

On the Common Lisp side of things, I just ran across the insanely useful describe, which prints out what the system knows about a given object -- if it's a function with a docstring, the docstring will be in there. Also potentially useful: disassemble. Give it a function; it does what it sounds like!

Monday, June 09, 2008

reconsidering applets

Applets are a really tempting idea. The JVM is a serious piece of machinery; despite its startup time, once it gets rolling, Java can go fast indeed. And although it hasn't been very widely used, applets and JavaScript have had a means of calling back and forth for quite a while. It's called LiveConnect (although LiveConnect may be replaced at some point...).

So, a beefy, fast VM that runs inside the browser, coupled with a nice web UI, and no manual installs for your users -- big win? Maybe! On considering this, one of my first thoughts was to put Jython into an applet and build something akin to JES -- a little Jython IDE in the browser! The kids would love it! So I set out to make this happen, or at least build a proof-of-concept.

I ran into a serious snag, though: the Jython interpreter works by compiling to bytecode and loading it with a class loader, which is something applets typically aren't allowed to do. Special permission can be granted with a Java policy file, but that's a very high barrier to entry. It's certainly not the sort of thing a person might do on a public terminal. Signing the applets may also be an option -- but that's also a pain, and anecdotally, I think signed applets may not always get the same rights cross-platform.

Once I understood the permissions issue, I did get a little Jython repl going -- it just hooks "eval" up to some JavaScript for output. Nothing fancy, but kind of satisfying. For now, though, it doesn't look like Jython IDEs are coming to the browser, at least not until somebody rewrites Jython so it doesn't use the class loader. Or uses some more modern approach.

Some other people have had similar thoughts!
  • Interactive Jython Console: this one, for the exact same reasons that mine does, requires a java policy file.
  • the ruby-in-browser project. It does pretty much exactly what it sounds like, also by means of an applet and LiveConnect. Interestingly, JRuby doesn't seem to need the class loader; it just works! Check out the demo! [thanks for the heads-up, Lindsey Kuper!]

Wednesday, May 28, 2008

GWT 1.5RC1 -- we're pretty pumped.

Google Web Toolkit 1.5 RC1 is live!
http://googlewebtoolkit.blogspot.com/2008/05/google-web-toolkit-15-release-candidate.html

Get the bits while they're hot.
http://code.google.com/webtoolkit/download.html

GWT 1.5 has a whole lot of wild improvements, detailed here -- but importantly, the compiler now supports Java 5 features and produces even tighter code than before, and the UI library has had some pretty serious reworking -- there's a new API for working directly with the DOM in a typesafe way, and GWT apps now come with nice styling by default. We've been working pretty hard.

Oh, and don't miss the new documentation browser :) Much better than browsing the GWT wiki docs by hand.

Share and enjoy!

Thursday, May 01, 2008

Real World Haskell

In case you haven't been exposed to enough Haskell buzz recently, there's a new book in the works: Real World Haskell by Bryan O'Sullivan, John Goerzen, and Don Stewart, to be published by O'Reilly. In addition to the abstract functional loveliness that you expect, they're going to build webapps that talk to databases, which is not the first thing that pops into my mind when I think "Haskell".

Beta chapters are here.

(if you knew where to look, there was an announcement on LtU about being an alpha reviewer, to get at the pre-pre-release chapters...)

Wednesday, April 16, 2008

vim tip: don't mess up the indentation when you paste

vim does really lovely autoindenting. But what happens when you want to paste in pre-formatted text from somewhere else? All your code indents way off to the side, oh noes! Here's what to do.

:set paste
Okay, now paste.
:set nopaste

Perfect. I rated this tip "life-changing".

Tuesday, March 25, 2008

Penguin Parens: referenced by NLTK!

A while ago, I was working on remixing horoscopes and I gushed quite a bit about the lovely free Python NLP toolkit NLTK. And I was honored to get a comment from Steven Bird, one of the NLTK developers and an eminent natural language researcher.

It turns out that they quoted my blog post for the "Quotes" page. Aw :)

I'm checking out their circumstance again, so to speak -- and it's grown so much. They've got most of a textbook (free online!) put together. And the API is gigantic, so much code! Amazing. I'm going to have to dig back into it.

Sunday, March 23, 2008

you need more retrocomputing: Mini vMac and old Apple software

So I was looking for cool activities for my new XO laptop, and I ran across Mini vMac, an emulator for the Macintosh Plus; it's cross-platform, but if you happen to have an XO, here's the .xo activity.

And it works really well! You'll need a ROM image, y'know, taken from the Mac Plus that you personally own, and also an image from a "System" disk (available on that same page).

But! As the Mini vMac page points out, you can get old Macintosh System software from somewhere else -- Apple's own Older Software Downloads page, which features all sorts of outdated Apple software. System disks, drivers for bizarre old SCSI hardware... and HyperCard.

The Mini vMac page also links to this fantastic compendium of old Macintosh software from third parties, which has even more wild old stuff, like vim 3, forgotten Lisps and MLs, and games that you might remember.

I'm going to have to run this fullscreen on my MacBook Pro, woo!

Friday, March 07, 2008

more retrocomputing: procedural graphics languages!

One of my first online experiences -- and probably the first for a lot of people -- came in the form of Prodigy, over dialup, on an old DOS box. At the time, Prodigy wasn't an ISP as such -- it was an insular online community, with its own exclusive content and games and message boards and email. Of course, this model wasn't sustainable forever...

But! The interesting thing about Prodigy was the graphics. It was clear, at the time, that it wasn't downloading "images" as such -- y'know, like raster graphics -- it was drawing in terms of commands that would build pictures out of shapes. This was kind of cool; the dialup link was slow enough that you could see it building the picture, usually bigger background shapes first, then the details would get filled in. It occurs to me now that it must have been somebody's job to work out how to draw pictures like this...

It turns out that Prodigy was using the standard language for this: NAPLPS, the North American Presentation Level Protocol Syntax. And there were several other services that used the same approach -- some of them sent the commands over modems, like Prodigy, but some went over TV, during that mystical vertical blanking interval that broadcast engineers talk about.
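To make the "commands, not pixels" idea concrete, here's a toy sketch in Python. The command set (`rect`, `hline`) and the `render` function are invented for illustration; this is not NAPLPS or ReGIS, whose actual encodings I won't try to reproduce. The point is that a handful of shape commands, drawn big-background-first with details on top as on those Prodigy screens, expands into a much larger grid of "pixels" on the receiving end.

```python
def render(commands, width, height, background="."):
    """Rasterize a list of drawing commands onto a grid of characters."""
    canvas = [[background] * width for _ in range(height)]

    def plot(x, y, ch):
        if 0 <= x < width and 0 <= y < height:  # clip anything off-screen
            canvas[y][x] = ch

    for op, *args in commands:
        if op == "rect":  # filled rectangle: x, y, w, h, char
            x, y, w, h, ch = args
            for row in range(y, y + h):
                for col in range(x, x + w):
                    plot(col, row, ch)
        elif op == "hline":  # horizontal line: x, y, length, char
            x, y, length, ch = args
            for col in range(x, x + length):
                plot(col, y, ch)
        else:
            raise ValueError("unknown command: " + op)
    return ["".join(row) for row in canvas]


# Big background shapes first, then the details get filled in.
house = [
    ("rect", 0, 0, 12, 3, "~"),  # sky
    ("rect", 0, 3, 12, 3, "="),  # ground
    ("rect", 3, 1, 6, 4, "#"),   # wall
    ("hline", 2, 0, 8, "^"),     # roof
    ("rect", 5, 3, 2, 2, "D"),   # door
]
for line in render(house, 12, 6):
    print(line)
```

Five short commands stand in for 72 characters of output here, and the ratio only gets better as the picture grows, which is roughly why this approach made sense over a slow modem.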

Speaking of sending non-raster graphics down the wire: did you know that there's a set of drawing commands understood by some DEC terminals? It's true. It's called ReGIS. At one point, Brett and I got our hands on some old DEC terminals and spent a few afternoons messing around with this on something that must've been a vt340 or vt420. You can find out all sorts of wonderful things about DEC terminals at the super-top-notch vt100.net. These things seem to be indestructible; I held on to the vt220 for years (it was still chatting amiably over the serial port on my Linux box, and was almost as old as I was) before passing the joy forward sometime in college.

That's all for now. Happy hacking!

Friday, February 29, 2008

in which alexr surprises you by linking to Microsoft software

If you build web applications, you've got to contend with Internet Explorer, versions 6 and 7. The quandary is that it's not clear how to have both versions installed at the same time. Some people have resorted to using several virtual machines, but this turns out to be overkill.

Lindsey Kuper comes to our rescue, and points out this article: Using IE6 and IE7 together.

The first approach in that article was especially helpful for me: there's a version of IE6 bundled with all the libraries, ready to go even if the updates have installed IE7 for you. It works great, insofar as IE6 can be said to "work great".

What is surprisingly nice is the Internet Explorer Developer Toolbar. This is from those cats in Redmond, and I'm surprised it's not more widely publicized. It comes with a very functional DOM inspector that users of Firebug will figure out pretty quickly, and it works with that standalone IE6.

So! Now you can more easily and precisely diagnose just which inane and hateful things the IE rendering engine is doing to your app!

Happy hacking, and good luck :)

Tuesday, February 26, 2008

Today's viewing: Linus on git.

Not too many months ago, Linus gave a talk at Google about git, the distributed version control system he built for use in managing the Linux kernel. He spoke, in typical non-diplomatic Linus style, about why everybody should switch to git and why CVS, Subversion, Perforce, and non-distributed source control in general are broken.

"There's a few of them in the room, I suspect -- you're stupid!"
-- Linus calls out the svn developers on hand

The Linux kernel team had previously been using the proprietary BitKeeper, which a number of kernel developers (including Alan Cox, with his Mighty Unix Beard) chose not to use. Before that, source control for the kernel was handled by sending patches around -- preferable to using centralized source control, according to Linus.

So let it be known that I haven't actually /used/ git yet, or in fact any distributed source control system -- but I like the idea so far. In this post, I'll outline the reasons Linus lays out for why we should be using git.

First off, the distributed nature of git has performance and convenience benefits, for the simple reason that it doesn't need to make network round trips. Having your own repository on the local machine means that you're always set to go, even if the network is slow or unavailable.

Secondly, traditional source control makes branching and merging hard (Linus points out that many real-live software developers have never done it), so you don't want to commit your work until it's ready to be seen by others -- which means you can't do much version control during the actual code-writing process. By contrast, in distributed source control, every checkout acts as its own branch, so each developer has at least one. This has the benefit that you can commit to your own repository whenever you want, and go back to previous versions whenever you like.

Centralized approaches to source control are a particular drag when people are depending on your committed code to be correct, or at least non-breaking -- you're not allowed to commit until your changes are ready for the rest of the team. As a project gets bigger, the associated test suite (hopefully) does as well, eventually leading to a pretty involved testing procedure that developers are supposed to complete before committing. And as Linus puts it, "... people make one-liner changes and ignore the test suite, because they know that those one-liners can't /possibly/, /possibly/ break." And then these breaking changes get pushed out to the rest of the team. The alternative approach, with git, is that you commit to your own version of the repository whenever you want, and when you're ready, your teammates can pull from you -- but in the meantime, you get all the benefits of having your own branch(es).

Thirdly -- social benefits! In his development process, Linus only pulls code from ten or fifteen other developers, whom he trusts to have filtered "up" appropriate changes from the people that they interact with. As he puts it, "if you have determined that somebody else is smarter than you -- go for it!" So the Linux kernel source control process mirrors the social networks of trust that make the whole thing happen. In the end, a lot of people end up only looking at the Linus branch, the de facto "official" one.

This dodges the social issue of giving out commit access to the central repository. Traditionally, when managing a project, you create a class of people who are "ostensibly not morons", and typically, you make that group too small. The distributed model makes this go away, because everybody has his own branch. And if you happen to do good work in your own branch, then people start pulling from you! "That alone means that every single open-source project should use nothing but a distributed model."

So. That's what Linus said. Will everyone switch to git or some other distributed source control? We'll find out. Adoption would certainly be helped along if major project-hosting sites like SourceForge or Google Code added git support. But sort of the point of the distributed model is that there's no one central hosting point...

Maybe I'll try it out. I'll let you know.

Monday, January 07, 2008

today's reading: Neighbourhood Components Analysis

Goldberger, Roweis, Hinton, and Salakhutdinov: Neighbourhood Components Analysis. The punchline: ... learning a linear transformation of the input space such that in the transformed space, KNN performs well.

I've been making an effort to read more academic papers. This one came up for one of the many reading groups at the Goog, and I picked it out of my stack a few days ago. Here's the abstract.
In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction.
This is a very common, general problem for machine learning -- you want to build a classifier, and you think you might like to use K Nearest Neighbors; it's dead simple, and a lot of the time, it gets the job done. No complicated models to train up -- if you want to classify an instance (ooh, here's an animal -- is it a kitty?), you just find the k most similar instances in your bank of examples and let them vote it out. Four of the five most-similar things in your example set are kitties? Alright, we'll call the new one a kitty. You have to tune k as a parameter to work well with your dataset, of course, and you can get slightly more sophisticated by introducing weighted voting -- things that are less similar to the instance you're trying to classify are considered less important.
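Here's that plain KNN recipe as a short Python sketch, with the optional weighted voting. The function names and the little kitty/puppy dataset are made up for illustration, and the features are just 2D points with no particular meaning.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(examples, query, k=5, weighted=False):
    """Label `query` by letting its k nearest examples vote.

    examples: list of (feature_vector, label) pairs.
    weighted: if True, each vote counts 1/distance, so closer neighbours matter more.
    """
    nearest = sorted(examples, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        if weighted:
            # The small epsilon avoids dividing by zero on an exact match.
            votes[label] += 1.0 / (euclidean(features, query) + 1e-9)
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


examples = [
    ((1.0, 1.0), "kitty"), ((1.2, 0.9), "kitty"), ((0.8, 1.1), "kitty"), ((1.1, 1.3), "kitty"),
    ((5.0, 5.0), "puppy"), ((5.2, 4.8), "puppy"), ((4.9, 5.1), "puppy"),
]
print(knn_classify(examples, (1.0, 1.2), k=5))  # kitty: four kitties outvote one puppy
```

With k too large for the data, weighting earns its keep: at k=7 and a query of (5.0, 4.9), plain voting says "kitty" (four far-away kitties beat three nearby puppies), while `weighted=True` says "puppy".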

The remaining question, though -- how do you decide what counts as "similar"? Typically, you use some sort of distance metric (Euclidean or Manhattan, for example) -- plot all of your instances in some high-dimensional space and see what's close. What if some of your features are noisy or irrelevant, though? Well, you could do some feature selection and prune those features out. Worse! What if you have several informative features, but they happen to be on very different scales, such that distances in one overwhelm distances in another? Well, you could manually scale them until you get some good results...
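A quick way to see the scaling problem bite, sketched in Python with invented data: one feature is body mass in grams, the other is a 0/1 "purrs" flag. Scaling each feature by a constant amounts to multiplying by a diagonal matrix, and here the hand-picked scale factors are assumptions chosen to make the point, not values learned from anything.

```python
import math


def nearest_label(examples, query, scale):
    """Label of the single nearest example after multiplying each feature by `scale`."""
    def dist(a, b):
        return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(scale, a, b)))
    return min(examples, key=lambda ex: dist(ex[0], query))[1]


# Feature 0: body mass in grams (big numbers); feature 1: purrs (0 or 1).
examples = [((4100.0, 0.0), "puppy"), ((3000.0, 1.0), "kitty")]
query = (4000.0, 1.0)

print(nearest_label(examples, query, scale=(1.0, 1.0)))     # puppy: grams swamp everything
print(nearest_label(examples, query, scale=(0.001, 10.0)))  # kitty: after rescaling
```

Unscaled, a 100-gram difference outweighs the purring entirely; after rescaling, the informative feature gets its say.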

This is starting to sound like maybe K-Nearest Neighbors isn't all that easy to use, out of the box.

This paper actually solves both of those problems. Goldberger et al. harness the power of linear algebra and come up with a way to learn a projection -- just a matrix -- from the original feature space, where distance metrics might not be very useful for KNN, into a new feature space, where your distance metric does the right thing. In effect, this is the feature selection and the scaling, all in one go, and they do it by optimizing a smooth, stochastic version of the leave-one-out KNN score, all tuned for your particular data set.
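For reference, here's the objective as I read it from the paper. Each point i picks a neighbour j with a softmax over (negative squared) distances in the projected space; p_i is then the probability that point i ends up with a neighbour of its own class c_i, and NCA maximizes the expected number of correctly classified points, f(A), by gradient ascent on the matrix A. The notation below is my transcription, so double-check it against the paper before building on it.

```latex
p_{ij} = \frac{\exp\left(-\lVert A x_i - A x_j \rVert^2\right)}
              {\sum_{k \neq i} \exp\left(-\lVert A x_i - A x_k \rVert^2\right)},
\qquad p_{ii} = 0

p_i = \sum_{j \in C_i} p_{ij}, \qquad C_i = \{\, j : c_j = c_i \,\}

f(A) = \sum_i p_i

\frac{\partial f}{\partial A}
  = 2A \sum_i \Big( p_i \sum_k p_{ik}\, x_{ik} x_{ik}^{\top}
                    - \sum_{j \in C_i} p_{ij}\, x_{ij} x_{ij}^{\top} \Big),
\qquad x_{ij} = x_i - x_j
```

Note that every p_ij depends on all the other points, so each gradient evaluation touches on the order of n^2 pairs for n training points.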

That's kind of cool.

As an added bonus, if you restrict the transform to project into three or fewer dimensions (just make sure the matrix is the right shape, and you're good!), then the same algorithm produces what sounds like a very nice visualization of your data. Features that are informative in discriminating your classes will be stretched out, and instances in the same class will tend to clump -- otherwise, KNN wouldn't work very well.

Lovely paper, guys! Very well-written, clearly explained, and addresses a problem probably a lot of people have had. The only issue I can raise so far -- it's not clear how long it takes to learn the transformation. This might be a very slow process, and all we're given in the paper is something to the effect of "oh, just optimize this function and you're good..." ... the machine learning pros among us can probably just look at the formula and knock it out in a few lines of Matlab (or supposedly Lisp, if you're Charles Isbell), but I would've liked a little more discussion on the implementation side... maybe they posted some code somewhere.

Happy hacking :) alexr out!

Saturday, January 05, 2008

Design Patterns Ahoy: Violator Pattern

So, at work, every so often somebody brings up the "Violator" Pattern, typically with a slight smirk. Up until recently, I wasn't at all well-versed in design patterns, but this seems like a field that one, as a professional programmer, should be at least familiar with. So I picked up the classic Gang of Four book and started reading through it.

So far, it seems like just good, solid advice about software design (and, at the risk of sounding uncritical, is it really so bad to have common names for commonly occurring software structures?). But no "Violator" pattern is to be found in Gang of Four. Or on the C2 design patterns wiki.

And I wanted to work out for myself what this mysterious Violator Pattern could be. And I figured it out today. In GWT, you can use the JavaScript Native Interface to make calls from code written in JavaScript into methods written in Java. And when you do this, the compiler totally ignores the access modifiers on the methods you're calling. That's the Violator pattern.

A paragon of OO design, verily :)

Saturday, October 06, 2007

ye can't get ye flask

So I tried Second Life again, despite my difficulties getting going with it the first time. Surely they've got stuff ironed out now, right?

And it's pretty easy to sign up, and you get a fairly nice-looking avatar by default (mine was "city chic", which pretty much describes me in First Life as well), and you download the client (they probably have one for your platform), and it all works. And you find yourself on this friendly-looking island and the tutorial tells you how to walk around and say things and stuff. And there are a few other virtual people standing around.

But it's not clear if they can hear you when you speak -- and if they're trying to chat with you, how would you find out? Maybe there's a "chat" window that you can pull up. I found a window that maybe wanted to be that one ("History"?) ... but only some of my utterances seemed to show up there.

So I walked around on that little island for a few minutes, trying to figure out how to get to another island -- and there's this "teleport" button, but how do I use it, and why is it grayed out?

... after a while, I found that I'd hit something that turned off my walking. My arrow keys would turn me in place, but I couldn't walk around anymore. And there's no clear "oh, you're in 'don't walk anymore' mode" indicator. Buh?

And that was enough to end my second foray into Second Life. There are only so many minutes in the day.

I think my experience was rather more anticlimactic than Drew's. He at least found out how to go places. But neither of us could figure out how to hit people.

Tuesday, September 11, 2007

possible upcoming projects

I need a cool side project. I have a bunch of ideas that I think would be interesting: here they are. I've already started futzing around with a few of these... comments or suggestions are definitely welcome.

New Things To Build
- Sketch out and build an online community where kids can make stuff in code and share it with each other, like MOOSE Crossing only awesome and on the web and easy to use and linkable. Does this want to be made of Scratch, or similar to it? Can we include proper inheritance? Make it work for the OLPC.

- TEB. Make TEB awesome. Does TEB want to use/be part of/merge with gnoetry? Related: That horoscope remixer thing that never worked right.

- Some automated way to calculate Erdos numbers, probably with the help of DBLP.

- A Scrabble bot. I've been thinking about this a lot, actually, looking at different algorithms for permuting strings -- but I think I can do Scrabble without looking at permutations... maybe all you really care about is whether certain groups of letters constitute the same bag. If it turns out we don't have to permute things (n! orderings), then we can probably plan several turns ahead. This might be similar to playing Backgammon, but maybe we can't do that kind of policy learning; your set of possible actions changes so much every turn, and you'd have to use probability estimates to look into future turns...
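The "same bag" idea from the Scrabble item can be sketched in a few lines of Python: key each dictionary word by its letters in sorted order, then look up every sub-bag of the rack instead of every ordering of it. For a 7-tile rack that's at most 2^7 = 128 subsets, versus 7! = 5040 orderings of the full rack alone. The names `build_index` and `playable_words` are hypothetical, and this ignores the board, blank tiles, and scoring entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_index(words):
    """Map each word's sorted letters (its 'bag') to the set of words with that bag."""
    index = defaultdict(set)
    for word in words:
        index["".join(sorted(word))].add(word)
    return index


def playable_words(rack, index):
    """Every indexed word that can be spelled from some sub-bag of the rack."""
    letters = sorted(rack)
    found = set()
    for size in range(1, len(letters) + 1):
        # combinations() keeps input order, so each combo is already a sorted key;
        # the set() skips duplicate sub-bags when the rack repeats a letter.
        for combo in set(combinations(letters, size)):
            found |= index.get("".join(combo), set())
    return found


index = build_index(["cat", "act", "tac", "at", "ta", "dog", "a"])
print(sorted(playable_words("tca", index)))  # ['a', 'act', 'at', 'cat', 'ta', 'tac']
```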

Interesting Exercises, way already done by other people
- Build a language and virtual machine. Educational for me, not very useful for other people. Recommended by Strick. Building a VM at a pretty high level of abstraction might not be that hard, just think about stack frames and returning things. At what level do things like the Python VM work? The JVM? Surely there are different approaches used for this; what are they? Would it be hard to build something with multithreading in mind from the bottom up? What about a purely functional lambda-calculus implementation?

- Write a checkers bot. Tree search, alpha-beta pruning, etc. are pretty well understood, even by me -- the hard thing would be an evaluation function.

- Build a MUD, or at least a MUD framework. But in Scheme. ynniv and I somehow never got around to that...

- Another Sudoku solver. But in Haskell.
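As a starting point for the VM idea, here's about the smallest sketch I can think of in Python: a stack machine where the only thing a call frame holds is a return address. The opcodes are invented for this example. Real VMs like the Python VM and the JVM are also stack-based bytecode machines, but their frames carry much more (locals, the operand stack, and so on).

```python
def run(program):
    """Run a list of (opcode, *args) tuples; return whatever is left on the stack."""
    stack = []         # operand stack
    return_stack = []  # one entry per active call: the address to resume at
    pc = 0             # program counter
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "call":  # remember where to come back to, then jump
            return_stack.append(pc)
            pc = args[0]
        elif op == "ret":  # resume the caller
            pc = return_stack.pop()
        elif op == "halt":
            break
        else:
            raise ValueError("unknown opcode: " + op)
    return stack


program = [
    ("push", 7),
    ("call", 5),  # square the top of the stack
    ("push", 1),
    ("add",),
    ("halt",),
    ("dup",),     # address 5: the "square" subroutine
    ("mul",),
    ("ret",),
]
print(run(program))  # [50], i.e. 7 * 7 + 1
```

Adding locals would mean pushing a dict (or a slice of slots) alongside each return address, which is where the "think about stack frames" part starts to get interesting.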

Thursday, August 23, 2007

James Gosling, it turns out, thought harder about type inference than I did

So for a while, I'd been wondering about the type system in Java, thinking that perhaps I'd found a shortcoming -- if you have a more general type (say, "Thing"), and a more specific type (say, "Chair", which extends "Thing") -- then why isn't List<Chair> a subtype of List<Thing>, meaning that you could pass a List of Chairs into a method that takes a List of Things?

The mighty Toby R, in a conversation with me and Strick, shed some light on the situation and led me to understand why this is not the case. The particular use-case is: when you're in the method that takes List<Thing>, you can add Things to that list. And the Things you add could well be Basketballs. So when the method returns, the caller is still expecting to have a list of Chairs, not realizing that there are now Basketballs in the list... so Java cleverly disallows this case.

(also, somehow I missed an anonymous comment, probably from my mother, well-respected for her work on type theory, which explained that exact case)

Now in a purely functional language, where you don't go around getting references to objects and modifying them, I'd like to posit that this wouldn't be a problem -- but perhaps there are other problematic situations? ...
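Python's type hints have the same distinction, which makes for a quick way to poke at it. In this sketch, Thing, Chair, and Basketball are the post's example names and the two functions are made up. A static checker such as mypy treats List as invariant (it reports an incompatible-type error on the `add_a_thing(chairs)` call), while the read-only Sequence type is covariant, which lines up with the hunch above about code that never modifies what it's handed. Python itself doesn't enforce any of this at runtime, so you can watch the Basketball sneak in.

```python
from typing import List, Sequence


class Thing:
    pass


class Chair(Thing):
    pass


class Basketball(Thing):
    pass


def add_a_thing(things: List[Thing]) -> None:
    # Perfectly legal inside this function: any Thing may go in a List[Thing].
    things.append(Basketball())


def count_things(things: Sequence[Thing]) -> int:
    # Only reads, never writes, so handing it a sequence of Chairs is safe.
    return len(things)


chairs: List[Chair] = [Chair(), Chair()]
print(count_things(chairs))  # 2; accepted by mypy because Sequence is covariant

# mypy rejects the next call (List is invariant), but nothing stops it at runtime:
add_a_thing(chairs)
print([type(x).__name__ for x in chairs])  # ['Chair', 'Chair', 'Basketball']
```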

Wednesday, August 22, 2007

typing about typing about types

As we make the push to handle Java 1.5 features in GWT (it's about time!), I'm having flashbacks to undergraduate Compilers. Perhaps this is unsurprising, considering what I'm working on. Part of the difficulty in the problem is that whether you're talking about types in the code that constitutes the compiler itself, types in the code that the user is going to put in, or types in the output code... we use the same word. Oh my. (Much worse, I suppose, is the code you get with the Appel book, where half of the words on the screen are helpfully "ty" or "typ". Jebus.)

But! All of this means that by the next GWT release, you'll likely have some code that I touched in your hands. Woo.

Sunday, August 12, 2007

this blog post: for you, $50.

I recently met a fellow who's working on a doctorate in literature, but his previous background is in Library Science. I wasn't sure what the interesting problems in library science might be, so I wandered over to the Wikipedia article and started falling through the links.

A few links out, I ran into the Serials Crisis article. Apparently (and Wikipedia articles close to the "Library Science" one are never wrong), the costs of subscribing to scholarly journals keep on going up -- libraries only have so much money for subscriptions, but there are ever-more academics and subfields, thus more journals. And if a given library cancels its subscription to a particular journal, that publisher's fixed costs are still fixed, so prices increase for the remaining subscribers.

The traditional academic journal system already seemed pretty shaky, especially in light of the Web; frustrating publisher websites (Springer, ACM Portal, IEEE's site...) seem to exist solely to keep enterprising students from reading articles. And given that most of the science behind the articles is publicly funded in the first place, the articles seem like they should be public as well.

I wouldn't mind seeing companies like Springer just going away; universities seem totally capable of hosting journals -- over the web especially! There may be some compelling reason for the current system, and I'll try to find it out... but for the short term, tools like Google Scholar could go a little further out of their way to help us find the full text of an article!

Also:
http://en.wikipedia.org/wiki/Open_access
http://en.wikipedia.org/wiki/Open_access_journal
http://en.wikipedia.org/wiki/Open_access_publishing
The Serials Crisis: A White Paper for the UNC-Chapel Hill Scholarly Communications Convocation
The Crisis in Scholarly Publishing

Friday, August 10, 2007

You get spoiled by languages with first-class functions

Earlier today, I needed to take a list of strings and grab all of them that end with ".html". Very simple task. My first thought is of course something like:

[fn for fn in fns if fn.endswith(".html")]

Or even:
filter(lambda x: x.endswith(".html"), fns)

Or the equivalent Lisp. Y'know, with "loop" and "collect". (Common Lisp has, as they say, the Cadillac of loop syntax.)

Or something to that effect. But then I remembered that I was writing in Java, and I ended up writing a for-loop. There's no clean, idiomatic way to say that in Java, is there? Do you have to build up the list procedurally? ...
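For comparison, here are the three spellings side by side in Python, including the build-it-up-by-hand loop that the Java version amounts to. The `fns` list is made-up sample data. One thing worth knowing: in Python 3, `filter()` returns a lazy iterator rather than a list, so it needs wrapping in `list()`.

```python
fns = ["index.html", "style.css", "about.html", "notes.txt"]

# 1. Procedural: build the list up by hand, the way the Java for-loop does.
html_loop = []
for fn in fns:
    if fn.endswith(".html"):
        html_loop.append(fn)

# 2. List comprehension.
html_comp = [fn for fn in fns if fn.endswith(".html")]

# 3. filter() with a lambda; in Python 3 filter returns an iterator, hence list().
html_filter = list(filter(lambda fn: fn.endswith(".html"), fns))

print(html_loop == html_comp == html_filter)  # True
print(html_comp)  # ['index.html', 'about.html']
```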

Sunday, August 05, 2007

"If I knew the answer ahead of time, I wouldn't be writing this program!"

On the plane back from San Francisco to Atlanta, I read Thomas and Hunt's Pragmatic Unit Testing. It's a very quick read, about 120 pages plus appendices. And while it's very light, it has what seems like a lot of helpful advice for good design and development practice. To be honest, I've never done much unit testing in the past, thinking "oh, that's what those corporate software engineer guys do -- pssh." But! It turns out that these days, I am a corporate software engineer, and anyway one should always be looking out for new ways to hack more effectively.

There are a bunch of important principles to take away from Pragmatic Unit Testing. The one that stuck with me the most, though, is that when designing and writing code, you've got to think: "how am I going to test this?" The significance of this is not just "oh man, I'm going to have to write a unit test for my method"; it gets back to that generally understood but oft-ignored idea that every logical unit of your code really wants to be its own method, a modular thing that you can use separately -- because you're going to have to use that same calculation again somewhere else. And Don't Repeat Yourself.

For example: if you have a method that calculates how to do something and then does it (say with a call to another library), maybe you want to make that two methods: the calculation and then a call to that calculation coupled with the library call. This will be easier to test -- you're not really interested in testing a third-party library, just your own calculations -- and as a happy side-effect, your code is now cleaner and more reusable!
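Here's a minimal sketch of that split in Python. Everything in it is hypothetical: `fit_within` and `make_thumbnail` are made-up names, the `Image` tuple stands in for whatever type the real library uses, and `resize` stands in for the third-party call. The point is just that the arithmetic lives in its own pure function, so the tests never need the real library.

```python
import unittest
from collections import namedtuple

# Hypothetical stand-in for the third-party library's image type.
Image = namedtuple("Image", ["width", "height"])


def fit_within(width, height, max_w, max_h):
    """Pure calculation: scale (width, height) to fit the box, keeping aspect ratio.

    Never scales up; assumes positive dimensions.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def make_thumbnail(image, resize, max_w=128, max_h=128):
    """Thin wrapper: compute the size, then hand off to the library call."""
    new_size = fit_within(image.width, image.height, max_w, max_h)
    return resize(image, new_size)


class FitWithinTest(unittest.TestCase):
    def test_scales_down_wide_image(self):
        self.assertEqual(fit_within(400, 200, 128, 128), (128, 64))

    def test_leaves_small_image_alone(self):
        self.assertEqual(fit_within(50, 20, 128, 128), (50, 20))

    def test_wrapper_passes_size_to_library(self):
        # A fake "library call" that just records what it was asked to do.
        calls = []
        make_thumbnail(Image(400, 200), lambda img, size: calls.append(size))
        self.assertEqual(calls, [(128, 64)])
```

Saved as, say, `thumbs.py`, this runs with `python -m unittest thumbs`. The wrapper is so thin that the only thing left to check about it is that it passes the computed size along.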

Similarly, separating the backend code from the GUI (two of the things that get my hackles up the most in this life are the terms "business logic" and "MVC") lets you properly test the Stuff That Does Stuff on its own. One example in the book hit particularly close to home: a small GUI application where all the calculations and I/O happened mixed in with the Swing code.

That example, and in fact the whole book, brought on flashbacks to a project I'd worked on recently. One of my friends and I (and he's one of the sharpest guys I know) inherited a fairly involved program and ended up sinking months trying to fix it up. This system suffered from pretty much every pitfall in the book: random silently-caught exceptions, real work happening mixed in with the GUI code, needlessly long, opaque, nigh-untestable (let alone "tested") methods, repetition all over the place. Worse! It was built by a guy who'd supposedly specialized in software engineering -- and he did pretty much everything that Thomas and Hunt warn against! Not a pleasant situation. I'm sure you can relate.

Of course, I knew at the time that this was atrocious code. But now maybe I'll be a bit more principled in my development, working with a lean towards easy testability. Unit testing will probably be a good discipline to get into.

On the other hand, I've been reading about (and writing a bit of) Haskell. All this murky business of setting up and tearing down state, "proving" to yourself that each function does what you think it does "for the boundary cases" -- it all relies on the idea that you're going to be able to predict where you're going to make the bugs (by heuristic, habits, and mnemonics). And if you're smart enough to predict where the bugs are going to pop up, it seems like there's something better you could do to keep them from being introduced at all. In ML for the Working Programmer, L.C. Paulson suggests that a mark of the professional in the future will be writing functionally (in ML). The modularity practices seem like the Right Thing, useful for the functional programmer as well as the OO, but if your code could be formally verified, how much more confident would you be that it was correct for the general case? Simple testing is nothing like a proper proof.

But who ever writes proof-carrying code?

Saturday, August 04, 2007

further readings: one day, I'll have something coherent to say about type systems

The first finder of any error in my books receives $2.56; significant suggestions are also worth $0.32 each. If you are really a careful reader, you may be able to recoup more than the cost of the books this way.

However, people who have read the book Eats, Shoots & Leaves should not expect a reward for criticizing the ways in which I use commas. Punctuation is extremely important to me, but I insist on doing it my own way.

-- Don "The Lion" Knuth, The Art of Computer Programming homepage.

*laughs* We love you, Don.

Also: there's so much to read in this life. I've recently picked up books on proper C++ technique, security, and proper unit testing, and I'll probably give those some priority on my ever-growing Queue. Of course, there's all those books on statistical NLP that need to get read in the near future if I'm going to be of any help to anyone.

I've been in on a few conversations recently about C++ and pitfalls and bugs that can arise when using it. The more I think about these, particularly random language-specific casting rules and several different competing ways to represent strings, the more I think that Haskell is going to be a good idea. Or possibly SML.

Thursday, July 26, 2007

(ping)

It's been a while since I last posted: I should make an effort to have exciting technical things to say more often.

But! In the last few months, I've graduated, gotten a paper accepted to a conference, taught six weeks of summer camps to enthusiastic middle- and high-school kids, and made all the preparations to start work with the Goog. My first day is Monday.

Topics that I've been looking into and will hopefully post about soon:
- Haskell -- maybe one day I'll have a more consistent opinion about how I feel about static vs. dynamic typing. Writing here on the issue will probably help sort it out. Or just building something big with Haskell.
- Statistics. I went out and bought All of Statistics and I've been thumbing through it a bit.
- CS Education and edutech. Good goodness, education. I've spent most of my summer with the childrens, trying out different ways of convincing them they want to know what I think they should know. It's been going well, for the most part.
- GWT and associated topics in Javascript and Ajax. I'm joining the GWT team in just a few days, so I'm working pretty hard on learning it!

Friday, May 25, 2007

more exciting edutech!

Scratch, from the Lifelong Kindergarten group at MIT, is this lovely environment for kids (or other novice programmers) where you can make funky animations and play sounds and do cool effects! The cool effects are a major selling point -- you can get funny animations right off the bat. Everybody loves a fisheye effect. And it has this lovely website, where kids can share and tag their projects! Web 2.0 ahoy!

It's really easy to figure out, especially if you've seen things like the LEGO Mindstorms interface, or Alice -- commands snap together with a familiar building-blocks metaphor. This is to say that there's a fairly standard vocabulary for children's programming environments these days...

The Scratch intro "Facilitorial" video is here.

Also fairly interesting (and brand new on my radar as of today), is Greenfoot, which is another educational programming environment, perhaps for slightly older kids. It makes it easy to do simulations with different kinds of "actors" on these nice 2D worlds (they can be grid-worlds, but they don't have to be)... although you have to write some Java, it looks like, to build up your new behaviors. Maybe this is awesome too.

Wednesday, April 25, 2007

you'll probably tell me that Emacs already does this

In the future, my text editor will have an option to make it think of camelCaps and underscores_in_identifiers as word boundaries.

How useful would that be for you?
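The boundary-finding part is just a regular expression. Here's a minimal sketch in Python, where the regex and the `subwords`/`subword_starts`/`next_boundary` helpers are my own hypothetical names rather than any editor's API. It treats runs of capitals like "HTML" as one word and skips underscores entirely.

```python
import re

# A capital run not followed by lowercase ("HTML" in "parseHTMLFile"),
# or an optional capital followed by lowercase ("Caps"), or a digit run.
_SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def subwords(identifier):
    """Split an identifier on camelCase and underscore boundaries."""
    return _SUBWORD.findall(identifier)


def subword_starts(identifier):
    """Offsets where each subword begins: the stops for word-wise motion."""
    return [m.start() for m in _SUBWORD.finditer(identifier)]


def next_boundary(identifier, pos):
    """Where a 'forward word' command would land from cursor position pos."""
    for start in subword_starts(identifier):
        if start > pos:
            return start
    return len(identifier)


print(subwords("parseHTMLFile"))                    # ['parse', 'HTML', 'File']
print(subword_starts("underscores_in_identifiers"))  # [0, 12, 15]
print(next_boundary("camelCaps", 0))                 # 5
```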

Monday, April 23, 2007

retrocomputing from the other side of the pond

Here's what I found out on my recent voyage through the wikipedias! Semi-vicarious nostalgia ahoy!

In the early 1980s, the BBC started an initiative called the BBC Computer Literacy Project, a major part of which was the production of the BBC Micro, a machine produced by Acorn Computers, complete with its own line of peripherals including expandable memory and various pluggable co-processors. There was an associated television show, The Computer Programme, which ran in various incarnations through the decade and featured music from Kraftwerk. The computers came with BBC BASIC, a rather more advanced system than the BASICs that were shipping stateside -- it had proper named subroutines and if/then/else, features most users on the MS-DOS side of things wouldn't see until QBASIC.

The mind-blowing part of the project was Telesoftware, whereby computer programs were sent embedded in the broadcast television signal, using Teletext, which is how the closed-captioning data was sent in Britain at the time. Analogue technology like broadcast TV feels so alien these days... but the Beeb was busy using it to send example programs to eager learners at home.

There seems to be a pretty active online community of BBC/Acorn enthusiasts out there, two and a half decades later.

You know how to use the Googles, of course, but here's another, more detailed overview of the BBC/Acorn system.

Wednesday, April 18, 2007

java 6: apparently even less of a loss than java 5!

Java 6. It's so hot right now. Java 6.

Well, I'm excited anyway. The new Scripting API provides a standard interface for embedding other languages in Java and making calls between the two. See if there's already a project to handle your favorite language here -- there probably is, unless you like Common Lisp. There's even a mechanism for manipulating namespaces in the embedded language, pretty snappy.

Apparently recent releases of Jython already have hooks to support the new API. Maybe the next version of JES should be rewritten with that in mind, say once Jython 2.2 is stable. And perhaps we'll see ABCL ported to the new standard...

Also in Java 6, the built-in support for splash screens is kinda cute. And they're saying that the whole shebang is faster and prettier. Good job, guys!

Tuesday, April 17, 2007

tools for blogging and reading

I had an idea for a tool today, one to keep track of links that you want to blog about, assuming you keep a buffer of a few links hanging around like I do. Usually, I have a bookmark folder set aside for the next batch of links, but it might be nice to have a special command that would let you right-click on a link and save it to somewhere. Later on, you'll be able to paste back your links (maybe with HTML link code) into arbitrary text boxes using another context-menu command. This probably wants to be a Firefox extension.

Graham suggests that this would be better with online storage -- it could sync up with your del.icio.us bookmarks and keep track of what you've already blogged about. Perhaps someday soon I'll be cool enough to use del.icio.us.

Speaking of reading things on the web and managing one's reading -- please allow me to direct your attention to BibDesk and Skim, a pair of apps for the Mac designed with your reading pleasure in mind. The first is a bibliography manager that works with the BibTeX format, has a lovely UI, and lets you drag references around and whatnot; the second is for reading, highlighting, and annotating your papers, which is traditionally pretty difficult with a PDF.

The downside of these is that they're Cocoa apps and Mac-only, but they're pretty much what I'll want to build when I get around to putting together that cross-platform Python paper manager thing I've been thinking about...

Thursday, April 12, 2007

things that start with p

Programmable completion for bash (mentioned earlier) is still a pretty exciting idea. I came up against the first situation where I felt like it needed to be extended today, though.

I've taken to using jar to deal with .zip files, so I can use consistent tar syntax and don't have to remember how to use the zip options. But! The bash_completion file for Ubuntu doesn't include ".zip" as an extension that it looks for when tab-completing files for jar, oh noes!

Easy enough to fix, right? I pop open /etc/bash_completion and start searching for "jar". There's a section near the second occurrence that looks like:
         _filedir '?(e|j|w)ar'
After some fiddling, I change that one line to " _filedir '?(ear|jar|war|zip)' ". And it works! For reference, the _filedir function looks like this. It's painfully obvious to everyone what this does, yes?

To be totally fair, there are some explanatory comments right above it... but it's obtuse things like this that make me want to switch to a shell with a more sensible scripting language. bash is often line noise. We claim that allowing users to modify their environments to fit their needs is one of the major benefits of Free Software, but are we doing enough to encourage that? How is your mom supposed to pick up bash script? It seems like scsh isn't meant for interactive use as your daily shell, but what if your everyday environment had a more modern language embedded in it? Are things like that already out there?
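For the record, here's what those two patterns actually accept. In bash's extglob syntax, `?(a|b)` means "zero or one occurrence of a or b", which corresponds to the regex `(?:a|b)?`. The sketch below hand-translates both patterns into Python regexes (the translation and the sample extensions are my own, for illustration) and checks a few extensions, written without the leading dot.

```python
import re

# Hand-translated from bash extglob: ?(p1|p2) == zero or one of p1/p2 == (?:p1|p2)?
OLD = re.compile(r"(?:e|j|w)?ar")          # '?(e|j|w)ar'
NEW = re.compile(r"(?:ear|jar|war|zip)?")  # '?(ear|jar|war|zip)'


def matches(pattern, ext):
    """True if the whole extension matches the pattern."""
    return pattern.fullmatch(ext) is not None


for ext in ["ar", "ear", "jar", "war", "zip", "tar", ""]:
    print(repr(ext), matches(OLD, ext), matches(NEW, ext))
```

So the rewrite quietly drops bare `ar` (which the old pattern allowed) and, because the whole group is optional, also matches an empty string. The extglob form `@(ear|jar|war|zip)`, meaning exactly one of the alternatives, would be the tighter spelling.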

Also: speaking of Scheme embedded in things, JScheme is a dialect of Scheme with a very simple interface to Java, called the Javadot notation. It's by Peter Norvig and crew, fairly recently updated and feature-complete. Also on Peter's (fantastic) site, you can find his older "mercilessly small, easily modifiable version". I very badly want to embed this in JES. Media Computation in Scheme ahoy.

Monday, April 09, 2007

Alice wants him. Bob fears him. Charlie wants to be him.

Bruce Schneier not only has a Bruce Schneier Facts page devoted to him -- he's aware of it and has a favorite Bruce Schneier Fact.

My favorite so far: "Bruce Schneier writes his books and essays by generating random alphanumeric text of an appropriate length and then decrypting it." -- Bruce Schneier in the comments

Saturday, March 31, 2007

list comprehensions!

List comprehensions. I've been a fan of these for a while, but I'd like to share:

import random

tenpercent = len(lines) // 10  # integer division: sample() needs an int count
testset = random.sample(lines, tenpercent)
trainingset = [line for line in lines if line not in testset]


Python makes me warm and fuzzy on the inside. Also, random.sample() is pretty sexy!
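One caveat with the snippet above: `line not in testset` scans a list every time, which gets slow on big data sets, and if `lines` contains duplicate strings, a held-out line also knocks its duplicates out of the training set. A minimal variant that samples positions instead of values (the `split_train_test` name and its parameters are hypothetical):

```python
import random


def split_train_test(lines, test_percent=10, seed=None):
    """Hold out test_percent% of lines (rounded down) as a random test set.

    Sampling indices rather than the lines themselves keeps duplicate lines
    independent, and the set gives constant-time membership checks.
    """
    rng = random.Random(seed)
    k = len(lines) * test_percent // 100
    test_idx = set(rng.sample(range(len(lines)), k))
    testset = [lines[i] for i in sorted(test_idx)]
    trainingset = [line for i, line in enumerate(lines) if i not in test_idx]
    return trainingset, testset


data = ["spam", "ham", "spam", "eggs"] * 5
train, test = split_train_test(data, seed=42)
print(len(train), len(test))  # 18 2
```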

Wednesday, March 28, 2007

brains 'n' balancing training data

- Martin points us to a nice article over on Developing Intelligence: 10 Important Differences Between Brains and Computers. Your computational metaphor just breaks down eventually, y'know? The brain is not very much like a von Neumann computer. It's a lot squishier.

If building classifiers is your thing, you may be interested to take a look at these articles:

- Gustavo E. A. P. A. Batista, Ana L. C. Bazzan, and Maria Carolina Monard: Balancing Training Data for Automated Annotation of Keywords: a Case Study.
Three researchers, seven middle names, one novel technique for building balanced data sets out of unbalanced ones for training classifiers: generate new instances of your minority class by interpolating between examples actually in your dataset. I'm still trying to decide whether this approach should work for the general case -- does it make too many assumptions about the shape of the space? Particularly: can you arbitrarily draw lines (in higher-dimensional space) between positive instances? What if there are negative instances between those two? Which dimensions do you look at first, and how is this better than just adding some noise or weighting positive examples higher? (is that last option the same as simply counting them several times?)

- Foster Provost: Machine Learning from Imbalanced Data Sets 101.
A basic overview of the problem, examining the motivation for building classifiers at all and some different approaches to sampling. The award for Best Name Ever goes to Dr. Foster Provost.
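To make the interpolation idea from the Batista, Bazzan, and Monard item concrete, here's a generic SMOTE-style sketch. It's my own illustration, not necessarily the paper's exact procedure: it assumes Euclidean distance, picks a random partner among each point's k nearest minority neighbors, and uses one random interpolation weight per new point. To the worry raised above, note that it makes no check at all for negative examples lying between the two points.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interpolate_minority(minority, n_new, k=3, seed=None):
    """Make n_new synthetic minority points on segments between near neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The other minority points, nearest first; pick among the k closest.
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: sqdist(base, p))
        neighbor = rng.choice(others[:k])
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(interpolate_minority(points, 3, k=2, seed=0))
```

Every synthetic point here lands on the diagonal between real examples, which is exactly the "draw lines between positive instances" assumption the post is questioning.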

Friday, March 23, 2007

reading feeds over the web

It seems like all the cool kids are using Google Reader these days, and I must admit, I'm impressed. The interface is so clean, and a web-based feed reader where you can aggregate all of your habitual reading seems like the right thing. And the ability to share items from your feed with your friends without making link-only blog posts (or forwarding emails around) is pretty compelling. On the other hand, the point of so much of the web thus far has been linking to other parts of the web: it has a mysterious self-referential nature... does this sort of sharing diminish that aspect? Does a meta-feed like this put you, as the independent media maven that you are, on a different level than the well-established weblogs? When you stop putting links in your blog and publish a Google Reader feed, are you more like BoingBoing or metafilter, or less?

It is the future. We've got dynabooks and memexes, and we use them to distribute pictures of cats doing cute things.

O my vast readership, I address to you this question: how do you read news online? Do you have some separate feed reader program? Do you use your browser's RSS features? Google Reader? Your LiveJournal friends page? Something else?

And moreover: for the LiveJournal denizens, does anyone know of a good method for reading "friends-only" posts through Google Reader? There are a few posts out in the world on this topic, but nobody seems to have a decisive answer yet... perhaps we can answer the question definitively.