Thursday, February 24, 2011

Scripts, Plans, Goals and Understanding

So, years ago, Kurt Eiselt, who was then a professor at Georgia Tech, did an independent study with me on NLP/NLU. It was pretty great, though mostly because it got me excited about the field. We wrote really simple parsers for really tiny subsets of English, in Common Lisp, and talked about how language might work cognitively. He had me get a copy of Schank and Abelson's classic Scripts, Plans, Goals and Understanding. I read some of it, but for the most part, it's just sat on my shelf for the better part of a decade.

Out of a sense of "geez, this is something I should have read, I'm an NLP researcher now", I've been working through it. It's slow going: I find Schank heavy on the intuition, light on the details, and fairly vague overall. Maybe the later chapters have more detail about the story understanding system they purportedly built in the 70s.

But if there's been a chunk that's worth reading, it's this, at the very end of chapter 5:

John couldn't get a taxi, so he rode his horse downtown and into a restaurant. He beat up another customer and took a menu from him. He decided to have a steak. The waiter came along and John offered him a bottle of scotch if he listened to John tell him what he wanted to eat. John went to the kitchen and told the cook to give him a steak, because the cook could always deduct the gift from his income tax. When the cook refused, John offered to give him guitar lessons, and that worked. While John was eating the steak, the waiter came back and stole $10 from John's wallet. Then John got on his horse and rode out.

Tuesday, February 22, 2011

Python tip: dir() with no arguments

If you do Python, you probably know that you can ask an object what it has on it by calling dir(theobject).

I just found this out, though: you can also call dir() with no arguments to list every name in your current namespace. No more losing temporary variables that you forgot about earlier in the REPL. It works for finding out which modules you've imported, too. Holy cow.

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> x = "this string is amazing"
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'x']
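
Here's a small sketch of the same trick in a script rather than at the prompt. The json import and the names_in_scope function are just stand-ins I made up for illustration; the exact list you get at the top level depends on your Python version and on what else you've defined, so I've only noted the outputs that don't vary.

```python
import json  # any module works; json is only an example


def names_in_scope():
    # Inside a function, dir() with no arguments lists the local
    # names defined so far, not the module-level ones.
    temp = 42
    other = "hello"
    return dir()


# At the top level, dir() lists the names in the current namespace,
# including any modules you've imported.
module_names = dir()
print("json" in module_names)   # True
print(names_in_scope())         # ['other', 'temp']

# Filter out the double-underscore names to see only what you added.
print([name for name in dir() if not name.startswith("__")])
```

The list comes back sorted alphabetically, which is why 'other' shows up before 'temp' even though it was assigned second.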

Saturday, February 05, 2011

Machine of Death Remix Corpus!

Machine of Death is fantastic. And it turns out that most of the stories in Machine of Death allow derivative works...

And when I hear "derivative works", I think the word "remix". But what's a remix of a text? Cut-up poetry? n-gram models to generate ominous-sounding emails to your family? Counting how many times the word "of" appears? The original text, but with randomized capitalization? This is for you to decide. Well, it is now. Here are all the derivs-allowed stories, conveniently pulled out and converted to ASCII.

Machine of Death Remix Corpus.

Use this to train your poetry bot, or your story understanding system, or your unexpected other thing! Share and enjoy :)
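
If you want a starting point, here's a rough sketch of two of the ideas above: counting "of", and a tiny bigram model that strings words together. The text variable is a made-up stand-in sentence, not anything from the corpus, and the remix function is a hypothetical helper; swap in the contents of one of the story files to get something more ominous.

```python
import random
from collections import Counter

# Stand-in text (an assumption for this sketch); replace it with a
# story read from the corpus.
text = "the machine of death knows the manner of death but not the time of death"
words = text.split()

# Count how many times the word "of" appears.
counts = Counter(words)
print(counts["of"])  # 3

# Bigram model: map each word to the list of words that follow it.
followers = {}
for current, following in zip(words, words[1:]):
    followers.setdefault(current, []).append(following)


def remix(start, length, rng):
    """Random-walk the bigram table, starting from the word `start`."""
    out = [start]
    while len(out) < length and followers.get(out[-1]):
        out.append(rng.choice(followers[out[-1]]))
    return " ".join(out)


# A seeded generator makes the remix repeatable.
print(remix("the", 8, random.Random(0)))
```

Keeping repeated followers in the list, rather than a set, means that more common word pairs get picked more often, which is the whole n-gram idea in miniature.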

Wednesday, February 02, 2011

empiricism, faith, computational linguistics

Mike sent me a fantastic piece by Ted Pedersen, calling for NLP/CL researchers to care more about having reproducible results and maintainable software.

Empiricism Is Not a Matter of Faith.

It's sad that this is a problem; it should be easy to get other researchers' software up and running, reproduce the results reported in papers, and plug things into other things -- but I think we're moving in that direction. At least one CL conference, CICLing, explicitly calls for open software and reproducible results. Which is pretty cool.