Wednesday, December 30, 2009

here come some explanatory examples

Having recovered from all that fruitcake and holiday cheer, I've started digging through the code I wrote over the past semester, looking for things that could be straightforward examples.

So far, I've got:
  • calculating the entropy of a discrete random variable
  • a cute implementation of finite-state automata with matrix multiplication
  • calculating the probability of a Markov process going to a particular state, again with matrix multiplication
  • a simple CYK-style chart parser for probabilistic grammars (computing inside probabilities, outside probabilities, and the most probable parse)
  • a parse evaluator that gives precision and recall for parse trees
  • probabilistic part-of-speech taggers that take into account bigrams, both by trying all combinations of tags for the words and using the Viterbi algorithm
  • Some pretty clean code for hidden Markov models in general

I've already checked these in over on narorumo. They're all in Python, but some depend on nltk or numpy.

They'll be increasingly clean and documented over the next week or so. I hope these are helpful to somebody!

Monday, November 16, 2009

explanatory power of working examples

The NLP algorithms I've been studying since I started back at school aren't particularly complex. But they're often described with really dense notation: maybe your field does this too! Here's a description, for example, of how to calculate an "outside probability" -- it's the (joint) probability that a particular nonterminal symbol covers a certain chunk of text, and the words outside the span of that nonterminal. This is from Fei Xia's lecture slides (and I think these are pretty good).



Maybe what I need is more practice picking apart dense notation, but in all honesty I have trouble keeping track of what the different letters mean. Maybe a nice dynamic programming implementation springs to mind for people smarter than me, but I have to stare at it (and the surrounding slides) for quite a while!

I think I'd be making a pretty good contribution to the world if I took the algorithms I'm learning and wrote down the most straightforward pseudocode and prose versions I can, with a running Python implementation and descriptive variable names. Surely many people out there would find code easier to digest!

Somebody's already done precisely this with the Viterbi Algorithm wikipedia page, and I'm very grateful to that somebody.

Wednesday, November 04, 2009

Lenovo: you have to buy Windows, As Per Policy

I got a pretty quick response from the Lenovo sales people -- complete with verbiage at the bottom emphasizing how the email was confidential and legally privileged, and any retransmission, dissemination, or other public use is strictly prohibited. They should have put the EULA for the email at the top, before I scrolled down! I might not have agreed to read the email! Geez, or worse, what if somebody accidentally read it over my shoulder in a coffee shop!

Anyway, they said:
We do not have option to sell any unit without operating system as per policy.

So I guess I won't buy a ThinkPad. I'm just not willing to pay The Microsoft Tax when I'm not going to use Windows.

Python generator expressions

I just found out about this: Python has a really concise way to make new generators.

It looks like a list comprehension, just without the brackets. Before I knew about this feature, the code I was reading looked pretty mysterious.

There are some nice examples of cases where you might want to use this sort of thing in the relevant PEP. Especially pleasant uses from the PEP include passing a generator to the dictionary constructor, like so:
d = dict( (k, func(k)) for k in keylist)

... and, useful for me personally, getting the set of words in a file, all in one go:
s = set(word for line in f for word in line.split())

Good to know!

Sunday, October 25, 2009

trying to buy a ThinkPad without paying for Windows

I got that Dell Mini 12 some time ago, and honestly it's been a pain. It's a good-looking machine, and the keyboard and screen are nice, but the Poulsbo chipset just has terrible Linux support. Like every third time I get an update, something breaks, and I haven't been able to make it suspend/resume reliably in months -- oh, and X just broke again. What I really want is a little ThinkPad.

So I just sent this email to Lenovo:
Hey, good evening,

I'd love to purchase a ThinkPad X200. I haven't found the option on your website, though, for how I can buy one without Windows? Could you point me to that link?

I'm simply not going to use Windows; I would install Linux on it as soon as I get the machine anyway, and I don't want to pay for software I won't use.

So if you can sell me a ThinkPad with no Windows, that would be fantastic, and I'll be really happy and gladly give you money and say nice things about your company.

Thanks very much!

--
-- Alex Rudnick
We'll see what they say! I might just buy the laptop anyway, not agree to the Windows EULA, and then go through the hullabaloo to refund it.

Sunday, October 18, 2009

constraint programming in Python

You may be familiar with constraint programming, an approach where, instead of describing how to solve a problem, you describe what a possible solution looks like, and let a generalized solver find possible solutions. This is the sort of thing you might do with Prolog, Oz, or any number of libraries for your favorite programming language.

If your favorite programming language is Python, there are at least two different libraries for this approach! Unfortunately, they're both called "python-constraint"; this led to some confusion on my part. Here they are:

logilab-constraint. This is packaged in Debian/Ubuntu as "python-constraint". It's put out by the French company LogiLab, who contribute a bunch of Free Software useful for doing AI-flavored things. Their HMM library is pretty slick too.

python-constraint is a package by Gustavo Niemeyer, and it's got this really nice tutorial.

I mention these because my new research group is using this latter one to build a dependency grammar system based on Ralph Debusmann's XDG.

And more about that, as we get to it :)

Monday, October 12, 2009

normal distributions and R

When I'm using R to do statistical things (such as homework), I feel somewhat torn -- it's got so many nice functions that come built in, but the language itself is slightly clunky, and integrating code that I've written in R with bigger projects seems like it would be kind of a pain. That's a general problem with picking any special-purpose language, though -- I might make similar complaints about Matlab/Octave or even Prolog...

I note, though, that I haven't jumped ship to NumPy yet.

pnorm and qnorm


I just wanted to mention these fantastically easy-to-use functions that come built right in: pnorm and qnorm.

pnorm is what you use if you have a z-score and you want the probability that a value in the distribution would come up as less than that score. This is equivalent to looking up probability values in the "z" tables in the back of your stats book. pnorm(0) gives you 0.5, since half of all values are going to have a value less than 0.

qnorm does the inverse -- you give it a probability and it gives you back the z-score below which that much of the probability mass lies. So if you give it 0.5, it gives you back 0.

Both of these functions can take more parameters -- you can specify your distribution mean and stddev (so you don't have to use z-scores), for example. Type "?qnorm" for the docs!

Tuesday, August 18, 2009

ICFP programming contest 2009!

Not too long ago, Lindsey Kuper and I stepped into the ring once again to compete in the ICFP Programming Contest!

She's writing up the full story over on Geek Buffet.

The quick recap, however! The problem had to deal with orbital mechanics -- we were to control a simulated satellite as it orbits a simulated earth. First, you just had to transfer orbits, then meet other satellites, and it got increasingly complex from there.

Thankfully, the physics simulator for each kind of problem was included. All you had to do to use it was implement the contest-specific VM! (Easy, right?) Thankfully, the specification for the VM was super-clear, and we got it working surprisingly quickly.

By the end of the contest, we could handle the first two problem types, and that was competitive enough for 120th place worldwide, by the morning when they turned off the leaderboard. Good show, us!

For the full scoop, go read Lindsey's account.

Here's the code! We used Scheme and Python, and R for the visualizations.

Tuesday, August 11, 2009

change of scenery and reviewing conference submissions

So you may not have heard yet, but last month, I left Google Atlanta, packed up my cats and my computers, and headed to Indiana University. Lindsey Kuper helped quite a bit, both in motivation to do this and in the actual moving process. So I'm a PhD student now, very exciting! I'll be working on natural language processing and machine learning -- something I'd wanted to focus on for quite a while.

But in my last two weeks at the Goog, I had this really interesting opportunity, presented by my colleague Katharina Probst. She's a reviewer for the Conference on Information and Knowledge Management (is this the same as being on the program committee?), and asked if I wanted to help. It seemed like good practice for my upcoming stint as an academic.

So we (I, with her guidance and sanity-checking) had this pile of eight papers to get through. Some were fairly mundane, like learning classifiers to determine if a message is a flame, or summarizing a group of documents; one was particularly targeted at finding references to rabbinic literature in other rabbinic literature (they don't use ACM-style citations, typically). And so on. A few were written very clearly and had well-motivated discussions on why the problem is interesting and important, and others... not so much. This is why we have peer review.

But it was an interesting experience, and I'm glad I took it. My tech lead, Miguel, graciously let me use my "20%" time to Advance Science and just review papers for a day. I wanted to do a good job of reviewing, so I read the papers really closely, took a lot of notes, and wrote a few paragraphs in response for each one. Katharina at least seemed to think that it was good feedback, so that was reassuring. (although I would have liked more feedback on my feedback).

Once the other reviewer's ratings came in, I was fairly pleased to see that my ratings weren't far off from the other reviews. If I missed some amazing gem of wisdom, then at least it was apparently hard to find -- the papers I liked the best were accepted. I was more concerned, honestly, that I would mistake some stale old idea as a clever new one. But again, that's why we have many eyes on these things.

Alright! So now I just have to produce some stuff for other people to review. To the lab!

Saturday, August 08, 2009

goto: a utility for bash

Do you ever find yourself, while using bash, wanting to get to a particular file that you know is way down in a big tree of files -- maybe a big Java source tree, or some other nested file structure? You know you want Foo.java, but is it in src/com/foocorp/package/a/b/c, or maybe somewhere else?

Announcing goto, a bash utility which solves just that problem! Now you can just say:

you@computer:~/project$ goto Foo.java
you@computer:~/project/src/com/foocorp/whatever/path/to$ vim Foo.java


... instead of whatever business you were going to do with locate or find or your IDE, or just manually sifting for it.

Here's the README. Comments, complaints, and patches welcome! (and if you find this useful, I'd be really pleased if you'd let me know!)

Friday, August 07, 2009

More Ubuntu: getting the NetworkManager Applet back

Ubuntu comes with this really great network selection applet that sits on your gnome panel. It's called the NetworkManager applet, and it looks and acts more or less like the analogous dropdown menu on Mac OS X. Unfortunately, if you remove it from your panel, it's unintuitive how to get it back.

Maybe you accidentally removed it, and then, after some research, tried "nm-applet" from the command line, messed with the Network Monitor and the Network Connections preference page, and even tried editing nm-system-settings.conf.

What you really want to do is right click on your panel, click Add to Panel, and select Notification Area. Why the nice NetworkManager Applet and battery status applet both live in the Notification Area remains a mystery.

Like so:

Thursday, July 09, 2009

Why your Ubuntu Netbook Remix windows are maximized all the time

I was curious why my windows were all immediately getting maximized on my new kinda-netbook Dell Mini 12, running UNR (with the "Classic Desktop" mode). It's the sort of behavior you can imagine wanting -- with a small screen, maybe you do want to maximize everything? But I wanted to turn it off.

This thread has the answer!

UNR comes with a program called "Maximus", which was in my Startup Applications. To make the behavior go away, just go to System > Preferences > Startup Applications and remove Maximus from that list. Problem solved!

Seems odd that this is a process running in the background, instead of, say, a window manager setting.

Monday, June 01, 2009

back from Google I/O, got a new laptop!

I can speak for myself, at least, in saying that it felt pretty darn good to be on the GWT team for Google I/O. The Google Wave demo at the keynote got a standing ovation, and they made a pretty big deal about how they used GWT and it was a wonderful tool for their work. Pretty fantastic. A standing ovation at a tech conference. Totally electric.

Anyway! I got back, and my Dell Mini 12 with Ubuntu preinstalled had arrived! It's really pretty. The keyboard is a little more cramped than I'd expected (especially with the punctuation keys) but it's got a nice clicky feel. I'm not super-impressed with the Ubuntu version Dell shipped -- it's a specialized distribution of Hardy (last year's version) with some Dell and Yahoo-specific goodies. But the built-in camera and suspend/resume work beautifully! And no Microsoft tax! Not bad, really!

I'm probably about to install a fresh new (9.04) Ubuntu Netbook Remix on it.

Update! There's a bit of a funky driver issue with running the stock Ubuntu instead of the one Dell provides, but it's very surmountable. Here's how to install Ubuntu "Jaunty" on your Dell Mini 12.

Wednesday, May 20, 2009

Scala article on the Scala site!

I helped Lex Spoon and Toby Reyelts write an article about running Scala on App Engine.

Here's the article! As of right now, it's linked from the front page on the Scala site.

In the article, we mention my Scala/GWT sudoku solver, which is now available on Google Code. It's also live on App Engine right now!

Tuesday, April 21, 2009

Further steps: Scala/GWT/App Engine/Eclipse

I've been excited about Scala recently, for a number of reasons -- it's a modern language with lots of great features that you might expect if you've been looking at Haskell or ML. It's got pattern matching, a concise syntax, type inference, and first-order functions. The really killer thing about Scala, though, is how well it integrates with Java.

You can very easily have a mixed Scala and Java project, and make calls back and forth; Scala and Java packages sit in the same package hierarchy and compile to the same bytecode -- it all just works, pretty much seamlessly.

So, clearly, the right thing to do is hook Scala up to GWT RPC and run it on App Engine. Let's do that. This is as awesome as it's going to get, at least until the GWT compiler supports Scala and we can do our client side in Scala too.

I assume you've already installed both the Google Pluin for Eclipse and the Scala IDE for Eclipse. I'm using Eclipse 3.4.

Make a new "Web Application" project (with File > New, or just click the blue "g" icon). Click the boxes for both "Google Web Toolkit" and "App Engine" -- our client side will be in GWT, and the server on App Engine.

Right now, our new project has "Java" nature, but not the "Scala" nature. Add Scala by right-clicking the project and choosing Scala > Add Scala Nature. We're also going to make sure we have a copy of the Scala runtime library once we deploy to the server. Find out where scala-library.jar sits (expand out the "Scala Library" container in your project) and copy it into the project's war/WEB-INF/lib.

Now, down to business. There's a RemoteServiceServlet currently implemented in Java, and we're going to replace it with Scala code -- this is the thing that gets called on the server side when the client makes an RPC request. Find GreetingServiceImpl.java and delete it.

To replace it with a class implemented in Scala, right-click your "gwtscalademo.server" package and select New > Other > Scala Wizards > Scala Class. The class we deleted was a servlet and referenced by the web.xml, so give the Scala class the same name: GreetingServiceImpl.

Here's my version:
package gwtscalademo.server
import gwtscalademo.client.GreetingService
import com.google.gwt.user.server.rpc.RemoteServiceServlet

class GreetingServiceImpl
extends RemoteServiceServlet
with GreetingService {
def greetServer(input:String):String = {
return "Hello from Scala, " + input + "!"
}
}

Note that while in Java, we'd have said implements, in Scala we say with. As I understand it, this means that Java interfaces get mapped onto Scala traits. Hip!

Now you should be able to run the app locally -- right-click your project and select Run As > Web Application. Once you've learned Scala and made it do something interesting (you're on your own there), you can deploy it to App Engine!

Woo!

Wednesday, February 11, 2009

HOWTO not get "/usr/bin/env: bad interpreter"

If you get this:

bash: ./my-script.py: /usr/bin/env: bad interpreter: Permission denied

The partition your script lives on may be mounted with the "user" option set. "user" implies "noexec" (see the manpage for "mount"), which is going to keep you from running executables. And while running a binary executable from this kind of partition fails more clearly, trying to run a script with a shebang gives you this more confusing error message.

To fix! Add "exec" after your "user" flag in /etc/fstab. (again, see "man mount").

Saturday, January 17, 2009

Lisp insights from learning Logo

So I was just writing a simple recursive procedure in Logo. I wanted to go down a list, cons up a new list, and return the empty list when I get to the end.

My first pass, once I found out how to return something from a function (Logo procedures aren't expressions -- you have to explicitly return) had a line like this:

if (stacks = []) (output [])

Running the procedure gave me the helpful error message:
output didn't output to if in astep
[if (stacks = [] ) (output [] )]


Pardon?

When I remembered that Logo's "if" wants a list of instructions, it clicked with my new nugget of knowledge that literal lists are quoted in Logo. Then I understood what was going on: this "if" is neither a macro (like in Scheme, or most languages) nor lazily evaluated (like Haskell). It's a regular function, but one that expects quoted code to evaluate on demand at runtime. Rubyists and Smalltalkers: this is something like how an if block works for you, yes?

Here's an analogy into Python, for the non-Lispers out there:

def myIf(tf, truecode, falsecode):
  code = {True : truecode, False : falsecode}
  return eval(code[tf])


>>> myIf(0 == 1, "'it was ' + 'true'", "'it was not true'.upper()")
'IT WAS NOT TRUE'


>>> myIf(0 == 0, "'it was ' + 'true'", "'it was not true'.upper()")
'it was true'

Also, apparently Logo in general has dynamic scope -- which shouldn't surprise me, since lexical scope is relatively new in Lisp. And Berkeley Logo in particular has macros. Maybe my little compiler project is going to be harder than I expected. Honestly, I was imagining I'd be done if I could just parse it, do some simple transformations on the tree and spit JavaScript.

Sunday, January 11, 2009

my code has a side effect: learning!

So I'm working through the exercises in Real World Haskell. Here's what I've got so far. As of this writing, I'm in the middle of chapter 4. RWH totally deserves all the buzz it's been getting; it's very approachable and well-written, and my Haskell is improving rapidly. The book is still free online, but I bought the printed copy, in large part because Bryan O'Sullivan gave such a great talk at OSCON.

My earlier Haskell project had stalled out due to my not being very good at the language yet. What I want to do is build a compiler for Logo that produces JavaScript, with the turtle graphics on a canvas tag. Logo is an acceptable Lisp! I'd love to see it get more use.

Interestingly, though: