Monday, November 16, 2009

explanatory power of working examples

The NLP algorithms I've been studying since I started back at school aren't particularly complex. But they're often described with really dense notation: maybe your field does this too! Here's a description, for example, of how to calculate an "outside probability" -- the joint probability that a particular nonterminal symbol covers a certain span of the text, together with all the words outside that span. This is from Fei Xia's lecture slides (and I think these are pretty good).

Maybe what I need is more practice picking apart dense notation, but in all honesty I have trouble keeping track of what the different letters mean. Maybe a nice dynamic programming implementation springs to mind for people smarter than me, but I have to stare at it (and the surrounding slides) for quite a while!

I think I'd be making a pretty good contribution to the world if I took the algorithms I'm learning and wrote down the most straightforward pseudocode and prose versions I could, with a running Python implementation and descriptive variable names. Surely many people out there would find code easier to digest!
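To give a taste of what I mean, here's a rough sketch (my own variable names and grammar representation, not the version from the slides) of how you might compute outside probabilities for a PCFG in Chomsky Normal Form, assuming you've already filled in an inside-probability chart with something like CKY:

```python
from collections import defaultdict

def outside_probabilities(words, grammar, start_symbol, inside):
    """Compute outside[(label, i, j)]: the joint probability of the words
    outside the span words[i:j] and of `label` covering that span.

    `grammar` maps (parent, left_child, right_child) to a rule probability
    (only binary rules; preterminal rules are assumed to be folded into
    the inside chart). `inside` maps (label, i, j) to an inside
    probability, e.g. as computed by CKY."""
    n = len(words)
    outside = defaultdict(float)
    # The start symbol over the whole sentence has outside probability 1:
    # there are no words outside it.
    outside[(start_symbol, 0, n)] = 1.0

    # Work from the widest spans down to the narrowest: a span's outside
    # probability only depends on the outside probabilities of wider spans.
    for width in range(n, 0, -1):
        for i in range(n - width + 1):
            j = i + width
            for (parent, left, right), rule_prob in grammar.items():
                parent_out = outside[(parent, i, j)]
                if parent_out == 0.0:
                    continue
                for split in range(i + 1, j):
                    # parent -> left right, with left covering words[i:split]
                    # and right covering words[split:j].
                    outside[(left, i, split)] += (
                        parent_out * rule_prob
                        * inside.get((right, split, j), 0.0))
                    outside[(right, split, j)] += (
                        parent_out * rule_prob
                        * inside.get((left, i, split), 0.0))
    return outside
```

That's the whole idea in one pass: each child span inherits probability mass from its parent's outside probability, the rule, and its sibling's inside probability -- which is a lot easier for me to hold in my head than a wall of subscripted Greek letters.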

Somebody's already done precisely this on the Wikipedia page for the Viterbi algorithm, and I'm very grateful to that somebody.
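In that spirit, here's roughly the shape of it -- a sketch with my own variable names, not the Wikipedia code verbatim. All the probability tables are plain dicts, so you can read the recurrence right off the code:

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Find the most likely hidden state sequence for an HMM.

    start_prob[state], trans_prob[prev_state][state], and
    emit_prob[state][observation] are ordinary nested dicts."""
    # best[t][state]: probability of the best path ending in state at time t
    best = [{}]
    backpointer = [{}]
    for state in states:
        best[0][state] = start_prob[state] * emit_prob[state][observations[0]]
        backpointer[0][state] = None

    for t in range(1, len(observations)):
        best.append({})
        backpointer.append({})
        for state in states:
            # Best way to arrive in `state` at time t, from any predecessor.
            prob, prev = max(
                (best[t - 1][p] * trans_prob[p][state]
                 * emit_prob[state][observations[t]], p)
                for p in states)
            best[t][state] = prob
            backpointer[t][state] = prev

    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    path.reverse()
    return path
```

Run it on the usual healthy/fever toy example and it recovers the expected state sequence; the point is that "keep the best path into each state, remember where it came from, walk backwards at the end" is right there in the variable names.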


Brett W. Thompson said...

OMG, I can't understand those equations at all!!

I completely agree that Python would be easier to follow :D

Alex Rudnick said...

Hey Brett! Thanks for reading and commenting! :D

Well, there were some relatively nice diagrams nearby... but still, it's a lot of letters to remember!

Clean Python implementation forthcoming, as soon as I finish the homework assignment...

Anonymous said...

Yeah, notation in NLP isn't the best. It's usually better than the notation you get in probability/statistics courses, but only just barely. I partly blame the extensive probability education for ruining NLPers' ability to write equations. Machine learning equations are a bit worse since they're combining the unintelligibility of probability with the unintelligibility of matrices.

While Python code would be nice, I've seen a number of papers with Python-esque pseudocode. What I'd like to see is a more functional pseudocode. For a number of the standard algorithms the imperativism of pseudo-Python muddles up what's really going on IMO.
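To illustrate the kind of thing the commenter might mean (this is my sketch, not theirs): the forward probability of an HMM is naturally a single fold over the observations, and writing it that way makes the recurrence one expression instead of a nest of loops and mutable tables.

```python
from functools import reduce

def forward_prob(observations, states, start_prob, trans_prob, emit_prob):
    """Total probability of an observation sequence under an HMM,
    written as one fold over the observations."""
    def step(alpha, obs):
        # alpha maps each state to the probability of the prefix seen so far.
        return {s: emit_prob[s][obs]
                   * sum(alpha[p] * trans_prob[p][s] for p in states)
                for s in states}
    initial = {s: start_prob[s] * emit_prob[s][observations[0]]
               for s in states}
    alpha = reduce(step, observations[1:], initial)
    return sum(alpha.values())
```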