Saturday, March 31, 2007

list comprehensions!

List comprehensions. I've been a fan of these for a while, but I'd like to share:

import random

tenpercent = len(lines) // 10  # integer division: random.sample() needs an int count
testset = random.sample(lines, tenpercent)
trainingset = [line for line in lines if line not in testset]


Python makes me warm and fuzzy on the inside. Also, random.sample() is pretty sexy!
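One wrinkle with the comprehension above: "line not in testset" compares by value, so if the same line appears more than once in the data, every copy of a sampled line gets dropped from the training set. Here's a minimal sketch of a split by position instead. The split_lines name and its test_fraction and seed parameters are made up for illustration.

```python
import random

def split_lines(lines, test_fraction=0.1, seed=None):
    """Split lines into (testset, trainingset) by position, not by value."""
    rng = random.Random(seed)
    indices = list(range(len(lines)))
    rng.shuffle(indices)
    n_test = int(len(lines) * test_fraction)
    test_idx = set(indices[:n_test])  # set membership is cheap to check
    testset = [lines[i] for i in indices[:n_test]]
    trainingset = [line for i, line in enumerate(lines) if i not in test_idx]
    return testset, trainingset
```

Duplicated lines survive this way, and the two pieces always add back up to the original data.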

Wednesday, March 28, 2007

brains 'n' balancing training data

- Martin points us to a nice article over on Developing Intelligence: 10 Important Differences Between Brains and Computers. Your computational metaphor just breaks down eventually, y'know? The brain is not very much like a Von Neumann computer. It's a lot squishier.

If building classifiers is your thing, you may be interested to take a look at these articles:

- Gustavo E. A. P. A. Batista, Ana L. C. Bazzan, and Maria Carolina Monard: Balancing Training Data for Automated Annotation of Keywords: a Case Study.
Three researchers, seven middle names, one novel technique for building balanced data sets out of unbalanced ones for training classifiers: generate new instances of your minority class by interpolating between examples actually in your dataset. I'm still trying to decide whether this approach should work for the general case -- does it make too many assumptions about the shape of the space? Particularly: can you arbitrarily draw lines (in higher-dimensional space) between positive instances? What if there are negative instances between those two? Which dimensions do you look at first, and how is this better than just adding some noise or weighting positive examples higher? (is that last option the same as simply counting them several times?)

- Foster Provost: Machine Learning from Imbalanced Data Sets 101.
A basic overview of the problem, examining the motivation for building classifiers at all and some different approaches to sampling. The award for Best Name Ever goes to Dr. Foster Provost.
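To make the interpolation idea from the first article concrete, here's a rough sketch of generating synthetic minority examples on the line segment between two real ones. This is not the paper's exact procedure: pairing examples at random is my simplification, and the function names are hypothetical.

```python
import random

def interpolate(a, b, t):
    """Return the point a fraction t of the way from vector a to vector b."""
    return [x + t * (y - x) for x, y in zip(a, b)]

def synthesize_minority(minority, n_new, seed=None):
    """Make n_new synthetic examples, each on a segment between two real ones."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct real examples
        synthetic.append(interpolate(a, b, rng.random()))
    return synthetic
```

Written out like this, the worry is easy to see: nothing checks whether a negative example sits between a and b.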

Friday, March 23, 2007

reading feeds over the web

It seems like all the cool kids are using Google Reader these days, and I must admit, I'm impressed. The interface is so clean, and a web-based feed reader where you can aggregate all of your habitual reading seems like the right thing. And the ability to share items from your feed with your friends without making link-only blog posts (or forwarding emails around) is pretty compelling. On the other hand, the point of so much of the web thus far has been linking to other parts of the web: it has a mysterious self-referential nature... does this sort of sharing diminish that aspect? Does a meta-feed like this put you, as the independent media maven that you are, on a different level than the well-established weblogs? When you stop putting links in your blog and publish a Google Reader feed, are you more like BoingBoing or metafilter, or less?

It is the future. We've got dynabooks and memexes, and we use them to distribute pictures of cats doing cute things.

O my vast readership, I address to you this question: how do you read news online? Do you have some separate feed reader program? Do you use your browser's RSS features? Google Reader? Your LiveJournal friends page? Something else?

And moreover: for the LiveJournal denizens, does anyone know of a good method for reading "friends-only" posts through Google Reader? There are a few posts out in the world on this topic, but nobody seems to have a decisive answer yet... perhaps we can answer the question definitively.

Monday, February 26, 2007

Ripping DVDs with Free Software!

Just today, a friend of mine needed a copy of a DVD that she could take around with her on a hard drive. So I thought, "well, I'll just take the disk image of it..."

But Apple's DVD player won't play DRM'd disk images, of course.

However! Here's a very nice howto for some very friendly software for Linux, Mac OS X, and BeOS (I know you've all got BeBoxes out there) that'll make video files from your DVDs, no sweat. Super-easy to use. Your mom could do it.

Thanks for the link, Cory Doctorow!

programmable tab completion: you may already have it!

Not long ago, I was doing some mundane upgrade task on the Ubuntu box on my desk. I'd switched out monitors and video cards, and I wanted to make it reconfigure X.org. I typed out "dpkg-reconfigure x..." (details are orthogonal, I suppose). But the amazing thing was, I hit tab out of habit, and bash magically filled in the name of the package and some other options, appropriate for the context!

It turns out that this is a feature known as Programmable Completion, available in modern versions of bash and enabled by default in Ubuntu! Who knew?

For example, the Ubuntu version of the programmable completion only fills in ".java" files when you're issuing a "javac" command, and it auto-completes class names (but not .class files) when you're trying to run a Java program from the command line. Flippin' sweet.

Monday, February 05, 2007

Programmatic poetry: oh noetry!

Automatic generation of texts has been on my mind for rather a while. Particularly, I've been thinking about poetry and what characterizes it, in contrast to, say, technical manuals or want ads. Some (five?) years ago, my cohort Esther and I had set at the generation problem from an ontology standpoint, trying to figure out what it would take to get thematic relationships into an automatic poem. We didn't get very far, probably because at that point my first instinct was to code everything from scratch in C!

Anyway, it turns out that there are non-me people interested in this sort of thing, and the ever-helpful Graham has pointed out a bunch of interesting things happening in the field!

- The prosthetic imagination is a blog by a one Jim Carpenter, who's been working on Erica T. Carter (aka "the electronic text composition project", mentioned on GrandTextAuto here), which uses probabilistic grammars to generate free verse poems. I think the output is pretty convincing ("convincingly what?"); according to Mr. Carpenter, it's rather unnerving to readers who've been informed that they were composed by machine.

It's interesting how people react, when confronted with "creativity" from a non-human source; one is reminded of Douglas Hofstadter's surprising reaction to David Cope's lovely work with algorithmic music composition, which makes music, in a sense, in the style of other composers.

I'll have to read more, but I'm not entirely sure, if it's just using hierarchical grammars, how Erica is different from The Postmodernism Generator (the best-known use of the world-famous dada engine)... but I'll report back on this later.

- There is an Electronic Poetry Center at Buffalo. Interesting!

- Gnoetry is another system out there, and a very prolific one at that, apparently connected to this super-fascinating Beard of Bees publishing group. Language is a prosthesis of an ancient neuro-chemical regime; but now the chemical author is dead. Gnoetry places language at a remove from its typical sources: pre-conscious governance, psycho-historical flux, conscious-mind narration. YES. I will be getting in contact with these guys.

- At upenn, they have a series of readings, M^<4|\3, with all sorts of "literary uses of technology" things going on, including, next week, Flarf poetry (!).

- Speaking of literary uses of technology, the GTR Language Workbench looks like something between Eclipse and a word processor... I'm not quite sure what to make of it yet.

I'm all excited. Let's get hacking.

Tuesday, January 30, 2007

begging the question, language composition and orthogonality

Not too long ago, one of my friends used the phrase "begs the question" in the colloquial sense of "what we're talking about suggests that this other issue should be addressed".

And I've come to the point in my life where this doesn't bother me anymore, despite the fact that I know the technical rhetorical sense of "begging the question" -- an argument presupposing what it's trying to prove, often implicitly. I prove that unicorns exist thus: all those magical one-horned horses out there are unicorns. I prove that there's an objectively extant material world by kicking a rock and hurting my foot.

This post, of course, begs the question: will I be secure enough as an armchair philosopher to start using the phrase in the vernacular sense? I'm torn: there are few things I like less in the world than prescriptive grammar, but few things I like quite as much as precise, expressive expression.

Wednesday, January 24, 2007

gmaps and quicksilver

So I've started work on the gmaps mashup for runners I'd mentioned last time... the Google Maps API is very straightforward, easy to understand. And getting a simple example working is quick. I'm not surprised, but it turns out to be very pleasant. Everything you wanted to know about gmaps is right there, have at!

Also: my ATLhack compatriot Erik introduced me to this really nice interface tweak for Mac OS X -- quicksilver. It lets you do a lot less mousing on the Mac, which is a pretty welcome change -- a quick key-tap, and it pops up a window where you type the first few letters of something, say an application or a folder or whatever, and it searches out what you probably mean! It seems like it's more efficient than reaching for the mouse, and for right now I've taken everything off my Dock to see if quicksilver is a viable replacement. Thanks, Erik!

Thursday, January 18, 2007

Run and jump on that gmaps bandwagon!

All the cool kids are doing Google Maps mashups, and it just occurred to me, while chatting with Graham: we could do one that picks a place for runners to meet. Say, given starting points for n runners, it finds some convenient corner for everybody to get together. For the case of two people (say, me and Graham), it might be just the midpoint on the best path between our houses -- but what about a big running club? And what if some runners are stronger than others? Clearly, some weighting and scaling is in order. And you could put in how fast you expect to run and when you want to get there, and it could tell you when you should leave, adjusted for traffic!
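As a first stab at the weighting, here's a sketch that treats the meeting place as a weighted average of the starting coordinates. It assumes straight-line geometry over a small area and ignores streets entirely, so a real version would want routing from the maps API. The meeting_point function and its weights are hypothetical.

```python
def meeting_point(starts, weights=None):
    """Weighted average of (lat, lon) starting points.

    A bigger weight pulls the meeting point toward that runner, so
    weaker runners would get bigger weights and less distance to cover.
    """
    if weights is None:
        weights = [1.0] * len(starts)
    total = sum(weights)
    lat = sum(w * p[0] for w, p in zip(weights, starts)) / total
    lon = sum(w * p[1] for w, p in zip(weights, starts)) / total
    return (lat, lon)
```

With two runners and equal weights this is just the midpoint; weights of 3 and 1 put the spot a quarter of the way from the first runner's house.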

Maybe not as immediately useful as Gmaps Pedometer, but it'd be fun to put together. The API looks kinda neat, and I should learn this newfangled Web 2.0-AJAX-web-services schlock one of these days...

Wednesday, January 17, 2007

Mapping and reducing is like popping and locking for programmers

If you don't read Lambda the Ultimate, that's quite alright. But fairly often over there, you find a link to Why Functional Programming Matters, by John Hughes. Recently, I decided to sit down and actually read it. It's mind-expanding!

Previously, I'd thought about mapping functions onto lists as an operational thing, a set of steps to complete, but that's not the cleanest way to think about it. Mapping is actually a special case of "reduce", an operation where you just go through and replace all the "cons" functions in a list expression with something else, then evaluate the expression again. "map" functions have a cons in the function they're reducing with, so the end result is another list.

For example: you might write, in lisp: "(mapcar #'(lambda(x) (* x 2)) '(1 2 3 4))", yielding (2 4 6 8)... and you might think of that procedurally... but a map is just a reduce where "cons composed with the mapped function" replaces every "cons". Append can be written similarly; it's all essentially just replacement, function composition and evaluation. The paper also goes into beautiful issues like lazy evaluation (the author says that if he wrote the paper now, the examples would be in Haskell!) and continues with some lovely examples, some numerical and one very close to our heart: a bot that plays tic-tac-toe, optimally, with pruned game trees.
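Here's roughly the same idea in Python, as a sketch rather than a transcription of the paper. Python lists aren't built from cons cells, so the cons helper below is a stand-in, and foldr, map_via_fold and append_via_fold are names I made up.

```python
from functools import reduce

def foldr(f, initial, items):
    """Right fold: foldr(f, z, [a, b, c]) == f(a, f(b, f(c, z)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(items), initial)

def cons(head, tail):
    """Stand-in for Lisp's cons: put head on the front of the list tail."""
    return [head] + tail

def map_via_fold(g, items):
    """map is a fold where every cons becomes 'cons composed with g'."""
    return foldr(lambda x, rest: cons(g(x), rest), [], items)

def append_via_fold(xs, ys):
    """append is a fold over xs that bottoms out in ys instead of []."""
    return foldr(cons, ys, xs)

print(map_via_fold(lambda x: x * 2, [1, 2, 3, 4]))  # [2, 4, 6, 8]
print(append_via_fold([1, 2], [3, 4]))              # [1, 2, 3, 4]
```

Swap cons for addition and the empty list for 0, and the same foldr sums the list.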

Years ago, Kurt Eiselt told me that the future of computing was going to be functional languages on very-parallel hardware; functional languages, at least in principle, make synchronization easier by limiting or removing-altogether side effects. (although reconciling this idea with stateful, event-driven end-user applications is another issue!) To an extent, it looks like he was right, and the future is here! MapReduce is the method Google is using to crunch super-giant datasets with enormously parallel clusters. It's not just for functional languages, of course, but the idea is there.

Friday, January 05, 2007

Possibly interesting, but almost definitely not useful!

- My aunt, years ago, had a Newton eMate 300, and I recently found a Newton PDA while cleaning out the space behind an A/C unit -- in any case, I've had this weird fascination with Newtons for years. Now, you can relive the Newton Magic that you probably never experienced in the first place, under emulation! The Einstein Platform is a Newton emulator that works pretty well -- and you can find Newton ROM images here. It's kinda interesting. (Although I got stuck in fullscreen mode once, careful!)

- Self. It's been ported to Linux. It's one of those languages that one feels like one should learn more about. It has interesting family relationships with Smalltalk and Dylan and JavaScript...

- Thinking Meat celebrates the holidays. I just found this blog, but over at the Thinking Meat Project, she has a lovely article about taking part in the culture around the holidays and coming to terms with the cognitive dissonance from enjoying good Bach choral music while feeling like one shouldn't be participating in religious rites, for consistency's sake. It can be hard to balance these things, particularly soon after giving up a faith.

Friday, December 15, 2006

Help me, Don Knuth, you're my only hope!

If you're typesetting something with pdflatex, you can't include eps images with the graphicx package.

pdflatex wants figures in pdf format (png and jpeg also work), but the error message you get when you try to use eps is unhelpful.

Maybe this will help someone some day.

Monday, December 04, 2006

rapid-feedback poetry and lisp environments

I think, ultimately, TEB is going to want to be less of a fully automated thing and more of a computer-aided composition tool for poetry. Or maybe it'd be better to think of it as a generator, with a human in the loop. Something that would let you get really rapid feedback and come up with suggestions. It would let you build poetry by search, recognizing what you like and what you don't like. And it'll keep versioning information...

Hey, speaking of development environments! On Lemonodor, I just found out that there's an Eclipse plugin for writing Lisp called Cusp. It uses SBCL and swank, like all right-thinking lispers, letting you do SLIME-like things without Emacs -- SLIME being the currently en-vogue common lisp development environment, and swank being the backend. There's a similar project out there that I've been watching too -- Slim-Vim, which is an attempt to make that same swank code work with vim. (the mailing list has been a little quiet, but it might pick up steam again)

Saturday, December 02, 2006

python generators and more live coding music!

I just found out about Python's yield keyword, which lets you write generator functions. This came up, in a practical context, because I wanted a clean way to get n items at a time out of a list, and the Python Cookbook approach uses generator functions. Calling one gives you an object that essentially holds a closure of the function's current environment, which can be iterated on. Sort of like lazy lists! Of course, you could get the same behavior with C-style static variables, but this is really pretty.

- yield keyword from the Python docs
- a discussion of generator functions over at IBM Developerworks.
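For the n-items-at-a-time problem, a generator along these lines does the job. This is my own minimal sketch, not the Cookbook recipe verbatim, and the chunks name is made up.

```python
def chunks(items, n):
    """Yield successive lists of up to n items from items."""
    for start in range(0, len(items), n):
        yield items[start:start + n]

for chunk in chunks([1, 2, 3, 4, 5], 2):
    print(chunk)  # [1, 2], then [3, 4], then [5]
```

Nothing gets sliced until the loop asks for the next chunk, which is the lazy-list flavor: the generator remembers where start was between calls.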

Also! Brett alerts us to Impromptu, another live coding environment for making music and stuff! From their site:
Impromptu is an OSX programming environment for composers, sound artists, VJ's and graphic artists with an interest in live or interactive programming. Impromptu is a Scheme language environment, a member of the Lisp family of languages.

Sunday, November 26, 2006

What, he keeps on posting about pluralism and that Sam Harris guy?

These people, on the other hand, don't seem to see religion and religious differences as a problem. In fact, they seem like very nice, reasonable folks.

The Pluralism Project at Harvard
Interfaith Youth Core

IFYC was started in part by Dr. Eboo Patel, who gave a very nice interview on NPR, which I remember hearing on the radio when it was broadcast. He also wrote a This I Believe piece.

Sam Harris still isn't convinced, of course. But you'd think they could come up with a better debating partner for him than Dennis Prager? (there's a fairly interesting email debate between them on jewcy; he must not have read my previous post, Sam didn't, because he didn't address my concerns.)

Genetic algorithms for Dr. Mario Strategy!

It turns out that one Paul Kuliniewicz has decided to build an AI to play Dr. Mario -- or more accurately, to learn to play Dr. Mario and then play it, using, ironically, genetic algorithms. It's called Wallace, and it builds up and breeds different strategies for pushing pills around. This seems like a pretty good domain -- and I suppose puzzle games of this sort are in general... I wonder if anybody's done something like this for Tetris? Other real-time puzzle games?

The interesting thing about this -- he's hooked his evaluation function for evolutionary candidates into the NES emulator. Pretty clever stuff!
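For anyone who hasn't played with genetic algorithms, the general shape is something like the sketch below. This is not Wallace's code: the genome is just a list of numbers, and toy_fitness is a stand-in for the expensive part, which in his case means actually playing the game in the emulator. All the names and parameters here are made up.

```python
import random

def toy_fitness(genome):
    """Placeholder score: higher is better, best when every gene is 0.5."""
    return -sum((x - 0.5) ** 2 for x in genome)

def evolve(fitness, genome_length, population_size=20, generations=30, seed=None):
    """Breed a population of genomes; genome_length must be at least 2."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half unchanged, so the best score never gets worse.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            i = rng.randrange(genome_length)
            child[i] += rng.gauss(0, 0.1)          # small random mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

print(evolve(toy_fitness, genome_length=3, seed=0))
```

The clever bit in the real thing is entirely inside the fitness function; the breeding loop stays about this simple.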

Saturday, November 25, 2006

"I know you don't know..."

So this fellow Sam Harris has recently come onto my radar. He's in the Richard Dawkins "religion is pretty immediately harmful and we need to get rid of it" school of thought, and he's published a pair of books (The End of Faith and Letter to a Christian Nation) in which he spells out his position. There's a video of him giving a talk out there, and I think he's a pretty good speaker...

But it might just be because I'm inclined to agree with what he's saying. I've been trying to work this out systematically. There are some very clear ways in which religion-inspired positions can be detrimental to people's health and happiness -- we've got the oft-cited condoms-in-Africa and the stem cell research and the homophobia... and all sorts of positions, say environmental ones, that you would rationally take if you thought that these were the End Times, that Jesus was coming to save the day in the next 50 years or so -- which supposedly almost half the country buys into. And that's just not a healthy belief for people to have, if the rest of us want to establish a sustainable living environment for people in the long-run.

And deep down inside, I think a property of people (for now) is that we want a holy war -- we need something to rally against, some sort of emergency situation to respond to. And we think, "well, this is wrong. These well-meaning people are misled, and it's so pervasive, particularly in this country, and you can't really talk about it..." So they're talking about it, Harris and Dawkins are.

However, I think where Dawkins and Harris fall down is in two major ways:

Firstly, not all religious people take all the purported beliefs of their religion to their logical extremes (and why not is an enormously interesting issue). This is a central point in Harris' argument, and I'm trying to work out what I think about it -- he says that religious moderation and pluralism essentially rope off faith from rational discourse and make it okay to believe whatever you want, and nobody's going to question it, which provides cover for extremists... and as moderately-minded folks, we have trouble believing that people really believe this schlock, but he assures us that they do -- and that genocides and jihads really aren't just about economics and education like we want to believe they are, but are honestly religiously motivated.

And yet the world is full of people who identify with one particular faith and still do wonderful things for the world. Many Christians feel the need to be good stewards of the planet, they save the water and the air and the narwhals, they feed the hungry, they provide medical attention all over the world. And many are full of love and hope and tolerance, and they stand up for their gay friends and make beautiful works of art and fill the world with music.

Of course, these people might do these exact same things without religion -- maybe they're just beautiful people. But then, in a world without faith, I wouldn't be writing this post. So it goes.

Secondly: the biblical god is perhaps not in the same class of entities as Zeus or Inari, and I think it's an oversimplification to put statements like "there is a giant diamond buried in my back yard" (an example from Harris) in the same class as statements about the nature of a more abstract deity -- at least without further examination. Now the idea that Jesus is physically coming to end the world soon, maybe that's in the same category as the giant diamond, but what about the proposition that there's an inherent moral structure to the universe, or that your dead friends and family aren't really gone? Or that things Will Ultimately Work Out? ... Of course "I wouldn't want to live in a world where X is not the case" isn't really a knock-down argument to convince us of any of these things.

So anyway, BG, as we'll call it, is something a little different, with a fluid identity somewhere between that of Zeus (the old-testament local sky-god, rooting up the local fertility cults and being mad at some people while briefly favoring others), a mythological sun-hero, and something like a Lucasian Force or the Tao or even identifying with all of the world, like you might find in Spinoza.

Richard Dawkins had quipped that everybody is atheist about most of the gods that have ever been dreamt up -- some of us just go that last step. And I think that's an oversimplification, because for a lot of people, god is just that abstract-orderliness-principle... if your view of god is, like this fellow RJ Eskow puts it, the sheet music of the universe, then this isn't all that different from believing in causality and some initial state of things, is it? And don't a lot of people hold beliefs like that?

I think what Sam Harris really gets at is that it's almost taboo to talk about faith as an absurdity in polite society. Many people get really defensive about their theology, if you bring it up. There's this weird feeling of guilt, at least for me, being an unbeliever bringing this topic up. I don't believe like you believe; in fact, I think you're wrong about some pretty fundamental things and I'm trying to gauge whether it's harmful. And that's where the guilt comes from, I think -- we're faced with the prospect of explaining to our loved ones that they and some large chunk of society hold possibly-destructive beliefs. It feels wrong because we know folks who are nicer people than us (and furthermore more devoted social activists) and it's hard to find fault with them and what they believe -- no harm, no foul, right? Is it so destructive to believe in a universe that has some sort of underlying order to it, that wants, in some sense, for you to be nice to people, wherein you identify the BG with a kind and loving parent?

I mean, it's Wrong, of course. And I still haven't addressed Sam Harris' idea that toleration and letting people believe what they want provides cover for fundamentalists. And the propositions that the wonderful and nice people hold true might even be largely the same as the ones that folks we might label as societally destructive believe in... this is a difficult and tangly problem. The mind is vast and greatly partitioned.

So at a higher level, do we believe in truth and the search for it, or is what we believe pretty much unrelated to any objective world that might be out there? And how can you possibly sit still when you know in your heart that you have this truth that's vitally important for everybody's eternal well-being that snot-nosed AI kids on the internet are calling schlock? Doesn't religious pluralism lead the way to trivializing religion as a whole? The idea that there's some abstract higher truth that's filtered into different societies in different ways is attractive to many, but I think it breaks down when you get into the specifics of what religions are actually saying. Unless everybody's just speaking in metaphors and hyperbole most of the time.

(This last cluster, the "abstract orderliness to the world" has its problems once you try to work out the details, of course. Particularly, it doesn't square with BG as well as many would like -- you start ascribing all of these perfections to BG, maybe with an aim to working out an ontological existence proof and then you're left with The Problem of Evil or justice anyway. Let's leave this to another post, or perhaps a book.)

(although many in Christian contexts have preached against the God-as-the-Force idea in favor of a more personalitied BG, Huston Smith interestingly characterizes Hinduism as encouraging whichever idea about Brahma one personally finds more worshipable)

Here are some interesting blog posts: alls I'm gonna say is that people who argue against Sam Harris seem to mostly rely on ad hominem attacks and the idea that he has a faulty moral compass.

- RJ Eskow: Reptiles of the Mind -- giving thanks for rational atheists
- RJ Eskow: The sad state of atheism today
- Sam Harris: In Defense of Torture (seriously, Sam, wtf?)
- Steven Pinker: Less Faith, More Reason
- Marty Kaplan: Atheists for Cheney

Wednesday, November 22, 2006

GWT and ChucK

So maybe two years ago, our good friend (noted security researcher and computing maven in general) Tim J had been kicking around an idea for developing web applications: he wanted to use a more general interface mechanism, say Swing or GTK, for laying things out, with some other layer figuring out how to express what you put together in terms of web languages. At the time, I didn't see the need...

But Google did, apparently! GWT lets you design webapps in terms of Java, running on your local machine for testing purposes. GWT then compiles them down to JavaScript when you're ready to deploy for the rest of the world. I'm probably the last to find out. It's open source; you can go play with it if you want. Pretty crazy.

http://code.google.com/webtoolkit/



Also; after rather a while of hearing my friends work with and develop the music-programming language ChucK, I finally took the time to go play with it. At first I tried to use the Audicle -- which is a really pretty IDE, all done up in OpenGL -- but it crashed on my Mac, so I tried miniAudicle instead. That worked just fine, with a nicely intuitive, minimal interface. After just a few minutes, I was making some interesting bleeping-and-booping noises. It was very satisfying! I think I like the square-wave generator best.

ChucK definitely warrants checking out, if you have any sort of urge to make cool noises, or even cheesy music. Graham Coleman, who wrote said cheesy music tutorial and has in recent months been performing live with ChucK in Atlanta, was just now on the radio, and he proceeded to rock the airwaves pretty hard. Hooray!

Wednesday, November 15, 2006

The Enron Email Dataset!

So they might not have really been the smartest guys in the room... but one really good thing coming out of Enron is all those emails. 2.6 gigabytes worth, for your perusal or data-mining, social-network-mapping, and language-modeling pleasure. Thanks, guys!

The corpus is here!
(thanks, CMU!)

Sunday, November 05, 2006

"Here, lemme take a Pikachu..."

So say that you want to pepper your speech with non-sequitur references to some particular topic -- like whenever you use a string of words that rhymes with the name of a Pokémon, replace that string with the name of said Pokémon.

That wouldn't be that hard -- there's a pretty happenin' metric for finding rhymes, ready to go.

So you'd just have to have a list of the words for the domain you want to substitute in, make sure that you have phonetic descriptions of each of them (Bradley Buda's rhyme metric uses the CMU Pronouncing Dictionary's format)... then run through your input text searching for substrings that rhyme with words in your list with a score over some particular threshold. And hilarity ensues.
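Here's a toy sketch of that loop. To be clear about the assumptions: rhyme_score below is a crude stand-in that just counts matching phonemes from the end of each pronunciation, not Bradley Buda's metric, and the little pronunciation table is hand-typed in the CMU dictionary's style rather than loaded from the real file.

```python
def strip_stress(phones):
    """Drop the stress digits that CMU-style vowels carry (UW1 -> UW)."""
    return [p.rstrip("012") for p in phones]

def rhyme_score(a, b):
    """Fraction of the shorter pronunciation matched, counting from the end."""
    a, b = strip_stress(a), strip_stress(b)
    matched = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        matched += 1
    return matched / min(len(a), len(b))

def substitute(words, pron, names, threshold=0.75, max_len=3):
    """Replace runs of up to max_len words that rhyme with one of names."""
    out, i = [], 0
    while i < len(words):
        replaced = False
        for n in range(max_len, 0, -1):  # try the longest run first
            run = words[i:i + n]
            if len(run) < n or any(w not in pron for w in run):
                continue
            phones = [p for w in run for p in pron[w]]
            best = max(names, key=lambda name: rhyme_score(phones, pron[name]))
            if rhyme_score(phones, pron[best]) >= threshold:
                out.append(best)
                i += n
                replaced = True
                break
        if not replaced:
            out.append(words[i])
            i += 1
    return out

# Hand-typed, illustrative pronunciations; a real run would load the dictionary.
pron = {"i": ["AY1"], "see": ["S", "IY1"], "you": ["Y", "UW1"],
        "mew": ["M", "Y", "UW1"]}
print(substitute(["i", "see", "you"], pron, ["mew"]))  # ['i', 'see', 'mew']
```

The threshold is the knob for how groan-worthy the substitutions get; lower it and longer runs of words start getting swallowed whole.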

A similar technique could be used for generating phrases like "my feet are staying!" for "auf wiedersehen!" -- you could do something hill-climbing-like (with a parser in the loop, so as to try to maintain grammaticality) to substitute out common phrases...

The only problem is -- what rhymes with "Psyduck"?