Thursday, December 27, 2012

reading: Computer Power and Human Reason

Not long ago, I got a copy of Joseph Weizenbaum's Computer Power and Human Reason from a pile of free books -- there's so much great stuff on the free books table when a professor retires!

I’d recommend reading it if you’re into the ethical issues surrounding computing or the history of AI. The book was published in 1976; our relationship with computing has changed a lot since then, and that was probably the most striking thing about reading the book now.

Weizenbaum was worried about the trust that society placed in the computer systems of the time. He describes situations in which people felt they were slaves to systems too complex for anyone to understand and too far removed from human judgement to be humane; examples include planning systems that told pilots where to bomb during the Vietnam War. But the systems had computers involved, and they were made by experts, so they must be right! "Garbage in, gospel out." And down came the bombs.

I'd argue that we've become less dazzled by computers as such, that we no longer think of them as infallible. But perhaps we're less likely to think about the computers themselves at all. They've become ubiquitous, just the infrastructure that makes society work. My mother (a keen observer of technology) recently remarked that it's strange that we still call them "computers" when the point is to use them for communication. We may still have problems of blind obedience, but perhaps it's better understood as blind obedience to people.

Similarly, Weizenbaum was concerned about the social power wielded by scientists, engineers, and other experts. To me, in the fair-and-balanced political climate, this sounds like a good problem to have: people used to listen to experts? Did they listen to experts when they said things that were politically inconvenient for those with money? Perhaps not...

Computer Power and Human Reason also spends some time with the exuberant claims about AI from before the AI Winter. Herbert Simon said, "... in a visible future – the range of problems they [machines] can handle will be coextensive with the range to which the human mind has been applied", which was clearly somewhat premature. But we have made progress on a lot of fronts! Weizenbaum was quite skeptical that machine translation would be any good, despite claims (which he relates in the book) that MT really just needed more processing power and more data. A few decades later, MT is often pretty good! All it took was more processing power and more data.

There's also some beautifully strange writing. Towards the beginning, he spends a few chapters explaining how computers work, in a formal, abstract way. And then we get this:
Suppose there were an enormous telephone network in which each telephone is permanently connected to a number of other telephones; there are no sets with dials. All subscribers constantly watch the same channel on television, and whenever a commercial, i.e., an active interval, begins, they all rush to their telephones and shout either "one" or "zero," depending on what is written on a notepad attached to their apparatus. ...
I have trouble imagining that this metaphor has helped many people understand digital logic circuits, but I enjoyed reading the book! Perhaps you'd enjoy it as well.

Saturday, July 07, 2012

this is something new and beautiful: Coursera and Udacity

Just last week, I finished the coursework for Coursera's machine learning class. It was great! I had a really good time with it, and I'm fairly proud of the accomplishment.

If you've been within earshot of me in the past few months, you probably know that I'm really excited about Coursera and Udacity and their ilk (including, but not limited to, edX, Khan Academy, and Duolingo). There are two experiences I'd like to contrast with taking a course on Coursera.

Some years ago, I was living in Atlanta and working a real job. And I went over to the Georgia Tech math department to see about taking some master's-level statistics classes, imagining that they would let me pay them lots of money in exchange for taking classes at the university from which I had graduated just months earlier. But it turned out that they wouldn't let me do this without being admitted to a full-time degree program.

Fewer years ago, I was starting my PhD at Indiana, and knowing exactly what I was there to learn, I picked out three classes: one on NLP, a computational linguistics class (from the Linguistics department), and one from Stats. I got a mild hassle from the department about my choices: these were all "fun" classes, and shouldn't I work on fulfilling my breadth requirements? I've since finished my IU coursework, and let me say: not all the classes I had to take as a result were very interesting, or even very well taught. Some were downright bad.

But now there are free online courses that are designed to be good, and that you take because you actually want to take them, as opposed to expensive in-person courses that may not be good but that you're obliged to take anyway -- this is huge.

Whether or not you think that teaching in person is going to stay relevant, not everybody has access to good teachers in person. This remains true even for people at universities.

Moreover, online classes lower the barriers to entering or leaving a course to almost nothing. Want to sign up for a class just to try it out? Nothing could be easier! Don't enjoy it, or it's not what you thought it was, or find out you're busy with other stuff? Nothing lost, try a different one! But if you stick it out and put in the effort, then not only have you learned something, but you also get a certificate that says you finished! (maybe these could be OpenBadges sooner or later...)

There are going to be lots of bytes spilled about these things in the coming years, but just to make it clear: I'm jazzed about helping people who want to learn things get access to material about those things. And the World Music class is starting up soon, which my mother and I are going to take! Because why not?

Saturday, June 30, 2012

happy hardware review: usb wireless adapter from ThinkPenguin

I'm in Mountain View for the summer, working on Google Translate for another internship with that company that I seem to work for fairly often. Hooray!

Unfortunately, my laptop's built-in wireless card really doesn't agree with the apartment complex's wireless. So I ordered a little USB stick wireless adapter from ThinkPenguin, and it came pretty quickly, and I plugged it in (and told wicd to look at wlan1 instead of wlan0), and it just worked! Now my wireless connection is pretty fast, and doesn't drop every five minutes! (unlike before; it was a serious pain.)

Specifically, I got this one. Their other products may also be lovely. Thanks, ThinkPenguin!

Wednesday, May 30, 2012

take five minutes: support open access

tl;dr: Sign this petition to support open access for publicly-funded research!! http://wh.gov/6TH

Here's the situation: there's lots of scholarly work being done. And you, as a citizen of a country, are paying academics to do science (or whatever), write about it, and review the work of other scholars. The work that makes it through the reviewing process gets published, typically in a journal or at a conference.

Here's the problem: a lot of that scholarly work is then inaccessible to you. You have to pay to read it, and often you have to pay a lot. If you're at a well-funded academic institution, your university library has to pay a lot; it's a serious problem even for universities as wealthy as Harvard. Where does this money go? It doesn't go to the academics who wrote the papers, or to those who reviewed them: it goes to publishing companies with absurd profit margins who have trouble pointing at what value they add to the process, aside from happening to own prestigious journals.

Concretely, this is a problem for the independent researcher, for the small-business developer-of-stuff who wants to keep up with the latest developments, for the interested public who want to read and learn and grow, for the precocious teenager. I've come to care quite a lot about this issue because I believe in science. I think it's pretty important: it should get out to as many people as possible, not just because the citizens paid for it in the first place, but also so we can make progress faster.

The National Institutes of Health have famously set up an Open Access mandate: all the research that they fund must be made available to the public soon after it's published. Many universities are doing the same thing. The Association for Computational Linguistics (which runs the conferences and journals where I'm personally likely to publish) does a bang-up job of making all of its articles publicly available, and I'm really proud to be associated with them. But not every professional organization, and not every field's journals, are like this. Most are not!

How can you help? Right now, there's a petition on the White House website where you can ask the administration to expand the NIH-style mandate to other funding agencies: I'd really appreciate if you'd take a minute to make an account and sign the petition. Click here: http://wh.gov/6TH

(hrm, I seem to have written about this back in 2007 too)

Thursday, May 10, 2012

command line tricks: ps, grep, awk, xargs, kill

I recently learned a little bit of awk; if it's not in your command line repertoire, it's worth looking into! awk lets you do things like this:

$ ps auxww | grep weka | grep -v grep | awk '{print $2}' | xargs kill -9

Let's unpack what's going on here. First, we list all processes, in wide format (that's what the "auxww" options to ps do), then we filter with grep to keep only the lines that contain "weka". When I wrote this line, I was debugging a long-running machine learning task, so I would start it running, then if (when) I found a bug and wanted to restart, I'd use this command to kill it.

Now we have all the lines of output from ps that include "weka". Unfortunately, this includes the grep process that's searching for "weka"! No problem, just use "grep -v" to filter out the lines that include "grep".

This is where awk comes in. We want to get the process numbers out of the ps output. It seems like we could use cut to grab just the second column, but we don't know how wide that column is going to be, and cut won't collapse a run of spaces into a single delimiter. Instead, we just use a tiny awk script that prints the second whitespace-delimited field on each line.

Finally, we use xargs to take the process numbers and make them arguments to kill. xargs is great: it takes the lines on its standard input and passes them as arguments to a program (the one named in its first argument). Usually I use xargs in combination with find, "svn status", or "git status -s", to do the same thing to batches of files: delete them, add them to version control, or whatever.

Thoughts? Better ways to do this sort of thing?

Saturday, April 28, 2012

startlingly bad moments in API design

Weka, the machine learning toolkit, has these nice filters that let you change what's in a data set, maybe the features on the instances, or the instances themselves. Pretty useful. One is called "Remove", and it removes features. Here's a case in Weka where order matters when you're setting up the parameters for an object.

Like so: this does not remove any features.
Remove remove = new Remove();
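// Oops: setInputFormat() is called before the options below are set, so the filter never picks them up.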
remove.setInputFormat(instances);
remove.setAttributeIndices("7,10,100");
remove.setInvertSelection(true); // delete the other ones.
Instances out = Filter.useFilter(instances, remove);
This works just fine, though:
Remove remove = new Remove();
remove.setAttributeIndices("7,10,100");
remove.setInvertSelection(true); // delete the other ones.
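// This time setInputFormat() comes last, after all the other options are set.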
remove.setInputFormat(instances);
Instances out = Filter.useFilter(instances, remove);
How are you supposed to find that out?

Thursday, March 29, 2012

quals writeup: Tree Transducers, Machine Translation, and Cross-Language Divergences

I hope it's not too pretentious to put things I'm writing for my PhD qualifiers on arXiv. I think arXiv is really exciting, by the way. Leak your preprints there! Also pretty exciting: tree transducers for machine translation.

Abstract:
Tree transducers are formal automata that transform trees into other trees. Many varieties of tree transducers have been explored in the automata theory literature, and more recently, in the machine translation literature. In this paper I review T and xT transducers, situate them among related formalisms, and show how they can be used to implement rules for machine translation systems that cover all of the cross-language structural divergences described in Bonnie Dorr's influential article on the topic. I also present an implementation of xT transduction, suitable and convenient for experimenting with translation rules.
Paper! http://arxiv.org/abs/1203.6136

Software! http://github.com/alexrudnick/kurt

Thursday, January 19, 2012

and we're back

Well done, Internets!

In solidarity with everybody doing the #j18 protests against SOPA and PIPA, I blacked out this blog and my academic web page; I'd be really surprised if this directly caused anybody to call any legislators. My email to my family on the topic was probably more effective: one of my uncles wrote back, saying he'd signed a petition. So: rad!

The protests seem to have been incredibly loud and fairly effective. At this point, a congressperson would have to be incredibly dense to not get the sense that the public outcry against censoring the Internet in the US is enormous. A number of Republicans, including some former co-sponsors, have taken the opportunity to switch to opposing the bills, which seems politically expedient. (article on DailyKos about this. Kos wonders why Democrats seem to be willing to be left holding the bag...)

But even if we manage to get SOPA and PIPA scrapped, we're still left with two fundamental problems.

(1) The MPAA and RIAA can try to break the Internet again later, because they'll still own a significant number of congresspeople. If OPEN picks up steam and gets passed, say, will that be enough for them? The music and movie industries have fought tooth and nail against new technologies for decades; what's to stop them from taking another run at the Internet, or at whatever we have in the future? How can we reduce their money and influence over time? Even for an angry Internet activist, getting all of your family and friends to boycott all media produced by major labels and studios seems extraordinarily hard. I must admit: I totally bought an Andrew W.K. CD not too long ago, and we have Netflix at our house. Should we cancel it?

(2) More fundamentally: large companies can own congresspeople. How can we take control of Congress, as citizens? Note that I didn't say "take back Congress" -- there's been a disconcerting connection between money and power for our entire history. If you haven't read Howard Zinn's A People's History of the United States, I highly recommend it.

People much more insightful and dedicated than I am have written quite a lot about this, but I suspect the solution really is campaign finance reform. Simply taking away the incentives to make awful decisions, while encouraging good behavior that at least some people like, would probably result in a Congress that... makes fewer awful decisions and has a double-digit approval rating. I think term limits would also be useful, so that congresspeople don't have to worry about re-election so often (although, how do you incentivize good behavior in the last term? ...), along with some sort of rules to keep former congresspersons from becoming lobbyists, so we can prevent the Chris Dodd situation from happening again. He still goes by "Senator Dodd", but this year he quit the Senate to become the head of the MPAA.

Oh, also: Super PACs, and the Citizens United decision.

However we move forward: today, a large number of people who had never tried to contact their elected representatives now have. The more often we do it, the lower the psychological barrier! There are even phone apps (android, iphone). I use the Android one all the time; it's extremely convenient.

Right. Anyway. Stop reading this blog, and go read what Lawrence Lessig has to say. I'll get back to doing some computational linguistics. Maybe you should too!

Saturday, December 31, 2011

reading: Readings in Machine Translation

Post before the end of the year!

I'm really enjoying Readings in Machine Translation -- it's got all of these great MT papers from past decades, going from the Warren Weaver memo from 1949 up to the Brown et al. paper that made stat-MT fashionable again in the early 90s. Apparently, a lot of the papers in the volume were somewhat hard to find online in 2003.

Really interesting: so far, the early papers have had some very detailed descriptions of the low-level particulars. "Well, we're going to need this many memory drums...", "oh, and the words will be stored in memory in alphabetical order" (which seems very archaic), and a fixation on picking the right word in the target language, in sort of a word-sense disambiguation sense (which is slightly fashionable again!).

So for people into MT who want a sense of history, these seem like papers one should read -- I mean, Sergei Nirenburg and friends picked them out, so they've got to be good, right?

If you haven't read the 1949 Warren Weaver memo, though -- even if you're not an NLP person -- do yourself a favor and go ahead and read it!

Wednesday, November 30, 2011

securing your MoinMoin wiki

I really like MoinMoin; it's straightforward, it's Pythonic, it's got WikiWords. Great! I've been keeping a bunch of notes on one.

The IU Computational Linguistics group had an aging MoinMoin too, but most of the edits and new accounts were spam. I suspect the edits were being done by humans too, because they were pretty good at fitting into the markup of existing pages.

We replaced it with a new install (here it is) and made it a bit more secure. We disallowed anonymous edits, and made it so that you can't just arbitrarily create accounts, which took a code change: I added the "if not request.user.isSuperUser() ..." block (suggested here) to MoinMoin/action/newaccount.py. The rest of the changes described on that page aren't necessary -- that one block just makes the wiki refuse new account requests.
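For the record, the added block looks roughly like this -- a sketch from memory, dropped near the top of the function that actually creates the account; the exact function and how you report the refusal will vary with your MoinMoin version:

# in MoinMoin/action/newaccount.py, at the top of the account-creating function
# (sketch; details vary by Moin version)
if not request.user.isSuperUser():
    # refuse everyone who isn't a superuser
    return "Account creation has been disabled on this wiki."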

Then it occurred to me: spammers have probably been trying to spam my personal wiki too! I checked: there were about a hundred spam accounts; a new one every day or two. My Moin was allowing arbitrary account creation, but they were all useless because only I could edit pages!

So in the interest of discouraging future webcrap, let me issue a warning: CyrilleVincent, SabaFaulkner, CasinoBonus, life insurance quotes, and ChickyBowen -- I'm coming for you. And you'd best sleep with one eye open, paydayloansuk214. If that's your real name.

Sunday, October 09, 2011

reading: Religious Literacy by Stephen Prothero

Not too long ago, I read the very thought-provoking Religious Literacy: What Every American Needs to Know and Doesn't, by Stephen Prothero, likely because it came recommended by Dale McGowan. The discussion of the history of religious education in the US is fantastic.

The main argument of the book is that, until recent decades, we knew quite a lot about Protestantism, through instruction at home, in churches, and even in the school system. People apparently used the verb "catechize" as something you did to children. But this knowledge of what's actually in the Bible, and of actual church doctrine, is these days largely lost on us in the US, even though we're very caught up in our religious identities and church attendance is huge.

Political discourse is full of religious allusions, but we often don't get the references. I would have appreciated more examples of problems that this causes in practice -- is it really an issue for being a citizen in a democracy?

Prothero notes, early on, that Europeans are much less religious than Americans but more familiar with religious content. Dale McGowan says "faith is most easily sustained in ignorance"; knowing a thing or three about a few different religions makes it easier to not get caught up in any of them. Is the major problem caused by religious ignorance that it makes it easier for preachers and politicians to jerk people around by telling them that God says thus-and-such?

While Prothero doesn't address the question of why the more broadly educated Europeans don't tend to be churchgoers, he does put forth a policy suggestion: our curricula should include more information about the world's religions. And while I agree that it's probably a good idea, he doesn't say much about the sorts of changes we might see with better religious-studies education. I must admit, I have a hard time thinking of education-about-religion as anything except a strategy against the influence of seemingly devout people.

Perhaps I'll pick up his more recent book, God Is Not One, about the fundamental disagreements between different religions, contrasting with the framing you'd get from Huston Smith or Karen Armstrong, who argue that different religions are grasping towards the same fundamental truth. I'm really curious about his personal position, because Prothero identifies himself as an Episcopalian, but hasn't thus far talked about any particular benefits of people believing any particular thing.

Sunday, July 31, 2011

cross-lingual word sense disambiguation

Have I mentioned what I've been working on recently? Maybe I haven't.

In general, I'm working on cross-lingual word- and phrase-sense disambiguation. WSD/PSD is the problem of deciding, for a given word or phrase, which meaning was intended, for some pre-defined sets of meanings. You might get the possible senses out of a dictionary, where they're nicely enumerated, or perhaps from WordNet. The stock example is "bank" -- is it the side of a river, or is it a building where they do financial services? Or, is it the abstract financial institution?

There's a brilliant bit from the prescient Warren Weaver, from 1955 (via):

If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words . . . . But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word . . . . The practical question is: "What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?"

The "cross-lingual" kind of WSD means that we care about exactly the distinctions that cause you to pick a different word in a given target language, typically because the CLWSD system is meant to be integrated into an MT system; that's becoming fashionable (Carpuat and Wu, 2007). So in this setting, say if you're translating "bank" from English into Spanish, your system doesn't have to decide if it's the building or the institution that owns it -- it's still "banco". Now a riverbank is an "orilla".

In the general case, your system might end up learning how to make distinctions that you as a human didn't know you had to make -- for example, I'm given to understand that Japanese doesn't have just one word for "brother", but "older brother" and "younger brother", which are different enough concepts that they get totally separate words.

Making these choices is typically treated as a classification problem: you get some features for a bunch of instances of usage of a source word, and do supervised learning to get a classifier with (hopefully) good accuracy on the problem of predicting whether this is a "banco" usage or an "orilla" usage. The features are typically things like "which words are in the surrounding context?", or perhaps something fancier based on a parse of the sentence or knowledge about the document as a whole -- whatever you think will be predictive of what the target-language word should be. Hopefully your learning algorithm has some good way of filtering out irrelevant features.
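Just to make that concrete, here's a tiny sketch of the setup -- totally made-up toy data, with scikit-learn standing in for whatever learner and features you prefer:

# CLWSD as supervised classification: bag-of-words features from the context
# around "bank"; the labels are the Spanish words a translator actually chose.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

contexts = [
    "deposited the check at the bank downtown",
    "the bank raised its interest rates again",
    "fishing from the grassy bank of the river",
    "the canoe drifted toward the muddy bank",
]
labels = ["banco", "banco", "orilla", "orilla"]

classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(contexts, labels)
print(classifier.predict(["we walked along the bank of the stream"]))  # hopefully: orilla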

And then, once that's all put together, hopefully you have some extra signal to feed into your translation system, and it makes better word choices, and everybody's happy.

And that's cross-lingual word/phrase-sense disambiguation!

Sunday, June 05, 2011

Mr. Verb on metaphor

Oh, also: there was this great Mr. Verb post, where they talk about government funding to do research on metaphor. Link to an article in The Atlantic: Why Are Spy Researchers Building a 'Metaphor Program'?

Here's the job posting, which sounds awesome, except that it will probably ultimately lead to people getting exploded:
The Metaphor Program will exploit the fact that metaphors are pervasive in everyday talk and reveal the underlying beliefs and worldviews of members of a culture. In the first phase of the two-phase program, performers will develop automated tools and techniques for recognizing, defining and categorizing linguistic metaphors associated with target concepts and found in large amounts of native-language text. The resulting conceptual metaphors will be validated using empirical social science methods. In the second phase, the program will characterize differing cultural perspectives associated with case studies of the types of interest to the Intelligence Community. Performers will apply the methodology established in the first phase and will identify the conceptual metaphors used by the various protagonists, organizing and structuring them to reveal the contrastive stances.

Metaphors We Live By

I just finished reading Lakoff and Johnson's Metaphors We Live By. The central claim of the book is that metaphor isn't just about language: it's a core part of cognition, and a great deal of our experience is shaped by the metaphors we use to understand the world.

The authors talk a few times about how we understand love: is it more like a journey, or more like a collaborative work of art? Is a marriage like a partnership, or like a haven from the outside world? One's way of thinking about either of these things probably has a direct effect on experience and behavior.

I'm left with some questions, though, and maybe you've got insights that you'd like to share. In the later chapters, the authors go on to say that understanding metaphor this way requires rethinking a lot of philosophy, particularly what it means to say that a sentence is true. I don't know enough about philosophy-of-language to say whether it's actually been shaken to its core (philosophy is a physical structure), but fairly common sentences like "she's a cool drink of water" (attractive person is refreshing), "he threw in the towel" (this situation is a boxing match and he gave up) or even "I see what you mean" (knowing/understanding is seeing) seem pretty hard to represent in propositional logic.

I think I buy the argument that establishing their truth or falsity is possible only if you understand the metaphor that the speaker is using -- and the speaker is very typically using some metaphor or another. It's nearly impossible to speak or reason without using metaphors (similarity is physical closeness), so maybe in general a good theory of meaning really does have to take this into account.

Is there a point at which a metaphor becomes an honestly frozen form, though? What happens if you learn an expression without knowing its etymology, and just say it because it's that kind of situation? You learn a classifier for "throw in the towel" situations, and say it appropriately, but maybe you don't know anything about boxing. I'll start watching for this sort of situation: maybe it happens a lot.

Also, do Lakoff and Johnson essentially agree or disagree with Doug Hofstadter, who says that analogy is the core of cognition? (the video of Doug's talk is totally worth watching).

I'll probably read some of the more recent work on this: More than Cool Reason (metaphor in poetry), and Moral Politics (metaphor in reasoning about politics) both sound really interesting, possibly even relevant to the real world.

Sunday, April 17, 2011

Python tip: 3.2 adds a memoization decorator

This is a really handy thing to have around, in case you've got expensive computations that you'd otherwise end up doing more than once. I've written a clumsier version of this myself, once or twice -- but it makes sense to have it in the standard library!

http://docs.python.org/py3k/library/functools.html#functools.lru_cache
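For instance (a toy example, not from the docs):

from functools import lru_cache

@lru_cache(maxsize=None)  # cache the result for every distinct argument
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # instant, instead of exponentially slow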

There's some other neat stuff in the functools module too: partial application (functools.partial) and reduce. Pretty slick!

Thursday, March 31, 2011

CMU's Avenue Project

Before there was HLTDI, there was Carnegie Mellon's Avenue Project, which seems to have had basically the same goal as ours -- produce good machine translation systems for under-resourced languages, especially those spoken by under-resourced indigenous people.

Avenue itself doesn't seem to have been under-resourced, though -- they sent people to South America (Chile, Peru, Bolivia...) to collect training data, and seemed to have a lot of contacts with local educators and language experts. They got quite a few papers out of this line of research, and apparently wrote a lot of good software. They had a much deeper pool of money (and arguably talent) than we do.

And now... the website is dormant, the PhD students involved seem to have graduated, the data and software are not publicly available, and the researchers seem to have moved on to other things. (one of the resulting doctors is the illimitable Kathrin Probst, who hipped me to Avenue when we were both at Google Atlanta, although I didn't really grasp how serious it was at the time -- darn her for being so humble!)

They were pretty gracious in giving us the Quechua data that they collected (and said we could redistribute it), and I've been reading a bunch of their papers, but I'm left with some sadness about the whole enterprise -- they surely already worked through a lot of the problems that HLTDI is going to have to address. Why can't we just check out and fork their code?

... maybe I should ask for their software too. Science is supposed to be easily replicable, isn't it?

Thursday, March 24, 2011

did you know: F-Spot can export to picasa, flickr, etc!

If you're using Picasa Web Albums, and you're on Linux (maybe a small audience for this post), you don't have to manually upload your files on the web interface or use the Picasa client. The Picasa client is OK, I guess, but it's actually a thinly veiled Windows app bundled with a version of Wine. It's not all that snappy, and it doesn't feel like the rest of the apps on the Linux desktop, and it's not open source.

But! F-Spot is native, and well-integrated with Gnome, and Free Software... and it can very easily upload your photos to picasaweb, or other online photo-hosting places. That's pretty rad.

Thursday, February 24, 2011

Scripts, Plans, Goals and Understanding

So, years ago, Kurt Eiselt, who was then a professor at Georgia Tech, did an independent study with me on NLP/NLU. It was pretty great, though mostly because it got me excited about the field. We wrote really simple parsers for really tiny subsets of English, in Common Lisp, and talked about how language might work cognitively. He had me get a copy of Schank and Abelson's classic Scripts, Plans, Goals and Understanding. I read some of it, but for the most part, it's just sat on my shelf for the better part of a decade.

Out of a sense of "geez, this is something I should have read, I'm an NLP researcher now", I've been working through it. It's slow going: I find Schank light on the details and heavy on the intuition, fairly vague. Maybe the later chapters have more detail about the story understanding system they purportedly built in the 70s.

But if there's been a chunk that's worth reading, it's this, at the very end of chapter 5.
John couldn't get a taxi, so he rode his horse downtown and into a restaurant. He beat up another customer and took a menu from him. He decided to have a steak. The waiter came along and John offered him a bottle of scotch if he listened to John tell him what he wanted to eat. John went to the kitchen and told the cook to give him a steak, because the cook could always deduct the gift from his income tax. When the cook refused, John offered to give him guitar lessons, and that worked. While John was eating the steak, the waiter came back and stole $10 from John's wallet. Then John got on his horse and rode out.

Tuesday, February 22, 2011

Python tip: dir() with no arguments

If you do Python, you probably know that you can ask an object what it has on it by saying dir(theobject).

I just found this out, though: you can also call dir() with no arguments to find out everything that's in your current namespace. No more losing track of temporary variables that you defined earlier in the REPL. It works for finding out which modules you have loaded, too. Holy cow.

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> x = "this string is amazing"
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'x']

Saturday, February 05, 2011

Machine of Death Remix Corpus!

Machine of Death is fantastic. And it turns out that most of the stories in Machine of Death allow derivative works...

And when I hear "derivative works", I think the word "remix". But what's a remix of a text? Cut-up poetry? n-gram models to generate ominous-sounding emails to your family? Counting how many times the word "of" appears? The original text, but with randomized capitalization? This is for you to decide. Well, it is now. Here are all the derivs-allowed stories, conveniently pulled out and converted to ASCII.

Machine of Death Remix Corpus.

Use this to train your poetry bot, or your story understanding system, or your unexpected other thing! Share and enjoy :)
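And if you want a quick start on the n-gram idea, here's about the smallest possible remix: a bigram babbler. (Just a sketch; it assumes you've unpacked the stories as .txt files into a directory called stories/, which you'll want to adjust.)

import glob, random
from collections import defaultdict

# read every story and record which words follow which
followers = defaultdict(list)
for filename in glob.glob("stories/*.txt"):  # wherever you unpacked the corpus
    words = open(filename).read().split()
    for a, b in zip(words, words[1:]):
        followers[a].append(b)

# babble: start somewhere random, keep picking a plausible next word
word = random.choice(list(followers))
output = [word]
for _ in range(60):
    if word not in followers:
        break
    word = random.choice(followers[word])
    output.append(word)
print(" ".join(output))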