Have I mentioned that NLTK is the hotness? NLTK is the hotness, particularly if you want to do your language-y things in Python.
It's intended for educational use, but it has what you need, and it lets you compare different algorithms for tokenizing, tagging, and parsing chunks of text, very pluggably. The API is nice too -- you can very easily tell taggers (the parts of your program that decide which part of speech a given word is) to call one another in case they can't figure out the right tag on their own.
There are of course the OpenNLP tools in Java, but they don't seem nearly as quick or awesome.
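To make the taggers-calling-taggers bit concrete, here is a minimal sketch of a backoff chain. It assumes NLTK is installed; the two training sentences and their Penn-Treebank-style tags are made up for illustration and are far too small for real use. A bigram tagger falls back to a unigram tagger, which falls back to a default tagger that labels everything as a noun.

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Tiny hand-tagged training set (invented for this example).
train_sents = [
    [("the", "DT"), ("stars", "NNS"), ("favor", "VBP"), ("you", "PRP")],
    [("you", "PRP"), ("will", "MD"), ("meet", "VB"), ("the", "DT"), ("stranger", "NN")],
]

# Last resort: call every word a noun.
default = DefaultTagger("NN")
# Tag by the word alone; ask `default` when the word was never seen.
unigram = UnigramTagger(train_sents, backoff=default)
# Tag by the word plus the previous tag; ask `unigram` when that context is unknown.
bigram = BigramTagger(train_sents, backoff=unigram)

# "luck" is not in the training data, so it falls all the way through to the default.
tagged = bigram.tag(["the", "stars", "meet", "luck"])
print(tagged)
```

Swapping in a different tagger anywhere in the chain is just a matter of changing which object gets passed as `backoff`.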
Soon: using transition probabilities on parts-of-speech to shuffle chunks of text and generate new moderately-sensible horoscopes? Yes! (also: this should be helpful in the long term for my automatic poetry project, which will eventually be more Python-and-ML than Lisp-and-formal-rules)
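One way the horoscope idea might look, as a rough sketch and not a finished plan: count how often each part-of-speech tag follows another, sample a tag sequence from those counts, then fill each slot with a random word that carried that tag. Everything here is an assumption for illustration: the tiny tagged corpus, the `train` and `generate` helpers, and the `<s>` / `</s>` boundary markers. It uses only the standard library, and the input is expected to be already tagged (for example, by a tagger like the ones above).

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers

def train(tagged_sents):
    """Count tag-to-tag transitions and collect the words seen under each tag."""
    transitions = defaultdict(lambda: defaultdict(int))
    words_by_tag = defaultdict(list)
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            transitions[prev][tag] += 1
            words_by_tag[tag].append(word)
            prev = tag
        transitions[prev][END] += 1
    return transitions, words_by_tag

def generate(transitions, words_by_tag, rng, max_len=20):
    """Sample a tag sequence from the transition counts, then a word for each tag."""
    out = []
    tag = START
    while len(out) < max_len:
        followers = transitions[tag]
        tags = list(followers)
        weights = [followers[t] for t in tags]
        tag = rng.choices(tags, weights=weights)[0]
        if tag == END:
            break
        out.append((rng.choice(words_by_tag[tag]), tag))
    return out

# Invented, hand-tagged mini corpus; a real run would want thousands of horoscopes.
corpus = [
    [("the", "DT"), ("stars", "NNS"), ("favor", "VBP"), ("you", "PRP")],
    [("a", "DT"), ("stranger", "NN"), ("brings", "VBZ"), ("luck", "NN")],
    [("you", "PRP"), ("will", "MD"), ("meet", "VB"), ("the", "DT"), ("moon", "NN")],
]

transitions, words_by_tag = train(corpus)
sentence = generate(transitions, words_by_tag, random.Random(42))
print(" ".join(word for word, _ in sentence))
```

Because words are picked per tag with no regard for meaning, the output is grammatical-ish at best, which is about the right bar for a horoscope.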
1 comment:
Glad you like NLTK! Is there an archive of a few thousand horoscopes that we can use to train up a statistical language model to generate new random horoscopes? (Steven Bird; NLTK developer).