Before there was HLTDI, there was Carnegie Mellon's Avenue Project, which seems to have had basically the same goal as us -- produce good machine translation systems for under-resourced languages, especially those spoken by under-resourced indigenous people.
Avenue itself doesn't seem to have been under-resourced, though -- they sent people to South America (Chile, Peru, Bolivia...) to collect training data, and seemed to have a lot of contacts with local educators and language experts. They got quite a few papers out of this line of research, and apparently wrote a lot of good software. They had a much deeper pool of money (and arguably talent) than we do.
And now... the website is dormant, the PhD students involved seem to have graduated, the data and software are not publicly available, and the researchers seem to have moved on to other things. (One of the resulting doctors is the illimitable Kathrin Probst, who hipped me to Avenue when we were both at Google Atlanta, although I didn't really grasp how serious it was at the time -- darn her for being so humble!)
They were pretty gracious in giving us the Quechua data that they collected (and said we could redistribute it), and I've been reading a bunch of their papers, but I'm left with some sadness about the whole enterprise -- they surely already went through a lot of the problems that HLTDI is going to have to address. Why can't we just check out and fork their code?
... maybe I should ask for their software too. Science is supposed to be easily replicable, isn't it?
4 comments:
Fascinating stuff. I would rate this as your most Ballardian technical post to date.
Kathrin is illimitably inimitable! But Avenue isn't.
Ahh, did they ever give you their code??
Awww, I haven't asked. I should ask!