Sunday, March 31, 2013

language technology for Paraguay

Earlier this month, we went to Paraguay. Why'd we do that?

Paraguay is the only country in the Americas where most people are bilingual in a European language and an indigenous language. Paraguayans, the majority of them anyway, really do speak Guarani, or depending on context, a mixture of Spanish and Guarani called Jopará.

While we were there, we talked with Guarani-language teachers at the institutes where they're training translators and linguists. They let us sit in on some of the classes and talk with the students. We're working on building them a computer-assisted translation webapp that will help us collect lots of bilingual text! This is going to be huge.

While we were in the area, we also talked with the local One Laptop Per Child folks; there's an OLPC installation in the little town of Caacupé, which is near Asunción. The OLPC folks said that we should probably go visit local grade schools. Which we did!

It hit me, after the first visit to the schools, how much we were making use of our foreign-white-scientist privilege. We didn't have anything to do with the OLPC project -- aside from a desire to collaborate -- but here we were, wandering into schools without so much as a release form, talking to the kids. I'm trying to imagine Paraguayan scientists coming to the US to observe technology use among los niños estadounidenses.

The really interesting thing here: the kids in Caacupé weren't so surprised to see foreign scientist-looking guys coming to talk with them. They were really friendly, and eager to show off what they could do with the laptops! I get the impression this happens fairly frequently.

So there's a lot of stuff that needs to get built, to make computer use in Guarani more pleasant.
  • At a very basic level, it's hard to type the diacritics you need for Guarani if your keyboard layout is set to Spanish. The diacritics for Guarani actually aren't that weird; they've got tildes on some vowels, but you see that in Portuguese too.
  • There's no good spellchecker. Guarani morphology is pretty complicated, so this is not an easy thing to build. But we know a guy who's working on it...
  • Text-to-speech. The kids in the schools have text-to-speech for Spanish, and they love it! There's a program on the OLPC where you can send messages to a friend's computer, and the receiving computer will speak your message. It's hilarious. But it doesn't work for Guarani. And as you get further out into the country, the kids are more likely to be monolingual Guarani speakers...
  • The computer-assisted translation website: we're working on it. I'll write more about this soon...
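
To make the diacritics point concrete, here's a rough sketch of one possible workaround: let people type a marker character before a letter and have the software put the tilde on it. This is just an illustration, not something we've built. The function name `compose_tildes` and the choice of `~` as the marker are made up for the example. It leans on Unicode's combining tilde (U+0303) and Python's standard `unicodedata` module.

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # Unicode COMBINING TILDE


def compose_tildes(text, marker="~"):
    """Turn marker+letter (e.g. '~a') into that letter with a tilde.

    Hypothetical helper for illustration: the '~' prefix convention is an
    assumption, not an existing keyboard layout or input method.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Only consume the marker when a letter follows it.
        if ch == marker and i + 1 < len(text) and text[i + 1].isalpha():
            out.append(text[i + 1] + COMBINING_TILDE)
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC merges letter + combining tilde into one code point when a
    # precomposed form exists; otherwise the two code points are kept.
    return unicodedata.normalize("NFC", "".join(out))


print(compose_tildes("ava~ne'~e"))
```

Most of the tilde letters come out as a single precomposed character after normalization, but a g with a tilde has no precomposed form in Unicode, so it stays as two code points. Anything that counts characters or compares strings has to keep that in mind.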
I gave a talk about all of this at the computational linguistics seminar: here are the slides!

5 comments:

Shahana Shafiuddin said...

Good to know about their languages

Juaning said...

Awesome stuff dude, I'm a Paraguayan living in Sydney, Australia now, let me know how I can help you. BTW I'm a software engineer.

Alex Rudnick said...

@Juaning: Great! We could definitely use some help!

We're going to start building out the first webapp pretty soon, like in the next few weeks. The server-side will be in Python 3, and we're figuring out what we'll do for the client side. (suggestions welcome!)

Do you use github?

Juaning said...

Hey man, yes, I do, I've started following you. My handle is the same as my nick here, "juaning", and my gmail as well.

Joe Barfield said...

Maybe you could incorporate a Captcha style query widget for websites, blogs and facebook. It would show a word or phrase and challenge for a translation. This would only work on pages frequented by fluents. But it should be easy to get added on any sidebar claiming to promote Guarani culture or Guarani language blogs. The same phrases would be presented to multiple users to weed out wrong or goofy responses.
This might not get you a ton of material, but knowing Guarani pride, the data would build steadily.
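
The "same phrase to multiple users" idea in the comment above could be sketched roughly like this: collect the answers for one phrase and only accept a translation once enough people agree on it. This is a minimal sketch under assumptions. The function names and the thresholds `min_votes` and `min_share` are invented for the example and would need tuning against real responses.

```python
from collections import Counter


def normalize(answer):
    """Lowercase and collapse whitespace so near-identical answers match."""
    return " ".join(answer.casefold().split())


def accepted_translation(answers, min_votes=3, min_share=0.6):
    """Return the agreed-upon translation, or None if there's no consensus.

    Hypothetical thresholds: at least `min_votes` matching answers, making
    up at least `min_share` of all non-empty answers for the phrase.
    """
    counts = Counter(normalize(a) for a in answers if a.strip())
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    total = sum(counts.values())
    if votes >= min_votes and votes / total >= min_share:
        return best
    return None


# Placeholder strings, not real translations.
print(accepted_translation(["Answer A", "answer a ", "ANSWER A", "zzz"]))
```

A goofy one-off response just gets outvoted, and a phrase with no clear winner keeps getting shown until it has one.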