Monday, February 9, 2009

And yet more new languages at Google

Last October, I updated Lingwë readers on Google’s online language tools. At that time, Google offered machine-assisted translation of 34 languages (including English). Though impressed, I did manage to complain about the conspicuous lack of several important ones (including Albanian). Well, Google has remedied that oversight at least, and has added several more choices besides.

The number is up to a remarkable 41 languages. Newly added: Albanian, Estonian, Galician, Hungarian, Maltese, Thai, Turkish. Never mind that Galician is really only spoken by three or four million people, and Maltese by fewer than half a million — it’s nice to see regular progress. Hopefully, Swahili, Punjabi, Tamil, et al., are on the way.

Google has also added a handy dandy link labeled “swap,” which flips the source and target languages. Very useful, now that the list of languages itself has grown so long.

10 comments:

  1. And still no goddamn Marathi Hobbit. :)

    Speaking of the Googles, here's a funny little link for you. And have you checked out the latest iteration of Google Earth (5.0)? You can now actually go underwater and do all kinds of other cool stuff.

  2. BTW, I claim a half-credit as a speaker of Galician. This summer I proved to myself that I can actually understand it passably well, if not initiate my own conversations -- which is to be expected after all my years of hearing it. To be honest, not much separates it from Portuguese -- the Lisbon kind, I mean, not the Brazilian. :)

  3. And still no goddamn Marathi Hobbit. :)

    If Google Translate would get around to providing Marathi, we could make our own. It wouldn’t be a great translation, I guess, but better than one that doesn’t exist. ;)

    Speaking of the Googles, here's a funny little link for you. And have you checked out the latest iteration of Google Earth (5.0)? You can now actually go underwater and do all kinds of other cool stuff.

    Google Earth is pretty amazing. I played around with an earlier version maybe a year or two ago, and it’s come a long way since. Wouldn’t it be great if they incorporated Karen Wynn Fonstad’s (and others’) cartographic extrapolations to create Google Middle-earth?

    As to the stuff captured by the Google Streetview car, wow. Some of that is crazy. I’m somewhat at a loss for what to say about it. It’s a bit — er, invasive — to say the least.

    BTW, I claim a half-credit as a speaker of Galician. This summer I proved to myself [...]. [N]ot much separates it from Portuguese -- the Lisbon kind, I mean, not the Brazilian. :)

    Si, vostede entende galego, como eu, gracias a Google. ;) (Yes, you understand Galician, just like me, thanks to Google.)

  4. Some other top African languages are missing: Zulu (25 million total), Afrikaans (16 million total), Berber (37+ million total), Sudanese (27 million), Hausa (34 million) and Amharic (40 million). But as far as I know, only Afrikaans has a significant internet presence; there is even an Afrikaans spell checker for Microsoft Word.

    I guess one should ask not only about the number of speakers, but also the number of speakers with internet access.

  5. You’re right: Africa is definitely underrepresented. I guess they count on the fact that English, French, and to a lesser extent Italian, Portuguese, and other European languages are spoken throughout the continent. One note: I think Berber is actually a group of languages; I don’t think there’s an individual language called Berber, is there? Afrikaans is basically a dialect of Dutch, so I don’t mind its absence (on the other hand, they did give us Galician, a dialect of Portuguese). Hausa and Zulu are definitely big oversights, though. Yoruba and Igbo, too, have around 25 million speakers each.

    I guess one should ask not only about the number of speakers, but also the number of speakers with internet access.

    That’s a fair point. Another thing to take into consideration is geographical distribution. It makes sense to focus on languages with the widest distributions, doesn’t it? And still another factor might be the extent of written (vs. oral) culture in the language. In order to offer these languages, Google needs to chew through large amounts of text in each of them.

  6. One note: I think Berber is actually a group of languages; I don’t think there’s an individual language called Berber, is there?

    Jase, I happen to speak all African dialects, and ... :)

  7. You are right - Berber is a closely related group of languages.

    As to Afrikaans - it is actually a tad closer to Flemish than Dutch, for no reason in particular.

  8. @Gary: Nice. For those not in the know, this is a private joke — actually, a conflation of two private jokes.

    @Scylding: Yes, Flemish and Afrikaans are both dialects of Dutch, as spoken in Belgium and South Africa, respectively. Tolkien called his "cradle-tongue" "English (with a dash of Afrikaans)."

  9. I was sent here by the Miscellany at the Unlocked Wordhoard, and I'm sorry I didn't find this blog before. Hullo! I am moved to comment on a couple of things. Firstly, is Galician really a dialect of Portuguese? Politically I would never dare say this ("Galician is one of the Iberian languages"), but even linguistically I'd have thought, with only a historical understanding, that both it and Portuguese were contemporaneous developments of Romance rather than one derived from the other. Am I philologically incorrect?

    The other, more general, comment I'd make is that I don't think measuring a language's importance by number of speakers for these purposes makes any sense. You need some quotient of amount of publication, most especially on the Internet. I can easily imagine that there are more Galician web-pages than Zulu ones, for the moment at least. There's no sense in providing a tool that has no object yet.

  10. Welcome, and thanks for the comment. It looks like you have a nice blog going yourself, by the way!

    You are quite right about the politics of calling Galician a dialect of Portuguese. But I dare to do so anyway, on philological grounds, because a) they developed from a common Old and Middle Portuguese, which had itself split off from Romance centuries earlier; and b) the rate of mutual intelligibility is estimated at something like 85%, though it is somewhat lower for other parts of Portugal, not to mention South America (which I do not dispute). Moreover, I’m using “dialect” loosely, as I did in describing the relationship of Afrikaans to Dutch. No doubt, though, some galegos might take offense where none was intended.

    As to your other point:

    You need some quotient of amount of publication, most especially on the Internet. I can easily imagine that there are more Galician web-pages than Zulu ones, for the moment at least.

    Fair enough, but there couldn’t possibly be more Galician web pages than Punjabi, Tamil, or Gujarati ones, could there?
