Adrian Roselli: i18n

Wednesday, November 13, 2013

Captions in Everyday Use

Yesterday Henny Swan asked a simple question on the Twitters:

I'm curious to know, who uses subtitles on web content (X device) who's not deaf or hard of hearing? For example I did when breastfeeding.
— Henny (@iheni) November 12, 2013

Adam Banks put together a Storify of the responses that show there are plenty of use cases for those not hard of hearing to get value from closed captioning.

In general, any context where either the audio track is loud enough that the viewer doesn't want to disrupt those nearby, or the background noise is too much to hear the audio track clearly, is a case where captions have value for all users. Other cases that popped up include multi-tasking or working with a new language or just tough accents.

In short, closed captions have value for all users.

There is also no reason to panic about providing them, particularly if you use a video service that can do them for you. For example, back in 2010 YouTube committed to enabling auto-captioning for everyone, and Google has documents to help plus tutorials from others, such as this step-by-step or this video.

Image of the captions in use on President Obama's speech about the Chile earthquake.

Of course, as I was writing this post, Henny posted her own reference to the Twitter conversation: The weird and wonderful reasons why people use subtitles / captions

The Storify of responses I mentioned above is embedded here to spare you all the hassle of clicking the link and to bloat my page with unnecessary script blocks:

View the story "Conversation with @iheni, @obiwankimberly, @CharJTF, @mikegulliver, @stevebennett, @caledoniaman, @patrick_h_lauke,..." on Storify

Update: November 14, 2013

While I was writing this, Dave Rupert was putting together a very neat experiment, Caption Everything: Using HTML5 to create a real-time closed captioning system.

It's a neat proof-of-concept to show how real-time closed captioning is a possibility with current technology, albeit imprecise and cumbersome. If nothing else, hopefully it can bring more attention to a technique that, as demonstrated above, can benefit all users in everyday situations.

It's such a nifty experiment, I am embedding it here (remember, this isn't mine, this is Dave Rupert's code):

See the Pen Closed Captioning with HTML5 by Dave Rupert (@davatron5000) on CodePen

Monday, May 7, 2012

New Crowdsourced Translation Option

Many organizations don't have the budget for someone to guide them through a full translation / localization project, and some don't even know where to start. In late 2009 I wrote about low/no-cost options from Google (machine translation) and Facebook (human-powered): Facebook and Google Want to Translate Your Site

A new option has emerged recently, covered in the Mashable piece Free Online Human Translation Service Takes On Babelfish, Google Translate. Unfortunately the writer of that piece doesn't seem to understand the rigor that has to go into the translation process, so opportunities to provide a deeper analysis are missed in the article.

The service is called Ackuna, a free offering from a translation agency. Mashable's suggestion that this service takes on the two translation giants on which most web users rely is silly — Google and Babelfish provide real-time machine translation. Ackuna does neither. Ackuna uses people to provide translation and does so at the pace of the volunteer translators.

I have already made a case against machine translation for anything other than casual or immediate needs. I almost always counsel my clients against its use, including the free Google Translate widget you can drop into a web site. There are exceptions, of course, but those are outside the scope of what I am addressing here.

Because Ackuna uses humans for translation, there are a number of questions that anyone looking to use Ackuna should ask. I detailed a set of questions in my 2009 post, but I'll recap here (excluding the questions regarding Facebook Connect):

Does Ackuna attract users who are fluent in the desired target language?
Are these users willing to help translate your content for free?
Is the translator a subject matter expert?
Is the translator part of your target audience (including geographic and demographic breakdown)?
Are you (or your client) comfortable letting unknown third parties translate your message?
Is time budgeted to identify content for translation?
Is time budgeted to have someone review the translation?

Ackuna's FAQ page answers some of these questions, but doesn't really explain how you qualify a translator. Ackuna's translators are ranked on the site by a combination of user feedback and badges. Think upvotes and downvotes, with points determined by whether a translation (or a step) was accepted. Badges are awarded based on other translators marking submitted translations as accurate.

When it comes to deciding whether a translation is correct, assuming you don't speak the target language, Ackuna doesn't make any guarantees:

Use a translator's reputation and badges as an indicator of their credibility, and take into account the comments and feedback left on each translation by other users. Use these factors and your best judgment before accepting the translation of your text.

If timing is a concern, remember that translators are providing translations because they want to. The only pay-offs for these translators are badges and points. When you have no contract and no way to pressure someone for work, there is no guarantee it will ever be completed. In case you can't wait and decide to walk away with what's been translated so far, note this from the FAQ:

How do I download my completed translation?

[…] You will not be able to view a completed translation until every segment in your project has at least one translation submitted.

Not being able to secure translations can be a bit tricky, too, especially if some of your content is sensitive or personal. Given this clause in the terms & conditions, you may want to think hard about what you post for translation:

[Y]ou give the right to Ackuna and its affiliates to store your input indefinitely and reuse it at any time and for any purpose at our discretion.

Ackuna needs critical mass to produce good translations (or translators whose profiles don't read like Hipster spam-bots). It needs many translators reviewing each other's work to produce robust translations in timeframes that matter for businesses. Ackuna needs more users ranking one another's work, otherwise it may be too hard to know if that Simplified Chinese translation really conveys your message properly, especially when the translators all have a similar rating. Ackuna's bare-bones interface may not help it attract good Samaritans who just want to translate, since it's not too easy to see all the projects in one pass (you have to page through them) and the search feature doesn't work (yet, it claims).

Ackuna itself is not a bad idea. A translation workflow and process is a necessity in any translation project and Ackuna provides some of that. If you already have translators available to you, it might even make an effective no-cost solution to manage the workflow and get others to weigh in on the work.

What Ackuna could do is counsel its users on what makes good translation, maybe even cross-selling its parent company's services. From there it should group translations into industries or subject matter so that those with experience in them can find content more relevant to their skills. In addition, finding a method to indicate that a translator has specific industry or regional expertise, and providing a ranking system for the same, can go a long way toward helping a user understand whether his or her translation is as good as it could be.

I want to be clear that I am not criticizing Ackuna (though I could be criticizing Mashable's presentation of Ackuna). Providing a free service for something so rooted in the complexities of human language goes beyond what its technology can do. As I have commented before about free services, you get what you pay for.

Tuesday, January 17, 2012

HTML5 Will Play Nice with Translation

Back in late 2009 I wrote a little something about Google Translate and the risks associated with relying on machine translation for anything critical ("Facebook and Google Want to Translate Your Site"). I even offered some examples of things that are tough to translate.

One real-world example I did not list was when I used machine translation to process a page with someone's name. The first name we'll say was "Bill," but the last name was definitely "Belt." Somehow instead of "Bill Belt" being retained as his name throughout the article, he was renamed to "Bill of Leather Strap."

This particular example is one step closer to being a thing of the past. In the latest W3C Open Web Platform Weekly Summary, a new attribute has been announced for HTML5 that will allow authors to exclude specific content from being translated — for any service that will honor it. The announcement:

A global translate attribute will be added to HTML5. The values are yes or no with the same inheritance policy than the lang attribute. The goal is to specify if a piece of text should or not should not be translated automatically.

Of course, if I want to exclude Bill Belt from being translated, I'll probably have to wrap his name in a span in order to throw a translate="no" in there since I doubt I'll have an otherwise semantic or structural element in place already. This does, however, offer a far better solution than the previous suggestion of using a class to achieve the same effect.

To be fair, Google Translate already has its own support for excluding content from automatic translation, specifically using class="notranslate". Head over to the Google Translate Help page and expand the bottom-most option, "General information for webmasters" (nice to see they make it easy for a direct link).
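
For illustration, here is a minimal Python sketch of the kind of preprocessing a content author or CMS might do. It escapes a plain-text passage for HTML and wraps each occurrence of a proper name in a span carrying translate="no", optionally adding Google's class="notranslate" as well. The function name protect_name and its parameters are my own assumptions, not part of any specification or Google tool, and whether a given translation service honors either marker is up to that service.

```python
import html
import re


def protect_name(text: str, name: str, add_google_class: bool = False) -> str:
    """Escape plain text for HTML and mark every occurrence of `name` as not translatable.

    Hypothetical helper for illustration only.
    """
    escaped_text = html.escape(text)
    if not name:
        return escaped_text  # Nothing to protect.
    escaped_name = html.escape(name)
    attributes = 'translate="no"'
    if add_google_class:
        # class="notranslate" is the Google Translate convention mentioned above.
        attributes += ' class="notranslate"'
    wrapped = f"<span {attributes}>{escaped_name}</span>"
    # re.escape treats the name literally; the lambda keeps the replacement
    # from being parsed for backslash escapes.
    return re.sub(re.escape(escaped_name), lambda _match: wrapped, escaped_text)


print(protect_name("Bill Belt spoke at the event.", "Bill Belt"))
print(protect_name("Ask Bill Belt & friends.", "Bill Belt", add_google_class=True))
```

Emitting both markers is a belt-and-suspenders choice (no pun intended): the attribute is the standards-track mechanism, while the class covers the Google-specific behavior described above.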

If you are curious about the process this went through to become a change for HTML5, you can see the bug report that started it all back on April 4, 2011: Bug 12417 - HTML5 is missing attribute for specifying translatability of content.

I don't believe that machine translation is ever a good way to translate or localize content for anything more than casual use. For example, legal matters, healthcare, and things like that are poor candidates for machine translation (I have far more to say on this point in the post linked above). For organizations that do provide manual human translation, this attribute can be a boon to them as well, allowing them to identify pieces of content that do not need to be processed, saving time, effort and cost for everyone in the translation workflow.

As developers it's our responsibility to make sure it is used correctly, most likely by helping to train content authors.

Tuesday, July 20, 2010

W3C Cheat Sheet Now Includes HTML5

Back in November, the W3C released a handy tool aimed at helping developers quickly access information from various W3C specifications (W3C Cheatsheet for developers). The features were pretty straightforward:

This cheatsheet aims at providing in a very compact and mobile-friendly format a compilation of useful knowledge extracted from W3C specifications — at this time, CSS, HTML, SVG and XPath —, completed by summaries of guidelines developed at W3C, in particular the WCAG2 accessibility guidelines, the Mobile Web Best Practices, and a number of internationalization tips.

The author has had many requests to add information about all the new, changed, obsolete and removed elements and attributes in HTML5 by highlighting them (and including the new ones) throughout the application. Today he has released the updated version of the application (HTML5 in W3C Cheatsheet) with those changes.

Now when you look up an HTML element or attribute, the results will indicate if the element is different in some way in HTML5 than it has been in the past. Optimized for mobile use, the slim interface makes for a pretty lightweight quick reference tool should you have any questions and find the W3C specs a little heavy to read through.

The cheat sheet also includes sections for mobile web best practices, accessibility (WCAG 2.0), internationalization tips, and even a small section covering common character codes. Most of the content includes links to the relevant W3C specification for further reading. All the data comes from HTML: The Markup Language Reference. Try it out and see what you think:

W3C Cheatsheet for Web Developers

Tuesday, July 13, 2010

Methods to Select an HTML5 Element

Sectioning Elements

Right at the end of June, the HTML5 Doctor web site celebrated its first birthday (Happy 1st Birthday us). As part of that birthday celebration they have given us a gift: The Amazing HTML5 Doctor Easily Confused HTML Element Flowchart of Enlightenment (320kb PDF).

Inspired by an original version sent in by Piotr (one of our readers) and developed by Oli the chart helps guide you through those tricky differences between header, footer, aside, section, article, figure and yes, div. It's available in either pdf or png format.

Given the ongoing confusion over some of the new elements (both by the general public and seemingly within WHATWG itself), this chart should make it much easier to at least struggle through these decisions. As someone who expects everyone on his team to be able to justify every element on the page, I expect that if HTML5 ever wraps up and/or we start to utilize it on client projects, this will prove to be very helpful to us.

Inline Elements

The chart may only deal with sectioning elements, but there are many more elements to consider. Some elements that I feel should have been deprecated have been re-cast in HTML5 with new purpose, and this may cause yet more confusion. In particular, the b and i elements now require a primer from the W3C to make sense to the seasoned developer: Using b and i elements

W3C provides the following background:

The HTML5 specification redefines b and i elements to have some semantic function, rather than being purely presentational. However, the simple fact that the tag names are 'b' for bold and 'i' for italic means that people are likely to continue using them as a quick presentational fix.

The W3C then provides an answer, a tiny chunk of which says this:

You should not use b and i tags if there is a more descriptive and relevant tag available. If you do use them, it is usually better to add class attributes that describe the intended meaning of the markup, so that you can distinguish one use from another.

The article goes on to describe challenges with internationalization:

Just because an English document may use italicisation for emphasis, document titles and idiomatic phrases in a foreign language, it doesn't hold that a Japanese translation of the document will use a single presentational convention for all three types of content. Japanese authors may want to avoid both italicization and bolding, since their characters are too complicated to look good in small sizes with these effects.

And yet, after all this, they still provide a recommended usage:

In the HTML5 specification 4.6 Text-level semantics lists other elements that can be used to describe inline text semantically, such as dfn, cite, var, samp, kbd, etc. [...] It may help to think of b or i elements as essentially a span element with an automatic fallback styling. Just like a span element, these elements usually benefit from class names if they are to be useful.

Since I don't allow use of b or i in our HTML4 documents (because they impart no structural or semantic meaning, only visual style), I don't see any reason to use them even given their revised role in the HTML5 spec. This W3C article has more caveats to their use than solid reasons, and retraining staff to support an element that is still mired in confusion isn't the right way to go.

We don't need a chart to say "No" for this one.

Friday, October 2, 2009

Facebook and Google Want to Translate Your Site

Translations for Facebook Connect

Earlier this week Facebook announced a new service built on the Facebook Connect API called Translations for Facebook Connect. In general, the idea behind this tool is to give developers the ability to translate a web site (into a language currently supported by Facebook, 65+ right now) by crowdsourcing the translation itself. Leaning on their own experience letting users translate the Facebook user interface, Facebook is essentially opening its translation workflow process to the world.

From the How It Works portion of the announcement:

After you choose what languages you want your site or application to support, you can get help from the Facebook community to translate your site, as we did, or you can do the translation yourself, or make a specific person the administrator of the process. [...] Once you register content for translation, your connected Facebook users can start translating your sites' content just as users helped translate Facebook.

Facebook has also created a new client-side feature set, using their XFBML framework, to allow developers to automatically submit content wrapped in an fb:intl tag for translation and to allow translators to translate the content inline. Facebook has put together a simple demo to show this in action.

The Facebook Developer Wiki article on internationalization outlines the process from start to finish. Here you can see that the process isn't just one of machine translation, but that a site owner must designate content strings (pages, words, phrases, UI elements, images, etc.) that are to be translated into a language selected by the site owner. The article also provides samples of how a translator might perform bulk translation or inline translation with screenshots showing orange underlines and menus that appear on a right-click to perform the translations.

The most important part of this tool, however, isn't the technology but instead a fundamental understanding of the process of translation. Facebook has a pretty good article on Internationalization Best Practices that at least presents a primer to those who have never been through this process, gleaned from Facebook's own experience translating its site. This addresses, albeit without a lot of detail, some of the pitfalls those of us in the world of localization have experienced, such as the tendency for translated phrases to contain more characters than the original phrase (which can mess with a pixel-precise layout), or choosing general phrases to leverage throughout the site instead of translating similar but different bits of words over and over.
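
The text-expansion pitfall can be checked mechanically before layout work begins. The following Python sketch flags translated strings that grow past a chosen ratio of the source length. The function name, the 1.3 threshold, and the sample English-to-German pairs are my own assumptions for illustration, not anything from Facebook's tools, and character count is only a rough proxy for rendered width.

```python
def flag_expanded_strings(pairs, max_ratio=1.3):
    """Return (source, translation, ratio) for translations that outgrow the source.

    `pairs` maps each source string to its translation. The default
    `max_ratio` is an arbitrary illustrative threshold.
    """
    flagged = []
    for source, translation in pairs.items():
        if not source:
            continue  # An empty source string has no meaningful ratio.
        ratio = len(translation) / len(source)
        if ratio > max_ratio:
            flagged.append((source, translation, round(ratio, 2)))
    return flagged


# Hypothetical UI strings with German translations.
ui_strings = {
    "Save": "Speichern",
    "Settings": "Einstellungen",
    "OK": "OK",
}

for source, translation, ratio in flag_expanded_strings(ui_strings):
    print(f"{source!r} -> {translation!r} is {ratio}x the original length")
```

Short interface labels tend to be the worst offenders, which is why a button sized for "Save" is a likely place for a pixel-precise layout to break.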

Caveats

Unlike machine translation, humans are performing the translations, greatly increasing the likelihood that they will understand context. However, there are many conditions that must be satisfied or assumptions that must be made to implement this set of tools...

Does the web site attract users who are fluent in another language (the desired language)? If the site isn't already in the native tongue of a user, why would he/she be there?

Are these users willing to help translate your site for free? Given the concept of "you get what you pay for," there has to be a consideration for the quality and skill of the translators who take it upon themselves to do this work.

Is the translator a subject matter expert? You may not want to have just anyone translate your site about widgets. Widgets may require some very technical language that a casual reader may not grasp and may not understand in context. Words that are mundane in daily use may have a very specific meaning in the world of widgets and might not suffer a loose translation well (or perhaps aren't even supposed to be translated).

Is the translator part of your target audience? It's one thing to understand Portuguese because you are from Portugal. It's another thing entirely to translate into Portuguese for use by Brazilians. Understanding the region, dialect, and local idioms can be very important for a proper translation (which is why we call it localization).

Are the site owners comfortable letting unknown third parties translate their message? Tag lines, marketing lingo and other carefully crafted content usually don't translate easily. Even with an approval process in place, it may take many passes to get site owners confident with a brand's translation in a language they do not speak.

Does the site already have Facebook Connect enabled? If not, then time must be taken to add the appropriate code to each page of the site to be translated.

Is time budgeted to identify content for translation? Somebody has to walk through the entire site (or at least any pages to be translated) and mark up the content for translation using what may be arcane XFBML tags.

Does the user even allow Facebook Connect? If your ideal translator doesn't want to log in to Facebook Connect on your site, then all this is moot. The user/translator has to trust both Facebook and your site, and feel altruistic enough to want to help.

Google Web Site Translator Gadget

The day after Facebook announced its new translation tool, Google reminded us that it has been doing this for a while by announcing its Web Site Translator Gadget. The Google widget works by providing a web site developer with a block of HTML and JavaScript to drop on the page of an existing site. This code draws the menu that allows the site to be translated. You can grab the code at the Google Translate Tools page. It currently supports 51 languages.

The Google Translator Toolkit powers this machine translation tool, and while Google states that the toolkit learns from corrections to translations provided by users, it's still just machine translation without human intervention. It's also far easier to implement on a site and doesn't require much overhead. As an aid to the user, when the mouse hovers over the translated content, the block of copy is highlighted and the original content is displayed using a "tooltip" (or the title attribute of the element).

The Google Translate human intervention cycle

Because Google Translate has been around for so long, many users are accustomed to the quirks and know better than to rely on it to translate marketing content or poetic prose. Where it excels is in quickly allowing a site visitor to get the gist of a page, understand an address, or otherwise grab content that isn't so dependent on context that a user can't glean some use from it.

Please understand that I am not referring to the "Translate" feature that was added to the Google Toolbar for Firefox as of yesterday (October 1). That is a client-side function that operates independently of the web site owner and I am only addressing translation features that a web site owner might want to implement to translate his/her site.

Caveats

Machine translation is a risky endeavor. Even the most robust natural language processors have trouble with elements of language that humans understand naturally, such as context, ambiguity, syntactic irregularity, multiple meanings of words, etc. An example often used in this discussion is evident in these sentences:

Time flies like an arrow.
Fruit flies like an apple.

The words "flies" and "like" have completely different meanings between the two sentences. Only our knowledge of the metaphor and the bug allows us to understand the differences without further context. Acting as my own brand manager, I took the introductory paragraph from my web site (excuse my ego, in case you hadn't noticed it already):

I am a founder, partner, and Senior Usability Engineer at Algonquin Studios, responsible for bridging the gap between the worlds of design and technology. With experience in both, I bring a unique perspective to projects, allowing both design and implementation to merge seamlessly.

I translated it into German and then back to English:

I am a founder, partner and senior usability engineer at AlgonquinStudios, responsible for bridging the gap between the world of design and technology. With experience in both, I bring a unique perspective on projects, up so that both design and implementation to proceed smoothly integrated.

We can see it does pretty well for the first half of the paragraph and then it falls apart. This experiment is inherently unfair, of course, but it does demonstrate some of the risks with machine translation. It becomes particularly problematic when proper names that correspond to common words are translated.
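
The "first half holds up, second half falls apart" observation can be roughly quantified. This Python sketch compares the two paragraphs quoted above, sentence by sentence, using the standard library's difflib. Word-level similarity is my own stand-in for translation quality, a crude one, and the helper name word_similarity is hypothetical.

```python
from difflib import SequenceMatcher

# The original paragraph and its German round trip, split into sentences.
original = [
    "I am a founder, partner, and Senior Usability Engineer at Algonquin Studios, "
    "responsible for bridging the gap between the worlds of design and technology.",
    "With experience in both, I bring a unique perspective to projects, "
    "allowing both design and implementation to merge seamlessly.",
]
round_trip = [
    "I am a founder, partner and senior usability engineer at AlgonquinStudios, "
    "responsible for bridging the gap between the world of design and technology.",
    "With experience in both, I bring a unique perspective on projects, "
    "up so that both design and implementation to proceed smoothly integrated.",
]


def word_similarity(first: str, second: str) -> float:
    """Ratio (0 to 1) of matching words, ignoring case, kept in order."""
    return SequenceMatcher(None, first.lower().split(), second.lower().split()).ratio()


scores = [word_similarity(a, b) for a, b in zip(original, round_trip)]
for number, score in enumerate(scores, start=1):
    print(f"Sentence {number}: {score:.2f}")
```

The second sentence scores lower than the first, which matches what a human reader sees, though a score like this says nothing about whether the surviving words still carry the intended meaning.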

Which to Use?

Machine translation is a risky proposition at best. You have no control over what content will come back to your end user. Adding a Google Translate feature to your site can give the appearance to some users that you have effectively signed off on the content. If that's not a concern, perhaps because of the nature of your site (fan site, personal site, etc.) then it's certainly your cheapest option. If you have a business site then this may not be the right path for you. You may still want to link to the Google Translate page to let end users perform translations on their own. At that point you have set an expectation that this is not a service you provide and you are not responsible for how bizarrely an idiom may be translated.

Crowdsourcing your translation may sound like a better idea simply because you now have humans making decisions about what makes sense in the translation, but you have to take into account the caveats above. Are you prepared to pay for (or spend) the time preparing a site for translation and then babysitting the process, hoping the entire time someone with enough linguistic skill will come along and translate it?

This is where I would lean on a third option. Hire professionals who have done this before. If you are talking about translating your corporate brand and message, then you shouldn't leave it to machines or the waiting game of altruism. If you just want to translate a personal site, a fan site, or perhaps a strong community site (like Facebook), then either other option may work well for you depending on what you can commit. It may even be worth considering Google as the short-term option while waiting for the Facebook option to pay off.
