Monday, October 31, 2011

HTML5 kills [time], Resurrects [u]

HTML5 logo -- I am the 'alt,' not the 'title'The HTML5 specification as managed by both W3C and WHATWG is an unfinished, incomplete specification that can change at any time. That isn't a criticism, it's just a statement of fact. It's a fact often ignored by people and companies who choose to implement it and then cry foul when something changes.

<i> and <b>

So far HTML5 has done things like convert the deprecated (in HTML 4) <i> and <b> elements to impart stylistic meaning with specific results. While the stylistic effect was known, this is why I stopped using them years ago, instead imparting semantic meaning by using <strong> and <em> and style via CSS. HTML5 Doctor covers this in more detail in The i, b, em, & strong elements. This caused me much consternation.

<small> and <hr>

Then they restored the <small> and <hr> elements by giving them semantic and structural meaning, repurposing yet more mis-used elements for reasons different than how people were already mis-using them. Once again, HTML5 Doctor addressed this in the post The small & hr elements.

The <img> altAttribute

Then in what I consider to be a profoundly anti-accessible move that can only make it easier for CMS vendors to cut corners, the requirement for an alt attribute on every <img> element was dropped under specific conditions. I know I wasn't alone on feeling this was a terrible decision when my two posts on this topic were widely distributed and still see a good deal of traffic six months later: Image alt Attributes Not Always Required in HTML5 and More on Image alt Requirement in HTML5.

And Now <u> Returns

I suppose, then, that I should not have been surprised when word came down that the <u> element was back. This is an element we blocked in our CMS, telling our clients that an underline meant text was clickable and there was no good reason to use it in regular copy. Even so, I still had a use case that I thought made sense — to indicate what character for a form field is accessible with the keyboard via the accesskey attribute. So of course the real reason for its resurrection surprised me. Let me reprint the entire entry for <u> in the WHATWG HTML5 specification:

4.6.18 The u element

The u element represents a span of text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt.

In most cases, another element is likely to be more appropriate: for marking stress emphasis, the em element should be used; for marking key words or phrases either the b element or the mark element should be used, depending on the context; for marking book titles, the cite element should be used; for labeling text with explicit textual annotations, the ruby element should be used; for labeling ship names in Western texts, the i element should be used.

The default rendering of the u element in visual presentations clashes with the conventional rendering of hyperlinks (underlining). Authors are encouraged to avoid using the u element where it could be confused for a hyperlink.

Proper names in Chinese and misspelled words. The u has returned and we get little detail for how to use it but plenty of detail on how not to use it.

Once again HTML 5 Doctor came to the rescue and provided explanations and use cases for the zombie <u> element in the post The return of the u element. To distill it here, some examples:

  • Chinese proper name marks, such as the names of people, places, dynasties, or organizations. It's akin to capital letter in English. Eg: 屈原放逐,乃賦離騒左丘失明,厥有國語
  • For indicating family names in Asian languages. Given that the family name often precedes the given name, this can be confusing to Western readers. This is an alternative to the capitalization we often see when translated to English. Eg: Lynn Minmay versus LYNN Minmay.
  • To indicate potential spelling errors, like you might see in MS Word or word processors and browsers with spell checkers. Eg: This is not my bootiful house.

In each of these cases you could use CSS to modify the <u> display to put a wavy line under the text (for the first and third examples) or to shift the characters to uppercase (the second example). The argument here is that the <u> underline will still render if the CSS is lost.

The specification lists the browsers that support this element, showing Internet Explorer, Firefox, Opera and Webkit. These browsers support it from its prior incarnation, however, as they have been supporting the <u> element since its presence in prior versions of HTML. Whether they support this new re-imagining is a different story. Since this new explanation is still primarily about display, the browsers may support it's new meaning by accident. The wavy line CSS, however, is only in Firefox right now.

The End of <time>

Oh for f*cks sake. We may as well just 'consider replacing all HTML tags with <derp>'

Just found a bug in #html5! There is a <data> element, that basically does nothing. We got <div> and <span> for that.'

Over the weekend I saw a tweet-storm from people who are closely tied to the pulse of HTML5. The <time> element was slated for destruction and quite a lot of people who have come to rely on it immediately swung into action — partly to educate those of us who aren't so intimately familiar with the element. The #occupyHTML5 hashtag on Twitter is on fire with people deriding the decision, though I defer to two experts on the subject to give a little explanation.

Ian Devlin, author of the new book HTML5 Multimedia: Develop and Design, wrote up his reaction in "On the disappearance of HTML5 <time>." To quote:

Many documents and pages that get posted on the web have some sort of timestamp attached to them. Think of every news and blog article that’s written, this very post included, that indicate somewhere when it was posted. Having a machine readable element that encapsulates this special case piece of information is very useful for both machines and humans alike to read and understand.

He notes that the new <data> element, which is intended to replace <time> as a more generic element, isn't a bad idea in itself, but is also a little late to the game.

Bruce Lawson, one of the HTML5 Doctors and one author of Introducing HTML5 (along with his role as standards evangelist at Opera), has received quite a lot of support for his post Goodbye HTML5 <time>, hello <data>! Awful Cher video notwithstanding, he raises good points:

<time> (or its precursor, <date>) has an obvious semantic (easy to learn, easy to read). Because it's restricted to dates and times, the datetime attribute has a specific syntax that can be checked by a validator. Conversely, <data value=""> has no such built-in syntax, as it's for arbitrary lumps of data, so can't be machine validated. This will lead to more erroneous dates being published. Therefore, the reliability and thus the utility of the information being communicated in machine-readable format diminishes.

He goes on to point out that Reddit, The Boston Globe, the default Wordpress theme, and now parts of Drupal's core have built <time> into them.

A request to reverse the decision to drop <time> has already been filed. There is also a shared document for people to log their arguments for keeping <time>, ostensibly for taking back to Hixie to reconsider the decision.

The Takeaway

Given all the ongoing tweaks to the HTML5 specification by WHATWG, my opinion that dropping the version number and making it a living document with no set versions is flawed. As developers who want to be on the bleeding edge start to integrate elements and attributes from a specification that's not fully baked, this moving target just means more refactoring and ultimately cost or delays to the end users and/or clients.


Updated November 3, 2011

The <time> element has been restored. Read more: Well, It's about <time>

In case you had not read my piece following up this one with a broader look at the whole process around removing and restoring an element, go read End of <time> Is Not Helping the Case for HTML5

No comments:

Post a Comment