Showing posts with label xhtml. Show all posts

Sunday, January 11, 2015

On Use of the Lang Attribute

HTML5 Logo with character for Chinese number 5.

Way back in October I noticed this WHATWG HTML bug (26942) where someone asked why do these examples of <html> lack the lang attribute? I thought the answer from Hixie was a bit dismissive and not based on any data or real-world benefits of use, particularly in the context of screen readers:

Why not? Realistically, few people include it. It just means the language is unknown.

At the time, I could not get the latest archive to download from WebDevData.org (though that has changed, see below), so I fell back to asking for help on why the lang attribute is valuable.

How the lang Attribute on <html> Is Used

I got lots of good bits of feedback, which I collected into a Storify. I've distilled all that great information to these key points:

  • VoiceOver on iOS uses the attribute to auto-switch voices.
  • VoiceOver can speak a particular language using a different accent when specified.
  • Leaving out the lang attribute may require the user to manually switch to the correct language for proper pronunciation.
  • JAWS uses it to load the correct phonetic engine / phonologic dictionary — Handy for sites with multiple languages.
  • NVDA (Windows) uses it in the same way as VoiceOver and JAWS.
  • When used in HTML that is used to form an ePub or Apple iBooks document, it affects how VoiceOver will read the book.
  • Firefox, IE10, and Safari (as of a year ago) only support CSS hyphens: auto when the lang attribute is set (not from Twitter; source).

In the absence of setting a lang attribute on the <html> element, screen readers will fall back to the user's default system setting (barring any custom overrides) when speaking content.

How Many Pages Use lang

On January 8, WebDevData.org (from a W3C Community Group) posted its latest archive (which did not error on download, woo!). It consists of the HTML from 87,000 web pages.

I pulled down the 780MB file and re-taught myself the skills necessary to parse the files. For those who are regular expression geniuses, you are welcome to suggest an alternate approach, but I used the following pattern to return all the <html> elements: <html([^>]+)>. It fails for any <html> with no attributes at all, but for what I am doing that's ok.

Of the 84,054 pages I parsed (I excluded XML, ISO files, and so on), I found that 39,433 use the lang attribute on the <html> element. That's just about 47% (46.914% if I understand significant digits correctly).
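The pattern and tally described above can be sketched roughly as follows. This is not the script I actually ran; the helper names (has_lang, lang_share), the LANG_ATTR pattern, and the case-insensitive matching are assumptions made for illustration, and the sample strings are invented.

```python
import re
from typing import List

# The pattern from the post: captures the attribute text of an <html> start tag.
# As noted above, it does not match a bare <html> with no attributes.
HTML_TAG = re.compile(r"<html([^>]+)>", re.IGNORECASE)

# Hypothetical helper pattern: a lang attribute that is not the tail of
# xml:lang or of some other attribute name such as data-lang.
LANG_ATTR = re.compile(r"(?<![:\w-])lang\s*=", re.IGNORECASE)


def has_lang(page_source: str) -> bool:
    """Return True if the first <html ...> tag carries a lang attribute."""
    match = HTML_TAG.search(page_source)
    if match is None:
        return False  # no <html> tag, or one without any attributes
    return LANG_ATTR.search(match.group(1)) is not None


def lang_share(pages: List[str]) -> float:
    """Percentage of pages whose <html> tag has a lang attribute."""
    if not pages:
        return 0.0
    return 100 * sum(has_lang(page) for page in pages) / len(pages)


# Invented sample inputs, not taken from the WebDevData.org archive.
samples = [
    '<html lang="en">',
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">',
    "<html>",
    '<html class="no-js" lang=en-GB>',
]
print(lang_share(samples))

# The figure quoted above, recomputed from the two counts.
print(round(100 * 39433 / 84054, 3))
```

Because a bare <html> never matches the pattern, such pages are simply counted as not using lang, which is the right answer for this tally.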

What that tells me is that, far from few people including it, nearly half of the pages in this sample include it.

There are 12,672 instances of xml:lang, though at a quick scan they appear alongside lang. If anyone with better regex skills would like to help me further parse, please let me know.
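One way to check whether xml:lang really does appear alongside lang is to classify each <html> tag by which of the two attributes it carries. This is a sketch under assumptions: the function names and patterns are mine, the sample pages are invented, and it has not been run against the archive.

```python
import re
from collections import Counter

# Same tag pattern as in the post; attribute patterns are illustrative assumptions.
HTML_TAG = re.compile(r"<html([^>]+)>", re.IGNORECASE)
XML_LANG = re.compile(r"(?<![\w-])xml:lang\s*=", re.IGNORECASE)
PLAIN_LANG = re.compile(r"(?<![:\w-])lang\s*=", re.IGNORECASE)


def classify(page_source):
    """Return 'both', 'xml:lang only', 'lang only', or 'neither'."""
    match = HTML_TAG.search(page_source)
    attrs = match.group(1) if match else ""
    has_xml = XML_LANG.search(attrs) is not None
    has_plain = PLAIN_LANG.search(attrs) is not None
    if has_xml and has_plain:
        return "both"
    if has_xml:
        return "xml:lang only"
    if has_plain:
        return "lang only"
    return "neither"


def tally(pages):
    """Count how many pages fall into each category."""
    return Counter(classify(page) for page in pages)


# Invented sample inputs for illustration only.
pages = [
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
    '<html xml:lang="fr">',
    '<html lang="de">',
    "<html>",
]
print(tally(pages))
```

A large "xml:lang only" count would mean the 47% figure understates how many pages declare a language at all.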

Why You Should Use the lang Attribute on the <html> Element

Hyphens

By using lang, you get the benefits of hyphen support in your (modern) browser that you otherwise would not get (assuming you use hyphens: auto in your CSS).

Accessibility

At the very least, lang is a benefit for screen reader users, particularly when your users don't have the same primary language as your site. It allows proper pronunciation and inflection when the page is spoken.

WCAG Compliance

Including the lang is a Level A requirement of the Web Content Accessibility Guidelines 2.0 (specifically item 3.1.1 Language of Page). Technique H57 identifies the lang attribute specifically.

Internationalization

The W3C Internationalization (I18n) Activity has a great Q&A on why you should use lang, which was updated less than two months ago. I'll reprint the start of the answer, but there is far more detail and I strongly recommend you go read it.

Identifying the language of your content allows you to automatically do a number of things, from changing the look and behavior of a page, to extracting information, to changing the way that an application works. Some language applications work at the level of the document as a whole, some work on appropriately labeled document fragments.

We list here a few of the ways that language information is useful at the moment, however, as specifications and browsers evolve in the future there could be numerous additional applications for language information.

Interesting Aside

If you go to the WHATWG HTML5 specification today and view the page source, you'll see the following language declaration in the code:

<html class=split data-revision="$Revision: 8877 $" lang=en-GB-x-hixie>

Not to be outdone, the W3C HTML5 spec has the same language declaration.

If anybody has the en-GB-x-hixie phonologic dictionary in his or her screen reader, I'd love to hear it.

While technically allowed (the -x puts it in the private use sub-tag category), it's bad form:

Private-use subtags do not appear in the subtag registry, and are chosen and maintained by private agreement amongst parties.

Because these subtags are only meaningful within private agreements and cannot be used interoperably across the Web, they should be used with great care, and avoided whenever possible.
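To make the private-use point concrete, here is a small sketch that splits a language tag at the x singleton. The function name is hypothetical, and it does not validate the tag against the subtag registry; it only separates the interoperable part from the private-use part.

```python
def split_private_use(tag):
    """Split a language tag into (interoperable part, private-use subtags).

    Assumption for this sketch: the tag is well formed, so the first
    single-character subtag equal to "x" starts the private-use sequence.
    """
    subtags = tag.split("-")
    for index, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            return "-".join(subtags[:index]), subtags[index + 1:]
    return tag, []


print(split_private_use("en-GB-x-hixie"))
print(split_private_use("en-GB"))
```

For en-GB-x-hixie the interoperable part is en-GB, which is all that software outside the private agreement can reasonably act on.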

Update: January 1, 2015

For what it's worth, I've filed bugs against the W3C HTML5 spec and the WHATWG HTML5 spec.

Update: February 25, 2015

Another case where a lang attribute is important, though in this case on a specific element, is outlined in the piece HTML5 number inputs – Comma and period as decimal marks:

<input type="number"> will open a numeric software keyboard on modern mobile operating systems. Not every user can input decimal numbers into this convenient field without proper localization.

[…]

Half the world uses a comma and the other half uses a period as their decimal mark. (In Latin scripts.) Does your web application take that into consideration? Do the browsers?
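Whatever the browser does with the field, the value still has to be interpreted somewhere. As a rough sketch of taking the decimal mark into consideration on the receiving end, the function below parses a number once the expected mark is known. The function name and the strict refusal of grouping separators are assumptions for illustration, not something from the quoted piece.

```python
from decimal import Decimal, InvalidOperation


def parse_decimal(text, decimal_mark="."):
    """Parse user input that uses either a period or a comma as decimal mark.

    Assumption for this sketch: grouping separators are rejected rather
    than guessed at, since "1.234" is ambiguous without knowing the locale.
    """
    if decimal_mark not in (".", ","):
        raise ValueError("decimal_mark must be '.' or ','")
    other_mark = "," if decimal_mark == "." else "."
    cleaned = text.strip()
    if other_mark in cleaned:
        raise ValueError("unexpected separator in %r" % text)
    try:
        return Decimal(cleaned.replace(decimal_mark, "."))
    except InvalidOperation:
        raise ValueError("not a decimal number: %r" % text)


print(parse_decimal("3,14", decimal_mark=","))
print(parse_decimal("3.14"))
```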

Monday, January 17, 2011

W3C and WHATWG Provide HTML5 Updates

W3C

The W3C is pretty good about posting news when new HTML/CSS-related documents undergo updates, status changes, or generally move forward. On Friday the W3C HTML Working Group announced the publication of eight new documents. The brief release provides an even briefer overview of each, or you can see the same list, with the ability to file a bug, at the HTML Working Group home page. The eight documents (all dated January 13, 2011):

  1. Working Draft of the HTML5 specification.
  2. An updated draft of the helpful HTML5 differences from HTML4.
  3. The accompanying non-normative HTML: The Markup Language Reference, which is worth reading through if only to look at the items marked changed.
  4. HTML+RDFa 1.1, which outlines support for RDFa in both HTML4 and HTML5.
  5. HTML Microdata, discussing support for machine-readable data in HTML, ideally in an easy-to-write manner.
  6. HTML Canvas 2D Context, which, I think obviously, defines the API for use with the canvas element.
  7. HTML5: Techniques for providing useful text alternatives, a handy document for how to use the alt attribute and other related features (except longdesc).
  8. Polyglot Markup: HTML-Compatible XHTML Documents, intended to offer developers guidelines for creating HTML documents that validate as both HTML and XML. It's worth a read just to understand the premise.

For those of you more interested in the progress of accessibility in HTML, you can always drop in to see the weekly W3C Accessibility Task Force (a11ytf) bugs which are just hanging out, waiting to be closed, verified, or sent along to the HTML Tracker. Laura Carlson (of the [webdev] Web Design Update newsletter) sends out a weekly email update with a link to the report, Pre-Last Call "A11ytf" Keyword Bugs Awaiting Task Force Action (these links are both from January 15, 2011).

WHATWG

WHATWG is trying something new: a summary. As the author writes, "If this works out you might see another one." There is a chance, then, that you might not see another. At least the document posted on January 16 gives us some updates for now. Read it at Base64, model trains, Web Workers & the DOM, captions, …

Unlike the W3C status update, this one is more casual. It links to emails flying around between the members, allowing you as the user to dive right into a thread and follow it along. It is not scrubbed or converted to non-technical language, so it provides a nice behind the scenes look. For example, there is a link to an email, [whatwg] Google Feedback on the HTML5 media a11y specifications, that provides feedback from Google on the HTML5 media accessibility specifications. As the folks who run YouTube, Google's feedback is important to the success of video on the web, and so Google's rep discusses things like the track element and the caption text file format. Don't expect to see any references to codecs (like the Chrome/H.264 dust-up).

Bonus

Since reading this may have gotten you all excited over the specs and all things related to them, take a few minutes to read this post about the "Shadow DOM": What the Heck is Shadow DOM? If you play at all with JavaScript (libraries or otherwise), you may be interested to read up on examples such as the new slider input element and how you can access the element within the input, though not via standard HTML/CSS selectors, only via script. After all, if you are implementing it, you will need script to make it do something, so leaving that to the client-side script makes sense as a developer.

There, now you have enough to keep you busy on what might otherwise be a slow Monday at work.

Wednesday, September 1, 2010

Google, Arcade Fire Confused on HTML5

Screen shot of windows from the video

In case you haven't seen the Arcade Fire video, The Wilderness Downtown, you should take a look at it. Google and Arcade Fire got together to show off what Google Chrome could do with all the new gee whiz technology out there, and if you listen to all the major tech media outlets, it's an awesome demonstration of the capabilities of HTML5.

Except it's not HTML5, at least not the main pages that drive the site.

Google announced the video effort on its blog on Monday with the post Arcade Fire meets HTML5. To quote Google:

The project was built with the latest web technologies and includes HTML5, Google Maps, an integrated drawing tool, as well as multiple browser windows that move around the screen. [...] "The Wilderness Downtown" was inspired by recent developments in modern browsers and was built with Google Chrome in mind. As such, it’s best experienced in Chrome or an up-to-date HTML5-compliant browser.

I loaded the site up in Google Chrome first, figuring I'd try it out in the best circumstances. As it played, I took some time to peek under the hood at the source code. The main page of the site, the one that starts the ball rolling, has this as its DTD:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

That is not HTML5. But this page was just the launcher, not the page that's the workhorse (the URL seen in each pop-up). That page is container.html. Here's what I found when viewing its source code:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Now that is definitely not HTML5. I also checked the drawingtool.html page and found the same DTD. Yet the meta content for main page of the site even claims that it is HTML5:

<meta name="keywords" content="Arcade Fire, Chris Milk, Chrome, Chrome Experiment, HTML5, Javascript" />
<meta property="og:description" content="Check out Arcade Fire's new interactive HTML5 music experience, “The Wilderness Downtown”."/>
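The doctype check I did by eye can be sketched as a small function. The names are hypothetical, and it makes a simplifying assumption: only the bare <!DOCTYPE html> form is treated as HTML5, and anything else is labeled by the name in its W3C public identifier. A doctype alone also says nothing about which elements a page actually uses.

```python
import re

# Captures everything between "<!DOCTYPE" and the closing ">".
DOCTYPE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)

# Pulls the name out of a W3C public identifier,
# e.g. "XHTML 1.0 Transitional" from "-//W3C//DTD XHTML 1.0 Transitional//EN".
PUBLIC_ID = re.compile(r'"-//W3C//DTD ([^/]+)//')


def doctype_family(page_source):
    """Return 'HTML5', the DTD name from the public identifier, 'unknown', or 'none'."""
    match = DOCTYPE.search(page_source)
    if match is None:
        return "none"
    declaration = match.group(1).strip()
    if declaration.lower() == "html":
        return "HTML5"
    public_id = PUBLIC_ID.search(declaration)
    return public_id.group(1) if public_id else "unknown"


launcher = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
container = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
             '"http://www.w3.org/TR/html4/loose.dtd">')

print(doctype_family(launcher))
print(doctype_family(container))
print(doctype_family("<!DOCTYPE html>"))
```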

I went into this experimental site expecting to see very basic HTML5 with a great deal of CSS3 and script. Typically when someone talks about how awesome their HTML5 site is, it's not the HTML5 (unfinished spec), it's the CSS3 and scripting that they are really bragging about (I'm looking at you, Apple).

It is this kind of conflation of HTML5 with the completely distinct (though related) CSS3 specification and scripting that continues to confuse people who aren't in the code regularly. This is why developers are told to "build something in HTML5," and have to translate in their heads that the request isn't driven by a desire to use a certain specification, but is instead motivated by the "gee whiz" factor that comes from the styles and interactivity.

If you think this confusion isn't an issue, that anybody with a little tech savvy could figure it out, the headlines on these posts certainly would indicate otherwise (I cherry-picked for tech-savvy outlets):

To be fair, I am certain the dynamically-generated pages use the HTML5 video element at least. This post isn't about the project itself — I think it's very cool and I am glad to see Google is pushing the envelope (even if it does work very slowly in Safari, which Apple touts as HTML5-compliant). I take issue with the lack of distinction between HTML5, CSS, script, and even poorly-coded (because they are not HTML5, but two other specs instead) launch pages. And if any of these other organizations carrying the story had done the tiniest bit of fact checking, they might have noted that as well.


Tuesday, July 27, 2010

Unicorn Validator

The W3C has today announced its brand new validator, named Unicorn for reasons they do not explain. The new validator combines four other validators into one:

Unicorn combines a number of popular tools in a single, easy interface, including the Markup validator, CSS validator, mobileOk checker, and Feed validator, which remain available as individual services as well.

W3C is inviting developers to submit their own modules for the validator to continue to expand its capabilities. So far Unicorn has been translated (localized) into 21 languages, and is hoping users can contribute more.

Unicorn's CSS profile validation includes the ability to choose warning levels to report and to choose a medium (all, braille, handheld, print, screen, etc.). The "Custom Task" allows users to choose among (X)HTML, CSS, MobileOK, and RSS/Atom. With this custom option, users can also choose a level of CSS (Level 1, 2, 2.1, or 3) in addition to other profiles (SVG, mobile, TV, etc.) and then again from a CSS user medium.

The report page has a much nicer way to display all the issues than previous validators, allowing users to collapse an entire section and showing icons and numbers corresponding to the count of specific error types within a section (2 errors, 3 warnings, for example). Each message also shows the specific line number and (if appropriate) character number of the error (warning, info alert) along with the corresponding message. Each section and message even has an anchor on it so you can link directly to any item for sharing issues with a team. Now if W3C could add a background color to the page (white would be ideal), then I wouldn't have to squint to see the results.

Give Unicorn a try and see what you think.

Friday, July 23, 2010

Opera Rep Provides HTML5 Overview

Patrick H. Lauke is the Web Evangelist at Opera Software and ran the Accessibility Task Force for the Web Standards Project (WaSP). Last week (July 13) he gave a talk to the Institutional Web Management Workshop on HTML5. He led viewers on a general history of HTML5, through an overview of the specification, and then plans for the future.

Slides from his presentation have been posted, along with a video of the entire presentation. Sadly, watching just the slides means you miss out on his narrative, and watching just the video means you miss out on the slides. That's why I've embedded both here. Because I had to scale them to fit in this layout, you may want to see the originals of each and view them side by side (links below).

His slides include code samples and URLs to check some of them out on your own. Even if you can't get the video to work, spend some time perusing the slides (especially slide #4, from Bruce Lawson, which shows what HTML5 is not). If you can see the video, stick around for questions at the end.

The video is about 45 minutes and is peppered with lots of good insight. For example, he frames HTML5 as an extension of HTML4, providing more options and features. He points out that valid HTML4 or XHTML1 sites don't need to be re-coded, something that many web novices worry about and web jerks may try to trick clients into doing. He even discusses his own opinions on new elements and features, such as article and video path obfuscation (to keep people from bypassing YouTube-style overlay ads).


Tuesday, April 27, 2010

State of Web Dev Survey Results

Thumbnail of the results overview PDF.

Scroll Magazine, John Allsopp and Web Directions conferences all got together and ran the State of Web Development 2010 survey to gather information from developers on what technologies, techniques, philosophies and practices they use. The survey results gather the answers to 50+ questions and present them in a few different ways. They provide the complete (anonymized) set of responses in CSV format for download, a PDF infographic overview (see the big image above), just the results to all the questions (often compared to 2008) or their own detailed analysis.

As you read through the results, you may notice that what web developers use as their browser and platform does not correspond to the bulk of users. For example, 51% of respondents use Mac OS X as their primary platform, and 54% use Firefox as their primary browser. This does not correspond to general users, who range from 8% to 15% on Mac OS X and 16% to 45% on Firefox. Granted, all stats are relative to the site reporting them, but from my own experience with our clients, the numbers for web developers are not the same as for the general public.

I do find it odd that web developers tend to eschew the very browsers and platforms that the general public uses (such as IE on Windows) and reserve them only for testing. I'd feel better if these developers were familiar with what users actually use on a day-to-day basis, rather than only testing on it for projects. Not experiencing the web through the same lens as the user can be jarring, and it can cause developers to fail to take advantage of, or code around, browser features and issues that are known to regular users.

You may find that statements about HTML5 and CSS3 adoption really require more context than the answers can provide. The same is true for certain technologies and how they are applied (I may use CSS rounded corners on a personal project, for example, but not on a client site). Here is a quick overview of some of the results (emphasis theirs):

  • Few respondents use any form of Internet Explorer for their day to day web use, but IE8 is the number one browser developers test their sites in.
  • Google Chrome has jumped dramatically as the browser of choice for developers, to rank 3rd, at 17% just behind Safari at 20%. Firefox remains the number one choice by some way, but respondents were split between 3.5 and 3.6 at the time of our survey. Firefox 3.6 was released only a week before the survey began.
  • Over half of respondents now use Mac OS X as their primary operating system.
  • Nearly a third of respondents (up from 16%) use Mobile Safari, while Android use is at around 4%.
  • JQuery has become even more dominant, with nearly 80% of all respondents using the library, up from 63% last year.
  • Desktop-like application frameworks, such as Cappuccino and SproutCore show little sign of widespread adoption by developers. Perhaps the day of desktop-like web apps is yet to come, or perhaps developers really aren't looking to build webapps which mimic the desktop.
  • More respondents (45%) than not (44%) use CSS3 and experimental CSS, up dramatically from last year (only 22% then were using CSS3 and nearly 70% not)
  • Last survey, only 4% were using font linking using @font-face. This survey that's climbed to 23%
  • HTML5 is now used to some extent by around 30% of respondents, up from under 10% last survey

Monday, November 9, 2009

Screen Reader User Survey Results

WebAIM is a non-profit organization within the Center for Persons with Disabilities at Utah State University that focuses on accessible web content and technologies. WebAIM conducted a survey of the preferences of screen reader users back in December 2008, gathering a lot of interesting data about how users utilize assistive technologies (you can see the results of that survey at the WebAIM site).

WebAIM conducted another survey in October to track preferences of screen reader users. They received 665 responses to the survey consisting of a mix of disabled (90%) and abled users (10%). It's not a truly scientific survey, but it provides some valuable insight into usage patterns and user expectations.

I've just posted an article, WebAIM Screen Reader User Survey Results outlining the results of the survey. A couple excerpts:

Mobile

Pie chart of mobile screen reader use.

Most surprising to me was that 53% of those with disabilities claim they use a screen reader on a mobile device. More proficient screen reader users were more likely to use a mobile screen reader. If developers already struggle with building sites for mobile devices or struggle with building sites to be accessible, this can seem like a difficult challenge for many. The survey doesn't gather other information on mobile use, perhaps because they were surprised by its prevalence as well.

Finding Information

Users were asked how they go about finding information on a lengthy web page. 50.8% of users indicated that they use the page headings to navigate (really bolstering the argument of using proper headings in your content). 22.9% use the "find" feature of the browser, 16.1% navigate the links on the page, and 10.1% read through the page (and are apparently far more patient than I).

Go read the rest of the article. Now. Go.

Sunday, August 30, 2009

Contribute to HTML5

Listen, if I'm going to start a blog on web development some 15 years after I actually started web development, I really need to accept that all the furor of debating HTML has long since passed me by.

But wait - I am proven wrong!

In case you are a web developer living under a rock, you might not know about the recent demise of XHTML2 and the discussions of HTML5. You might not care about this post.

If you are, however, dedicated to code purity and following the specs, then you really should take some time to help review the HTML5 spec and provide your feedback. Go to the WHATWG blog post that asks for help. Go. Now. Last call is in October. Hurry.