Posts tagged with 'html'

XHTML not dead, despite reports

With the W3C's recent announcement that work on XHTML 2.0 is not being continued, it would be tempting to think that the HTML vs XHTML war has been 'won', and not by the side a lot of people wanted.

However, that's a misconception. XHTML is alive and well as part of HTML5, on more or less equal terms with 'plain' HTML. It's just not going to be replacing 'tag soup' any time soon unless people start using it!

I'll be taking a look at the options authors have for producing XHTML markup, but let's first look at why someone might want to use XML.

Pros and cons of XHTML

Make all your sites work in IE8 with one fell swoop

At work we have around 100 sites hosted for clients, some of which might not have been updated in a few years (I should point out these are sites we develop, so there's no chance a client's going to edit the site themselves). IE8 is going to be rolled out to Windows users with Automatic Updates enabled as of next week, so there's a small worry about auditing these sites in time.

When IE7 came out we had to spend the time going through each of them manually and checking everything was fine. This time around, although IE8 has a new rendering model, it's possible for the browser to render pages as if it were IE7. In general this has been hugely controversial, but for people in our situation it's pretty handy.

The easiest solution to having sites that may not work in the IE8 renderer is 'do nothing'. IE8 has a compatibility button that a user can press that renders the page as if it's IE7. If enough users press this button, a scary centralised Microsoft database marks you as a naughty site and from then on, IE8 users get to see you in 'compatibility mode' until some time in the future when you fix your site and manage to persuade Microsoft that you should be let back in to the halls of the worthy.

However, that sounds like a mess: it relies on users jumping through some hoops, and it might be a bit tricky to get off the list at a later date. The strategy we've decided to go for is to explicitly mark all our sites as needing to be rendered in compatibility view, then turn this off for each site in turn as they're audited, at our leisure.
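
A sketch of what that marking might look like, assuming Microsoft's documented X-UA-Compatible switch is the mechanism used (the excerpt above doesn't name one). Placed in the HEAD of each page, it asks IE8 to render the page as IE7 would:

<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />

The same value can be sent as an X-UA-Compatible HTTP response header instead, which would avoid touching each site's templates and could be removed site by site as the audits are completed.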

Rev-canonical should be handled with care

A short while back, Google and a bunch of other search engines launched @rel="canonical", a standard for specifying that the current page is a copy of another, more canonical, version hosted elsewhere. I blogged about it at the time, and generally approved of the idea but warned against overuse when an HTTP redirect might be more sensible.

Recently there's been a large amount of discussion about @rev="canonical", a proposal that seems to have been floated with the intention of providing URL shortening services. The idea is that my page can 'advertise' some other URLs that it can be found at so that clients can pick a different one to use when referring to it.

In this particular use case I could publish a page at http://ciaranmcnulty.com/blog/2009-04-14/a-long-blog-post-with-a-complex-url that had a @rev="canonical" link to http://ciaran.ws/complex (I don't really have that domain, don't bother trying it). Applications that wanted a shorter URL for the content (e.g. Twitter clients, SMS gateways) could then use my shorter URL rather than having to get a more obfuscated one from TinyURL or somewhere similar.
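
As a rough illustration, using the example URLs from this post (the short domain is made up), the HEAD of the long page would carry something like:

<link rev="canonical" href="http://ciaran.ws/complex" />

The rev attribute reverses the direction of the relationship: instead of saying 'the canonical version of this page is over there', it says 'this page is the canonical version of the one over there', which is how the long page advertises its shorter alias.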

The number of sites that have already included the markup is staggering in such a short time, and a testament to how a simple markup idea like this can really take off (if only Microformats could gain this kind of uptake!). I've been reading a lot of the commentary that's bouncing around the HTML blogosphere, and thought I'd put my £0.02 in. Frankly, I fail to see the point of all the hooh-hah, for the following reasons:

Rel-canonical should be handled with care

Something we've been telling clients for years is to not publish the same information in more than one place. There are many reasons for this from the point of view of web semantics, but the one that makes the clients listen is when we say that Google will penalise their site for it.

As of today Google allow duplicate content as long as you indicate clearly which version is the canonical one. This entails adding something like the following to the HEAD element in your duplicated page, pointing back to the original:

<link rel="canonical" href="/the-other-page" />

This approach has been welcomed by many, but I'm fearful that it is duplicating already-existing web semantics as well as encouraging bad habits in web authors.