A short while back, Google and a bunch of other search engines launched @rel="canonical", a standard for specifying that the current page is a copy of another, more canonical, version hosted elsewhere. I blogged about it at the time, and generally approved of the idea but warned against overuse when an HTTP redirect might be more sensible.
Recently there's been a large amount of discussion about @rev="canonical", a proposal that seems to have been floated with the intention of providing URL shortening services. The idea is that my page can 'advertise' some other URLs that it can be found at, so that clients can pick a different one to use when referring to it.
In this particular use case I could publish a page at http://ciaranmcnulty.com/blog/2009-04-14/a-long-blog-post-with-a-complex-url that had a @rev="canonical" link to http://ciaran.ws/complex (I don't really have that domain, don't bother trying it). Applications that wanted a shorter URL for the content (e.g. Twitter clients, SMS gateways) could then use my shorter URL rather than having to get a more obfuscated one from TinyURL or somewhere similar.
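To give a feel for the client side, here's a minimal Python sketch of how an application might look for a @rev="canonical" link in a page it has already fetched. It uses only the standard library's html.parser; the class and function names are my own inventions for illustration, the page markup is a made-up stand-in built from the example URLs above, and a real client would also want to handle fetching, relative URLs and the case where no link is present.

```python
from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Collects href values from <link>/<a> elements whose rev includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        # rev holds a space-separated list of link types, matched case-insensitively
        rev_values = (attributes.get("rev") or "").lower().split()
        if "canonical" in rev_values and attributes.get("href"):
            self.short_urls.append(attributes["href"])


def find_short_urls(html):
    """Return every URL the page advertises via rev="canonical"."""
    finder = RevCanonicalFinder()
    finder.feed(html)
    return finder.short_urls


# Hypothetical markup for the long-URL page described above
page = """
<html><head>
<title>A long blog post with a complex URL</title>
<link rev="canonical" href="http://ciaran.ws/complex" />
</head><body></body></html>
"""

print(find_short_urls(page))
```

A Twitter client could call something like find_short_urls() first and only fall back to a third-party shortener when the list comes back empty.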
The number of sites that have already included the markup is staggering in such a short time, and a testament to how a simple markup idea like this can really take off (if only Microformats could gain this kind of uptake!). I've been reading a lot of the commentary that's bouncing around the HTML blogosphere, and thought I'd put my £0.02 in. Frankly, I fail to see the point of all the hooh-hah, for the following reasons:
Something we've been telling clients for years is to not publish the same information in more than one place. There are many reasons for this from the point of view of web semantics, but the one that makes the clients listen is when we say that Google will penalise their site for it.
As of today Google allow duplicate content as long as you indicate clearly which version is the canonical one. This entails adding something like the following to the HEAD element in your duplicated page, pointing back to the original:
<link rel="canonical" href="/the-other-page" />
This approach has been welcomed by many, but I'm fearful that it is duplicating already-existing web semantics as well as encouraging bad habits in web authors.
There's a whole class of 'Web 2.0' technologies that have emerged recently which have some common features: they solve a simple problem, they do so in a decentralised way and they stay simple. As examples I'd quote things like XFN, OpenID, OAuth and even things like RSS and Atom feeds. They start off by solving a particular use case, and stay as simple as possible (or at least should - I'm looking at you, OpenID).
The latest such technology to interest me is oEmbed, via a blog post by Ben Ward. The name is a bit cryptic, but the use case it addresses is one of embedding content from one site into another. That may sound like something esoteric, but just looking back over the handful of blog posts I've done on this very site, a large number of them contain images from Flickr. Looking around the web as a whole, people are constantly embedding videos and images from other sites into their forums, blog posts and CMSes.
There are a couple of ways this is normally done in the wild, neither of which is that satisfactory.
- The site the content is hosted on generates a snippet of HTML - From looking at a page with the content on, a couple of clicks will give the user some HTML that they can copy and paste into their HTML editor. This is ok for people who are happy with HTML and actually have the ability to edit the HTML in their posts rather than using some sort of WYSIWYG, but can be confusing for novice users. This technique also limits the ability of the receiving site to reformat the content to fit into any existing templating.
- The site the content is hosted on gets screen-scraped - Some blogging platforms and CMSes know how major sites like Flickr or YouTube structure their HTML so are able to extract images and videos from just a URL. This of course falls down if the HTML changes significantly, and if you're trying to post content from a site your platform doesn't know about, you're out of luck.
Of the two existing solutions, the second has the better user story. The user clicks a button, pastes in a URL to the content on another site, and the platform slurps up the content, reformats it to fit in with any house styles and inserts it into the content area. What's needed is a way to do this in a decentralised way, which is where oEmbed comes in.
How oEmbed works
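In outline, as I understand the oEmbed specification, the receiving platform takes the URL the user pasted in, asks the content site's oEmbed endpoint about it, and gets back structured data it can template however it likes. Here's a rough Python sketch of those two halves, with no network access involved. The endpoint address, the function names and the sample response are all made up for illustration; real providers publish their own endpoints, and a real consumer would need to handle the other response types and error cases.

```python
import json
from html import escape
from urllib.parse import urlencode

# Hypothetical provider endpoint, for illustration only
ENDPOINT = "http://provider.example/oembed"


def build_request_url(endpoint, content_url, maxwidth=None):
    """Build the URL a consumer would fetch to ask about content_url."""
    params = {"url": content_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    return endpoint + "?" + urlencode(params)


def render_embed(response_text):
    """Turn an oEmbed JSON response into HTML the receiving site controls."""
    data = json.loads(response_text)
    if data["type"] == "photo":
        # For photos the consumer builds its own markup from the raw details
        return '<img src="{}" width="{}" height="{}" alt="{}" />'.format(
            escape(data["url"], quote=True),
            int(data["width"]),
            int(data["height"]),
            escape(data.get("title", ""), quote=True),
        )
    if data["type"] in ("video", "rich"):
        # For these types the provider supplies the markup itself
        return data["html"]
    raise ValueError("unsupported oEmbed type: " + data["type"])


# Made-up response standing in for what a provider might send back
sample_response = json.dumps({
    "version": "1.0",
    "type": "photo",
    "title": "A photo",
    "url": "http://provider.example/photos/1.jpg",
    "width": 240,
    "height": 160,
})

print(build_request_url(ENDPOINT, "http://provider.example/photos/1", maxwidth=300))
print(render_embed(sample_response))
```

The point of the split is that the platform never needs to know how the provider structures its pages; it only needs to know where the endpoint is and what the response fields mean.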
I've been doing a bit more JavaScript recently, specifically using Prototype AJAX stuff with Google Maps, and I've come up with a few guiding principles that have helped me keep stuff neat and tidy. I thought I'd share them with you, my handful of readers.
SCRIPT tags should live inside HEAD
There are very few reasons for having SCRIPT tags inside the document body. The main one, use of document.write(), nearly always leads to ugly code. document.write() is also not actually allowed in XHTML, despite most browsers accepting it as long as the page is delivered as text/html.
Furthermore, SCRIPT in the document body is, in my opinion, always a response to a perceived problem that's actually the result of poor architectural choices.
One of the emergent web technologies I'm very interested in is the Microformats project, a set of ways of making data embedded in HTML documents machine-readable.
Two of the most widely adopted Microformats are hCard and XFN.
hCard is a standardised method for marking up contact data. The point of it is that if all sites mark up contact data the same way, it's easier to parse.
XFN is all about the relationships between sites, and one of its key features is that it allows you to identify that a set of online profiles all belong to the same person (if they've voluntarily linked them).
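The 'voluntarily linked' part can be illustrated with a small Python sketch: a profile only counts as belonging to the same person if the rel="me" link goes both ways. This is a simplification under stated assumptions rather than a full XFN consumer. The function names are hypothetical, the URLs are invented, and the pages dictionary stands in for HTML that a real tool would have fetched over HTTP.

```python
from html.parser import HTMLParser


class RelMeFinder(HTMLParser):
    """Collects href values from <a>/<link> elements whose rel includes 'me'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel can carry several space-separated values, e.g. "friend met"
        rel_values = (attributes.get("rel") or "").lower().split()
        if "me" in rel_values and attributes.get("href"):
            self.links.append(attributes["href"])


def rel_me_links(html):
    finder = RelMeFinder()
    finder.feed(html)
    return finder.links


def confirmed_profiles(start_url, pages):
    """Return URLs that start_url claims via rel="me" and that claim it back."""
    confirmed = []
    for candidate in rel_me_links(pages[start_url]):
        if start_url in rel_me_links(pages.get(candidate, "")):
            confirmed.append(candidate)
    return confirmed


# Invented pages: the blog claims two profiles, but only one links back
pages = {
    "http://blog.example/": (
        '<a rel="me" href="http://photos.example/me">Photos</a> '
        '<a rel="me" href="http://other.example/me">Other</a>'
    ),
    "http://photos.example/me": '<a rel="me" href="http://blog.example/">Blog</a>',
    "http://other.example/me": '<a rel="friend" href="http://blog.example/">A friend</a>',
}

print(confirmed_profiles("http://blog.example/", pages))
```

Requiring the link in both directions is what stops me from claiming someone else's profile as my own just by pointing at it.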