No more hashbang; zones, replacesid, and HTTP
There’s been some consternation in the Web community recently over websites’ increasingly common use of the hashbang (#!) URL construct as a means by which to reduce overhead when downloading a page. I couldn’t agree more; I hate it, too. The problem is, it’s useful. When sites change their URL structures this way, what they’re really doing is telling web browsers, “there’s really only one page on this site. Whatever comes after the hashbang is just words and pictures to fill in a template; the template is pretty much the same across the whole site. Don’t waste time downloading the template code more than once if you don’t have to.” This is a good idea for a number of reasons, including making things run faster for users and reducing data use on mobile devices (and, increasingly, wired computers) with data limits. But there are problems, too, not the least of which is how comparatively unreliable websites can become when they’re set up this way.
My idea
I’ve got an idea for how we might fix this problem in a completely backwards-compatible way. I’d like to propose three new HTML attributes: zone, zonecontent, and replacesid. The first two attributes could be applied to any block-level element, e.g. <div zone="profile" zonecontent="michaelcgorman">, which would be represented in the URL as http://www.example.com/social/profile:michaelcgorman. This is a perfectly valid URL; the entire thing would be sent to the server by any current web browser, search engine, etc. on every request. But if a browser supports the zone and zonecontent attributes, subsequent requests to the server within the same path (in the example above, http://www.example.com/social/) could be made for just the content of the new zone.
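To make that concrete, here’s a minimal sketch of what such a page’s markup might look like under this proposal. Everything other than the zone and zonecontent attributes (the nav, the headings) is hypothetical filler, not part of the idea:

```html
<!-- Page at http://www.example.com/social/profile:michaelcgorman -->
<body>
  <!-- Site-wide template: identical on every page under /social/ -->
  <nav>…site-wide navigation…</nav>

  <!-- The swappable region. "profile" names the zone; "michaelcgorman"
       identifies the content currently loaded into it. -->
  <div zone="profile" zonecontent="michaelcgorman">
    <h1>Michael C. Gorman</h1>
    <p>…profile content…</p>
  </div>
</body>
```

More specifically, here’s how the whole flow would work: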
1. A user, coming from Google, clicks on a link to http://www.example.com/social/profile:michaelcgorman.
2. The web browser requests /social/profile:michaelcgorman from www.example.com.
3. The web server returns the whole page, as has been done since the Web was created.
4. The web browser renders the response as a whole page.
5. The user clicks on a link to http://www.example.com/social/profile:johndoe.
6. The web browser requests /social/profile:johndoe from www.example.com and adds X-RETURN-ZONE: profile:johndoe to the HTTP request headers (see the wire-format sketch after this list).
7. The web server sees the X-RETURN-ZONE request header, matches it to the /social/ path, renders the zone="profile" section of the page, adds an HTTP response header of X-RETURNED-ZONE: profile:johndoe, and returns only what needs to be changed (without the rest of the page template). This may include elements with the replacesid attribute (preferably, though not necessarily, wrapped in a hidden div just in case).
8. The web browser sees the X-RETURNED-ZONE response header, which means the server has honored its request to return only the profile zone. The browser changes the address bar to the new URL, http://www.example.com/social/profile:johndoe, and replaces the existing content of the element with zone="profile" with the returned content. For each returned element with an attribute of replacesid="x", the browser finds any element elsewhere on the page with id="x" and replaces it with the new element.
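As a sketch, steps 5 through 8 might look like this on the wire. The header names are the ones proposed above; the response body, including the hidden div and the notification-count id, is hypothetical filler:

```http
GET /social/profile:johndoe HTTP/1.1
Host: www.example.com
X-RETURN-ZONE: profile:johndoe

HTTP/1.1 200 OK
Content-Type: text/html
X-RETURNED-ZONE: profile:johndoe

<div zone="profile" zonecontent="johndoe">
  <h1>John Doe</h1>
  <p>…profile content…</p>
</div>
<!-- hypothetical extra: replaces whatever element currently has
     id="notification-count" elsewhere on the page -->
<div hidden>
  <span replacesid="notification-count">3</span>
</div>
```

Note that the response contains no <html>, no template, no navigation: just the zone’s new content and any replacesid extras.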
Obviously, for this to work, there would need to be an in-depth review process by the web community (so far this has only been an idea in my head, as far as I know). The details would need to be figured out much more precisely, and browsers would need to start implementing it. The goal is a solution so completely backward-compatible that a website owner can’t unintentionally mess it up while ridding their site of hashbang problems like http://www.facebook.com/profile.php?id=123456789#!/group.php?id=987654321 standing in for http://www.facebook.com/group.php?id=987654321, or http://www.twitter.com/michaelcgorman and http://www.twitter.com/#!/michaelcgorman being two different URLs for the same thing.
Done right, this could be a whole lot simpler to implement than the current hashbang nonsense. It would require no XHR hackery, no pushState regression testing (no JavaScript support at all!), and no browser compatibility checks. It would fit well within the boundaries of the current HTTP spec, and each page of content would have its own real, unique URL. What am I missing? Is there any reason this can’t be done?
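To back up that claim, here’s a rough server-side sketch in Python (a small WSGI app), assuming the header semantics described above. The render_zone and render_full_page helpers are hypothetical stand-ins for a real site’s templating layer:

```python
from wsgiref.simple_server import make_server

# Hypothetical templating stand-ins; a real site would render these
# from its own templates and data.
def render_zone(zone, content_id):
    # Just the zone's content, with no surrounding template.
    return ('<div zone="%s" zonecontent="%s">profile for %s</div>'
            % (zone, content_id, content_id))

def render_full_page(zone, content_id):
    # The whole page: site-wide template wrapped around the zone.
    return ('<html><body><nav>site-wide template</nav>%s</body></html>'
            % render_zone(zone, content_id))

def app(environ, start_response):
    # e.g. PATH_INFO == "/social/profile:johndoe"
    zone_spec = environ.get('PATH_INFO', '/').rsplit('/', 1)[-1]
    zone, _, content_id = zone_spec.partition(':')

    # WSGI surfaces an X-RETURN-ZONE request header as HTTP_X_RETURN_ZONE.
    requested = environ.get('HTTP_X_RETURN_ZONE')

    headers = [('Content-Type', 'text/html')]
    if requested == zone_spec:
        # Zone-aware browser: confirm with X-RETURNED-ZONE and send
        # only the zone, without the rest of the page template.
        headers.append(('X-RETURNED-ZONE', zone_spec))
        body = render_zone(zone, content_id)
    else:
        # Legacy browsers and crawlers never send the header, so they
        # always get the full page: backward compatible by default.
        body = render_full_page(zone, content_id)

    start_response('200 OK', headers)
    return [body.encode('utf-8')]

if __name__ == '__main__':
    make_server('', 8000, app).serve_forever()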
update: edited step 8 above to clarify what URL is shown in the browser’s address bar. Also, I answered a couple of questions about this on Hacker News.