Professional Web 2.0 Programming
by Eric van der Vlist, Danny Ayers, Erik Bruchez, Joe Fawcett, Alessandro Vernet
November 2006, Paperback


Excerpt from Professional Web 2.0 Programming

Future-Proofing Your URIs

by Erik Bruchez

This article looks at key aspects that you should address to maximize the lifespan of your URIs.

Technology Agnosticism

A future-proof URI is a URI likely to be usable for a very long time, even if you decide to completely reorganize your Web site or change the technology used to produce static and dynamic pages.

A general strategy for making URIs future-proof can be summarized as "don't show the technology." If the past 15 years of the Web have taught us anything, it is that new Web technologies come out almost every day: you cannot assume that if your Web site is written in Python today, it will still be in 2 years, let alone in 10 or 20 years. In addition, the Web will long remain a network of different servers running different operating systems, platforms, and languages, and making those aspects visible in URIs brings no interoperability benefit.

By choosing a technology-agnostic URI, you ensure that you don't have to change the URI as the technology behind it evolves or as other Web developers take over the development of a site. There is also a security benefit in not telling the world exactly what technology you are using (although Web servers often reveal installed server-side modules anyway, for example in the Server header of their HTTP responses).

Along these lines, here are a few things you should do to build future-proof URIs:

  • Don't include anything in the URI that reveals the programming language or Web platform used to produce your HTTP resource. For example, ban .pl, .php, .jsp, .asp, .aspx, cgi-bin, servlet, .do, and so on.
  • Don't include file extensions that reveal the media type of the requested document, such as .html, unless you also provide a media-type-agnostic URI. This recommendation may sound strange, since Web servers have served HTML pages with .htm or .html extensions from the very beginning, but thanks to content negotiation you really don't need to: a single URI gives access not only to an HTML version of the resource but also to any future formats you may want to offer for it. Think about the growing use of XHTML: a single URI can serve HTML to browsers that do not support XHTML at all, and well-formed XHTML to those that do. It makes little sense to duplicate all of your URIs just because you want to serve these two formats. New formats will keep appearing, and you may want to serve them from the same URI; for example, you may want to serve XHTML 2.0 instead of XHTML 1.1 to browsers that implement it.
  • Don't give a hint as to whether your page is a static resource on your file system or generated dynamically: the way a resource is served may change over time.
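Content negotiation, as mentioned in the list above, is what lets one URI serve several formats. The following Python function is a minimal sketch of how a server might pick a representation from the client's Accept header. The AVAILABLE list and the tie-breaking rule (prefer HTML when the client rates both formats equally) are assumptions for illustration, and only the q parameter is honored.

```python
# Server-driven content negotiation: one URI, several representations.
# Simplified sketch: only the q parameter is honored; other media-type
# parameters (such as charset) are ignored.

# Hypothetical list of formats the server can produce, in order of
# server preference (used to break ties).
AVAILABLE = ["text/html", "application/xhtml+xml"]


def parse_accept(header):
    """Return (media_range, q) pairs parsed from an Accept header."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range, media_type):
    """Return 2 for an exact match, 1 for type/*, 0 for */*, -1 otherwise."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    if media_range == "*/*":
        return 0
    return -1


def negotiate(accept_header, available=AVAILABLE):
    """Pick the best representation, or None (the server would answer 406)."""
    if not accept_header:
        return available[0]  # no Accept header: any format is acceptable
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r, media_type), q) for r, q in ranges]
        matching = [m for m in matching if m[0] >= 0]
        if not matching:
            continue
        # The most specific matching range determines the quality value.
        _, q = max(matching, key=lambda m: m[0])
        if q > best_q:  # strict ">" keeps server preference on ties
            best, best_q = media_type, q
    return best
```

With a browser that sends only Accept: */*, the sketch falls back to text/html, while a browser that rates application/xhtml+xml higher receives XHTML. A real server would also send a Vary: Accept response header so that caches keep the representations apart.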

Being technology-agnostic often requires a little more work upfront, as some technologies actually encourage bad practices (for example, ASP and JSP encourage visible .asp and .jsp extensions, a situation probably driven at least partly by marketing), but the benefits are likely to be long-lasting.

Hierarchies and Collections

As discussed in Chapter 7, "HTTP and URIs," in the book "Professional Web 2.0 Programming" (Wrox, 2006, ISBN: 0-470-08788-9), HTTP URIs can contain hierarchical path information. When defining a URI space, you have the option of leveraging that hierarchy or not. As an example of hierarchical URIs, consider implementing permalinks.

Suppose the main URI for your personal blog is http://example.org/blog/. Although that URI can be permanent, its content is by design meant to change and keep updating with your latest blog postings. In this context, the term permalink is used to denote a permanent link to an individual blog post.

The WordPress blogging software documentation describes several types of permalinks, from ugly to pretty. Which one is more aesthetically pleasing is perhaps in the eye of the beholder:

http://example.org/blog/index.php?year=2006&month=8&day=7&post=123

or:

http://example.org/blog/archives/2006/08/07/Web20-thebook

Both of these URIs can sensibly identify a specific blog entry published on August 7, 2006. Which one is best depends on the exact use case. For example, the hierarchical solution becomes impractical if you need dozens of query parameters to identify the resource and those parameters are not hierarchical by nature. In addition, internal URIs that are unlikely ever to be seen by humans can use query parameters with few drawbacks: nice-looking URIs matter mostly to humans.

In the example above, a publication date is naturally hierarchical, that is, organized into collections (a month always belongs to a year, and a day always belongs to a month), and for publications such as articles or blog entries, it is a natural primary way of accessing resources. It also has the benefit that a static version of the site, backed by actual directories, could be built, whereas query parameters would make this harder to accomplish without URI rewriting.
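Because the date hierarchy carries the same information as the query parameters, one form can be mapped onto the other mechanically. Here is a minimal Python sketch that turns the query-parameter permalink into the hierarchical one, for example to redirect old links or to name directories in a static export. The SLUGS table is a hypothetical stand-in for a lookup from post ID to slug.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup table from internal post ID to slug.
SLUGS = {123: "Web20-thebook"}


def to_hierarchical(uri):
    """Map the query-parameter permalink onto the hierarchical one.

    Returns None if a parameter is missing or malformed, or if the
    post ID is unknown.
    """
    query = parse_qs(urlsplit(uri).query)
    try:
        year, month, day, post = [
            int(query[name][0]) for name in ("year", "month", "day", "post")
        ]
    except (KeyError, ValueError):
        return None
    slug = SLUGS.get(post)
    if slug is None:
        return None
    # Zero-padding keeps collections sortable: 2006/08 sorts before 2006/10.
    return f"/blog/archives/{year:04d}/{month:02d}/{day:02d}/{slug}"
```

Applied to http://example.org/blog/index.php?year=2006&month=8&day=7&post=123, the sketch yields /blog/archives/2006/08/07/Web20-thebook, a path that could equally serve as a redirect target or as a directory in a static copy of the site.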

It is important to realize that while inspired by the organization of file systems, a URI hierarchy does not have to be backed by a concrete file system with files and directories. The hierarchy can be purely virtual; for example, it can be backed by a database.
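To illustrate a purely virtual hierarchy, the following sketch resolves /blog/archives/... paths against an in-memory SQLite database instead of a directory tree. The table layout and sample posts are hypothetical. Shorter paths are treated as collections, so /blog/archives/2006/08 lists every post from August 2006, mirroring the year/month/day containment of publication dates.

```python
import sqlite3

DATE_FIELDS = ("year", "month", "day")


def make_db():
    """Create an in-memory database with hypothetical sample posts."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE posts (year INTEGER, month INTEGER, day INTEGER, slug TEXT)"
    )
    db.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [
            (2006, 8, 7, "Web20-thebook"),
            (2006, 8, 21, "another-post"),
            (2006, 9, 1, "september-post"),
        ],
    )
    return db


def resolve(db, path):
    """Resolve a virtual /blog/archives/... path against the database.

    Returns the slugs contained in a year, month, or day collection,
    or a one-element list for an individual post; None if the path is
    not part of this hierarchy.
    """
    prefix = "/blog/archives/"
    if not path.startswith(prefix):
        return None
    parts = [p for p in path[len(prefix):].split("/") if p]
    if not 1 <= len(parts) <= 4:
        return None
    dates = parts[:3]
    slug = parts[3] if len(parts) == 4 else None
    if not all(p.isdecimal() for p in dates):
        return None
    # Column names come from the fixed tuple above, never from the path.
    conditions = [f"{field} = ?" for field in DATE_FIELDS[: len(dates)]]
    args = [int(p) for p in dates]
    if slug is not None:
        conditions.append("slug = ?")
        args.append(slug)
    sql = (
        "SELECT slug FROM posts WHERE "
        + " AND ".join(conditions)
        + " ORDER BY year, month, day, slug"
    )
    return [row[0] for row in db.execute(sql, args)]
```

An empty result could be mapped to a 404 Not Found response. Nothing on disk corresponds to these paths: the hierarchy exists only in how the server interprets them.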

In general, URIs, like file system path names, go from general to specific, and from containing to contained. From this perspective, you may want to choose a sufficiently general root path element in your URI structure. For example, most Flickr URIs start with the path /photos, which leaves the hierarchy open for paths starting with /videos in the future. On the other hand, a site like del.icio.us leaves less room for expansion, as its structure uses the username as the root path element, followed by tag names, for example, http://del.icio.us/ebruchez/Web2.0.
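The benefit of a general root path element can be sketched as a dispatch table keyed by the first path segment. The handlers and the ROUTES table below are hypothetical; the point is only that registering a /videos collection later does not disturb any existing /photos URI. With a username as the root element instead, a new top-level collection such as /videos could collide with an existing user of that name.

```python
# Hypothetical dispatch table keyed by the root path segment.


def photos(rest):
    return f"photos: {rest or '(all)'}"


def videos(rest):
    return f"videos: {rest or '(all)'}"


ROUTES = {"photos": photos}


def dispatch(path):
    """Route /<root>/<rest...> to the handler registered for <root>.

    Returns None when no handler matches (the server would answer 404).
    """
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in ROUTES:
        return None
    return ROUTES[segments[0]]("/".join(segments[1:]))


# Adding a new top-level collection later leaves every /photos URI unchanged.
ROUTES["videos"] = videos
```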

In addition to the hierarchy, the pretty URI above uses the notion of a slug, that is, a short name given to the blog entry or article. A slug can give hints to search engines, and it also makes URIs friendlier to users, at least for reading: by looking at a list of URIs, for example in your browser's URL completion bar, you can rapidly identify a particular post. On the other hand, slugs tend to make URIs longer. You can of course implement access to a resource using both a slug and a short identifier and use redirection between the two (redirection is discussed in depth in Chapter 16, "Implementing and Maintaining Your URI Space," in the book Professional Web 2.0 Programming).
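Here is a minimal Python sketch of combining a slug with a short identifier. The /p/<id> short form, the POSTS table, and the slugify rule are all hypothetical. The sketch treats the slug URI as canonical and answers requests for the short form with a 301 Moved Permanently redirect to it.

```python
import re
import unicodedata


def slugify(title):
    """Hypothetical slug rule: ASCII letters and digits joined by hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return "-".join(re.findall(r"[A-Za-z0-9]+", ascii_title))


# Hypothetical post table: short ID -> (date path, slug).
POSTS = {123: ("2006/08/07", "Web20-thebook")}


def canonical(post_id):
    date, slug = POSTS[post_id]
    return f"/blog/archives/{date}/{slug}"


def handle(path):
    """Return (status, location) for a request path.

    The short form /p/<id> permanently redirects (301) to the slug URI,
    which is the canonical URI that is actually served (200).
    """
    match = re.fullmatch(r"/p/(\d+)", path)
    if match:
        post_id = int(match.group(1))
        if post_id in POSTS:
            return 301, canonical(post_id)
        return 404, None
    # Linear scan keeps the sketch short; a real site would index by path.
    if any(path == canonical(pid) for pid in POSTS):
        return 200, path
    return 404, None
```

Redirecting in the other direction, from slug to short identifier, would work the same way; what matters is choosing one canonical URI so that links and search engines converge on it.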