Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO
by Cristian Darie, Jaimie Sirovich
September 2007, Paperback


Excerpt from Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO

Redirecting HTTP 301 Status Codes with ASP.NET and IIS

by Cristian Darie and Jaimie Sirovich

One of the perks of ASP.NET is that it abstracts away many low-level implementation details from the Web developer. It does such a great job, in fact, that one can typically build complex Web applications without understanding much at all about the protocol Web servers use to speak to the world, HTTP (HyperText Transfer Protocol).

Though most of the time this ignorance is bliss, that is not always the case with search engine optimization. Using the protocol improperly can wreak havoc on search engine rankings. On the other hand, knowing how to use it effectively can be of great help to those same rankings.

HTTP status codes are a small but critical part of this protocol. They provide information regarding the state of an HTTP request. One may use them, for example, to indicate that the requested information should be retrieved from a different location from now on. In modern search engines, doing so may also result in a transfer of link equity to that new location. This example alone highlights the importance of knowing how to use these codes.

The HTTP Status Codes

Each time a user agent requests a URL from a Web site, the server replies with a set of HTTP headers; the requested content follows after them. Most users never see this part of the communication, however, because Web browsers do not normally display them.

If you've never seen how these headers look, it's time to get your feet wet. The easiest way to get started is to use a Web-based tool that does all of the work for you. One such tool is located at http://www.seoegghead.com/tools/view-http-headers.php.

Figure 1 shows the results of using this tool for http://www.cristiandarie.ro. The status code is highlighted in the figure.

A more convenient way to view these headers is by using a plugin for your browser. One plugin you can use with Firefox is LiveHTTPHeaders. For Internet Explorer you can use ieHTTPHeaders. Figure 2 shows LiveHTTPHeaders in action.

The part of the HTTP headers we're predominantly interested in for the purpose of this article is the line containing the status code of the request, as indicated in the figure. The most common status code is 200, which specifies the request was processed by our Web server successfully without any surprises, and that the content the user requested follows.


Figure 1


Figure 2

However, there are many other status codes you need to know about as a search engine marketer. The status code we'll consider in this article is the 301 redirection code.

Redirection Using 301

301 and 302 are the HTTP status codes used for redirection. These codes indicate that another request must be made in order to fulfill the HTTP request — the content is located elsewhere. When a Web page replies with either of these codes, it does not return any HTML content, but includes an additional Location: HTTP header that indicates another URL where the content is found.

Figure 3 shows an example of how redirects occur in practice. As you can see, when a redirect occurs, the URL that issues the redirect doesn't return any content, but indicates the new URL that should be referenced instead.


Figure 3

Note that when the user agent is a search engine spider or another software application, no human user is involved in the process shown in Figure 3. Search engines follow the same basic process to update SERPs when they encounter a redirect.

Redirections can be chained, meaning that one redirect can point to a page that, in turn, redirects again. However, you should avoid multiple redirects as much as possible. An earlier version of the HTTP/1.1 specification (RFC 2068) recommended a maximum of five redirections, but RFC 2616 dropped that limit. Regardless, it is wise to avoid chained redirects because they can slow down site spidering: spiders may only schedule the result of a redirection for spidering instead of fetching it immediately.
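A small simulation shows what chained redirects cost. The following C# sketch follows redirects recorded in an in-memory dictionary instead of making real HTTP requests, and it gives up after a fixed number of hops. The `RedirectChain` class, the sample URLs and the default limit of five are illustrative assumptions. The limit is borrowed from the old recommendation. The code does not describe how any particular spider or browser works.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: follows redirects stored in a dictionary
// (requested URL -> Location header value) instead of issuing HTTP requests.
public static class RedirectChain
{
    // Returns the final URL and the number of redirects followed.
    // Throws if more than maxHops redirects are needed, which also catches loops.
    public static (string FinalUrl, int Hops) Resolve(
        IDictionary<string, string> redirects, string url, int maxHops = 5)
    {
        int hops = 0;
        while (redirects.TryGetValue(url, out string next))
        {
            if (hops == maxHops)
                throw new InvalidOperationException(
                    $"Gave up after {maxHops} redirects; possible redirect loop.");
            url = next;
            hops++;
        }
        return (url, hops);
    }
}
```

For example, with `http://seoasp/a` redirecting to `http://seoasp/b` and `b` redirecting to `http://seoasp/c`, `Resolve` returns `c` after two hops. Each hop is one more request that a client or spider must make before it reaches any content.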

There are actually many redirection status codes in the HTTP standard. They are listed in Table 1.

Table 1
Status Code   Description
300           Multiple Choices
301           Moved Permanently
302           Found
303           See Other
304           Not Modified
305           Use Proxy
307           Temporary Redirect

In practice, only the 301 and 302 status codes are used for redirection. Furthermore, because browsers are known to struggle with some of the other status codes, it is probably wise to avoid them, even if they seem more relevant or specific. It is reasonable to assume that search engines may struggle with them as well. At the very least, it is not entirely clear how search engines interpret them.
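The advice above fits in a tiny lookup. The following helper is hypothetical and is not an ASP.NET API. It treats 301 and 302 as the two codes to use for redirection and flags the remaining codes from Table 1 as ones to avoid for that purpose.

```csharp
// How a status code should be treated when you intend to redirect.
public enum RedirectAdvice { Permanent, Temporary, AvoidForRedirects, NotARedirect }

// Hypothetical helper encoding Table 1 plus the "use only 301 and 302" advice.
public static class RedirectCodes
{
    public static RedirectAdvice Classify(int statusCode)
    {
        switch (statusCode)
        {
            case 301: return RedirectAdvice.Permanent;   // Moved Permanently
            case 302: return RedirectAdvice.Temporary;   // Found
            case 300: case 303: case 304: case 305: case 307:
                return RedirectAdvice.AvoidForRedirects; // the other Table 1 codes
            default:
                return RedirectAdvice.NotARedirect;      // e.g. 200, 404, 500
        }
    }
}
```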

The 301 status code indicates that a resource has been permanently moved to the new location specified by the Location: header that follows. It indicates that the old URL is obsolete and that any references to the old URL should be replaced with the new one.

Let's take as an example a fictional page named http://www.example.com/OldPage.aspx, which returns this header:

HTTP/1.1 301 Moved Permanently
Date: Wed, 02 May 2007 09:50:39 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Location: http://www.example.com/NewPage.aspx
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8

When you load the page in a Web browser, you are automatically redirected to the new location specified by the Location header. After the redirection, the Back button in your browser won't take you to the initially requested URL, because that URL returned a redirect instead of a page.
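The response format is easier to see in code. The following minimal C# sketch assembles the raw text of a 301 response similar to the example above. The `RawRedirect` class is hypothetical. In ASP.NET you never write these bytes yourself, and IIS adds headers such as Date and Server on its own.

```csharp
using System.Text;

// Hypothetical helper: builds the raw text of a minimal 301 response.
public static class RawRedirect
{
    public static string Build301(string absoluteLocation)
    {
        var sb = new StringBuilder();
        sb.Append("HTTP/1.1 301 Moved Permanently\r\n");            // status line
        sb.Append("Location: ").Append(absoluteLocation).Append("\r\n");
        sb.Append("Content-Length: 0\r\n");                          // no body
        sb.Append("Connection: close\r\n");
        sb.Append("\r\n");                                           // blank line ends the headers
        return sb.ToString();
    }
}
```

`RawRedirect.Build301("http://www.example.com/NewPage.aspx")` produces the status line, a Location header pointing at NewPage.aspx and an empty body. That is all a client needs in order to issue its second request.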

The 301 status code also indicates to search engines that link equity from the previous URL should be credited to the new one. In theory, the new page will inherit the rankings of the original page. In practice, however, it may take some time for this to occur. If rankings are a concern, it is wise not to change URLs frivolously in any case.

301 is arguably the most important status code when it comes to search engine optimization. This article is dedicated to working with this status code, and to exercises demonstrating its use. Chapter 4, "Content Relocation and HTTP Status Codes," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO (Wrox, 2007, ISBN: 978-0-470-13147-3), also examines the 302, 404, and 500 status codes. These are important to understand as well.

TIP
Search engines never index a page that arrives with the 404 status code. If your server returns a 404 for a page, search engines will de-list it. If a search engine sees blank pages, or pages full of errors, it may do the same. Avoid this at all costs.

Redirecting with ASP.NET and IIS

You can implement 301 redirects using ISAPI rewriting modules such as ISAPI_Rewrite, using products like UrlRewriter.NET (discussed in Chapter 3, "Provocative SE-Friendly URLs," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO), or from within your ASP.NET code by setting the appropriate header data.

When using ISAPI_Rewrite, redirecting is implemented similarly to URL rewriting, except that you specify a redirection status code as a parameter. The following rule does a 301 redirect to Catalog.aspx when the initial request is for Catalog.html:

# 301 Redirect Catalog.html to Catalog.aspx
RewriteRule ^/Catalog\.html$ http://seoasp/Catalog.aspx [RP]

The [RP] option of the rewrite rule specifies that a permanent redirect (301) should be made. If you want to use a temporary redirect (302), [R] should be used instead.
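Conceptually, a redirect rule is a regular expression plus a target and a flag. The C# class below is a hypothetical emulation of that logic. It is not ISAPI_Rewrite, whose matching rules and options may differ. When the requested path matches, it returns a 301 status for a permanent rule (like [RP]) or a 302 for a temporary one (like [R]).

```csharp
using System.Text.RegularExpressions;

// Hypothetical emulation of a single redirect rule such as
//   RewriteRule ^/Catalog\.html$ http://seoasp/Catalog.aspx [RP]
// It mirrors the rule's logic only; it is not ISAPI_Rewrite.
public sealed class RedirectRule
{
    private readonly Regex pattern;
    private readonly string target;
    private readonly bool permanent; // true ~ [RP] (301), false ~ [R] (302)

    public RedirectRule(string pattern, string target, bool permanent)
    {
        this.pattern = new Regex(pattern);
        this.target = target;
        this.permanent = permanent;
    }

    // Returns the status line and Location value when the path matches,
    // or null when the rule doesn't apply and the request proceeds normally.
    public (string Status, string Location)? Apply(string path)
    {
        if (!pattern.IsMatch(path))
            return null;

        string status = permanent ? "301 Moved Permanently" : "302 Found";
        // Regex.Replace also allows $1-style backreferences in the target.
        return (status, pattern.Replace(path, target));
    }
}
```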

NOTE
The HTTP specification (RFC 2616) requires the Location header to contain an absolute URL. That's why the rule above redirects to http://seoasp/Catalog.aspx, and not to /Catalog.aspx. Most clients understand relative paths as well, but ideally you should always specify absolute locations for redirection. Future versions of ISAPI_Rewrite are planned to add the missing parts of the URL automatically, but version 2.11 does not. The redirection syntax may also be updated. Any changes that affect the theory will be documented at http://www.cristiandarie.ro/seo-asp/.
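In your own code, System.Uri can turn a relative target into an absolute one, using the URL of the current request as the base. The following is a sketch under that assumption, and the `LocationHelper` name is hypothetical. Any value that isn't already an absolute http or https URL is explicitly treated as relative. On Unix-like systems, .NET may otherwise parse a string such as "/Catalog.aspx" as an absolute file path.

```csharp
using System;

// Hypothetical helper: produces an absolute Location value for a redirect.
public static class LocationHelper
{
    public static string ToAbsolute(string requestUrl, string location)
    {
        // Already an absolute http(s) URL: use it as-is.
        if (Uri.TryCreate(location, UriKind.Absolute, out Uri absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
        {
            return absolute.AbsoluteUri;
        }

        // Otherwise resolve it as a relative reference against the request URL.
        var baseUri = new Uri(requestUrl, UriKind.Absolute);
        return new Uri(baseUri, new Uri(location, UriKind.Relative)).AbsoluteUri;
    }
}
```

With a request for http://seoasp/Catalog.html and a target of /Catalog.aspx, the helper returns http://seoasp/Catalog.aspx, which is the absolute form used in the rule above.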

Except for the new [RP] option at the end, there's nothing new for you here — but that option represents an important difference! With or without the [R] or [RP] option, the visitor would end up seeing the content provided by Catalog.aspx. However, when redirection is used, the user's Web client actually makes two calls to the Web server. First it asks for Catalog.html; as a response, it gets a 301 (in case of [RP]) or a 302 (in case of [R]) redirect code in the HTTP header, indicating Catalog.aspx as the new location. Then the Web client requests Catalog.aspx, and informs the user that a new URL has been loaded by updating the URL displayed in the address bar.

Redirecting using UrlRewriter.NET is equally easy:

<redirect url="^/Catalog\.html$" to="/Catalog.aspx" permanent="true" />

When the permanent attribute is false, the redirect uses the 302 status code; otherwise, a 301 redirect is made.
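The attribute rule just described can be sketched in a few lines. This C# helper reads a `<redirect>` element with LINQ to XML and returns the status code that the text says the element implies. It only mimics the described behavior and is not UrlRewriter.NET's actual implementation. `RedirectElement` and `Evaluate` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: evaluates a <redirect> element against a request path.
public static class RedirectElement
{
    // Returns (status code, Location) when the url pattern matches, else null.
    public static (int StatusCode, string Location)? Evaluate(string elementXml, string path)
    {
        XElement element = XElement.Parse(elementXml);
        // Casting an XAttribute to string yields null when the attribute is missing.
        string urlPattern = (string)element.Attribute("url");
        string to = (string)element.Attribute("to");
        string permanent = (string)element.Attribute("permanent");

        if (urlPattern == null || to == null || !Regex.IsMatch(path, urlPattern))
            return null;

        // As described in the text: permanent="false" means 302, anything else 301.
        bool temporary = string.Equals(permanent, "false", StringComparison.OrdinalIgnoreCase);
        return (temporary ? 302 : 301, to);
    }
}
```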

If you want to implement the redirect yourself, you need to manipulate the response headers using the Response object provided by your current HttpContext object. Here's how to 301 redirect Catalog.html to Catalog.aspx yourself:

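// context is the current HttpContext (for example, HttpContext.Current)
if (context.Request.Path == "/Catalog.html")
{
  context.Response.Status = "301 Moved Permanently";
  context.Response.AddHeader("Location", "http://www.example.com/Catalog.aspx");
  // end the response so that no page content is sent after the headers
  context.Response.End();
}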


TIP
If you don't set the status code explicitly, you won't get a 301. For example, ASP.NET's Response.Redirect() method issues a 302 temporary redirect. Keep this in mind.

In practice, when a site redesign involves changing URLs, the Webmaster should at least 301 redirect the most important URLs to their new counterparts. Otherwise, link equity of the old URLs will be lost.
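A common way to handle this is a lookup table of old paths and their replacements, consulted early in each request. The following sketch assumes a hard-coded set of hypothetical entries. A real site would more likely load them from configuration or a database, and would send the 301 with Response code like that shown earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical redirect map for a site redesign: old path -> new absolute URL.
public static class RedesignRedirects
{
    // IIS paths are typically case-insensitive, so compare keys case-insensitively.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["/Catalog.html"] = "http://seoasp/Catalog.aspx",
            ["/OldPage.aspx"] = "http://www.example.com/NewPage.aspx",
        };

    // Returns the URL to 301 redirect to, or null to serve the request normally.
    public static string FindTarget(string path) =>
        Map.TryGetValue(path, out string target) ? target : null;
}
```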

In the remainder of this article, we analyze the most common problem that Web sites face today and that HTTP redirection can solve: URL correction. In Chapter 4, "Content Relocation and HTTP Status Codes," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, we also cover:

  • Dealing with multiple domain names
  • URL canonicalization and eliminating Default.aspx

URL Correction

The great advantage of our current keyword-rich URLs is that we don't rely on the product or category names to find their data, but only on their IDs, which are subtly embedded in the URLs. This works well because the text in the URL can change without breaking the link.

One potential problem with these links, though, is that when the text for a product or category name changes, its link will automatically be changed as well. As you already know, this has the potential to generate duplicate content problems, and that's certainly not something that you want!

With our current site (available as part of the code download for the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO), there are an infinite number of variations that lead to the same content. Take these three different URLs, which currently generate the same content:

http://seoasp/Product.aspx?CategoryID=6&ProductID=31 
http://seoasp/Products/SEO-Toolbox-C6/Link-Juice-P31.html 
http://seoasp/Products/SEO-Toolbox-C6/New-Link-Juice-With-Vitamin-L-P31.html 

Our solution is to choose a standard version of the URL and 301 redirect every other URL that refers to that content to the standard one. This avoids the duplicate content problems that can result from multiple URLs returning the same content. This process is also critical during a migration to keyword-rich URLs on a preexisting Web site, in order to preserve link equity.

Note that rewriting engines can't help here, because they don't have access to your product database. You can use regular expressions to transform /SEO-Toolbox-C6/ into ?CategoryID=6, but you can't do the transformation the other way around unless you know the name of the category with the ID of 6. That's why redirecting dynamic URLs to keyword-rich URLs needs to be taken care of in your ASP.NET application.
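A short example makes the one-way nature of this transformation clear. The C# sketch below extracts the IDs from a keyword-rich URL with a regular expression, which is all a rewriting engine can do. The pattern is an assumption modeled on the example URLs above, and `KeywordUrl` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: keyword-rich URL -> dynamic URL, using only the IDs.
public static class KeywordUrl
{
    // Matches e.g. /Products/SEO-Toolbox-C6/Link-Juice-P31.html
    private static readonly Regex Pattern =
        new Regex(@"^/Products/[^/]*-C(\d+)/[^/]*-P(\d+)\.html$");

    // Returns the equivalent dynamic URL, or null if the path doesn't match.
    public static string ToDynamicUrl(string path)
    {
        Match m = Pattern.Match(path);
        if (!m.Success)
            return null;
        return "/Product.aspx?CategoryID=" + m.Groups[1].Value +
               "&ProductID=" + m.Groups[2].Value;
    }
}
```

Both keyword-rich variants listed earlier map to /Product.aspx?CategoryID=6&ProductID=31, which is why they produce duplicate content. The reverse direction, from C6 back to SEO-Toolbox, requires the category name, and only the database knows it.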

The exercise that follows demonstrates how to do exactly that.

Implementing Automatic URL Correction

  1. Add a new C# class file named SeoData.cs to the App_Code folder in the http://seoasp/ project (available as part of the code download for the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO), and type the following code in it. This class contains the IDs and names of a few products and categories, simulating a real product database. (We prefer to avoid using a real database to keep the exercise easier to follow.)
using System.Collections.Specialized;

/// <summary>
/// Represents a fictional database with products and categories
/// </summary>
public static class SeoData
{
  // objects that store products and categories data
  public static NameValueCollection Products = new NameValueCollection();
  public static NameValueCollection Categories = new NameValueCollection();
  // static constructor
  static SeoData()
  {
    // add sample products to the collection
    Products.Add("15", "Fortune Cookie");
    Products.Add("31", "Link Juice");
    Products.Add("42", "ASP.NET 2.0 E-Commerce Book");
    Products.Add("45", "Belt Sander");
    // and categories
    Categories.Add("2", "Bookstore");
    Categories.Add("6", "SEO Toolbox");
    Categories.Add("12", "Carpenter's Tools");
  }
}
  2. Create a new class file named UrlTools.cs in your App_Code folder, and type this code:
using System;
using System.Web;
using System.Configuration;
/// <summary>
/// Class provides support for URL manipulation and redirection
/// </summary>
public static class UrlTools
{
  // obtain the site domain from the configuration file
  static string siteDomain = ConfigurationManager.AppSettings["SiteDomain"];
  /* ensures the current page is being loaded through its standard URL;
   * 301 redirect to the standard URL if it doesn't */
  public static void CheckUrl()
  {
    HttpContext context = HttpContext.Current;
    HttpRequest request = HttpContext.Current.Request;
    // retrieve query string parameters
    string productId = request.QueryString["ProductID"];
    string categoryId = request.QueryString["CategoryID"];
    // fix category-product URLs
    if (productId != null && categoryId != null)
    {
      CheckCategoryProductUrl(categoryId, productId);
    }
  }
  // checks a category-product URL for compliancy
  // 301 redirects to proper URL, or returns 404 if necessary
  public static void CheckCategoryProductUrl(string categoryId, string productId)
  {
    // the current HttpContext
    HttpContext context = HttpContext.Current;
    // the URL requested by the visitor
    string requestedUrl = context.Request.ServerVariables["HTTP_X_REWRITE_URL"];
    // retrieve product and category names from fictional database
    string categoryName = SeoData.Categories[categoryId];
    string productName = SeoData.Products[productId];
    // if the category or the product doesn't exist in the database, return 404
    if (categoryName == null || productName == null)
    {
      Go404();
    }
    // obtain the standard version of the URL
    string standardUrl = LinkFactory.MakeCategoryProductUrl(categoryName, 
      int.Parse(categoryId), productName, int.Parse(productId));
    // 301 redirect to the proper URL if necessary
    if (siteDomain + requestedUrl != standardUrl)
    {
      context.Response.Status = "301 Moved Permanently";
      context.Response.AddHeader("Location", standardUrl);
      // end the request here so the page isn't processed and rendered
      context.Response.End();
    }
  }
  // Load the 404 page
  public static void Go404()
  {
    HttpContext.Current.Server.Transfer("~/NotFound.aspx");
  }
}
  3. Make sure you have the SiteDomain value set in Web.config:
  <appSettings>
    <add key="SiteDomain" value="http://seoasp" />
  </appSettings>
  4. Open Global.asax and call UrlTools.CheckUrl() in Application_BeginRequest(). This way, UrlTools.CheckUrl() gets executed on every user request, and it has a chance to 301 redirect the request to another URL if necessary.
  void Application_BeginRequest(object sender, EventArgs e)
  {
    // ensures a standard URL is used, 301 redirect to it otherwise
    UrlTools.CheckUrl();
  }
  5. Now it's time to put the new code to the test. This step assumes that you have one of the URL rewriting solutions from Chapter 3, "Provocative SE-Friendly URLs," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, in place. That can be ISAPI_Rewrite (which is also covered in our article, "Provocative Search Engine Friendly URLs in ASP.NET") or UrlRewriter.NET, and either will work. Load http://seoasp/Product.aspx?CategoryID=6&ProductID=15 in your Web browser, and watch the URL change! Figure 4 shows how this request was redirected to the standard, keyword-rich version of its URL.

Figure 4
  6. Now try loading the page of a product that doesn't exist in the "database," such as http://seoasp/Product.aspx?CategoryID=99&ProductID=22. In this case, as Figure 5 shows, you get the 404 page (NotFound.aspx) created earlier in Chapter 4 of the book.

Figure 5

One place to verify that a page has been loaded using the standard URL would be the Page_Load() event handler of the page. However, we found that it is generally easier to do the URL verification in the Application_BeginRequest() handler of Global.asax, which is executed on every client request. We use this method as a central place to filter all incoming requests, and decide what to do with them:

  void Application_BeginRequest(object sender, EventArgs e)
  {
    // ensures a standard URL is used, 301 redirect to it otherwise
    UrlTools.CheckUrl();
  }

The UrlTools.CheckUrl() method verifies that the visitor requested the page using its standard URL. If that is not the case, the request is 301 redirected to the standard URL that is supposed to deliver that content. Moreover, if the request is for a product ID or category ID that doesn't exist in our database, the 404 page is returned instead, indicating that the requested content doesn't exist. Let's see how this is implemented.

The UrlTools.CheckUrl() method starts by reading the product ID and category ID from the query string:

  public static void CheckUrl()
  {
    HttpContext context = HttpContext.Current;
    HttpRequest request = HttpContext.Current.Request;
    // retrieve query string parameters
    string productId = request.QueryString["ProductID"];
    string categoryId = request.QueryString["CategoryID"];

If both these IDs are present, we assume the visitor has requested a category-product page, so we call CheckCategoryProductUrl() for further URL checking:

    // fix category-product URLs
    if (productId != null && categoryId != null)
    {
      CheckCategoryProductUrl(categoryId, productId);
    }
  }

Note that if either ID is missing, the method exits without doing any further verification. This behavior is important because we may not want to enforce "standard" URLs for all sections of the Web site.

UrlTools.CheckCategoryProductUrl() reads the category and product names from our fictional database, which is simulated using the SeoData class. In a real-world scenario, we'd use a real database, but for the purposes of this exercise it was easier to simply hard-code a few products and categories in NameValueCollection objects. We also read the HTTP_X_REWRITE_URL server variable, which is used by both ISAPI_Rewrite and UrlRewriter.NET to store the URL that was requested by the visitor.

  public static void CheckCategoryProductUrl(string categoryId, string productId)
  {
    // the current HttpContext
    HttpContext context = HttpContext.Current;
    // the URL requested by the visitor
    string requestedUrl = context.Request.ServerVariables["HTTP_X_REWRITE_URL"];
    // retrieve product and category names from fictional database
    string categoryName = SeoData.Categories[categoryId];
    string productName = SeoData.Products[productId];

Then we do a simple check to ensure that categoryName and productName are not null. If either is null, we assume it couldn't be found in the database, so we can't create a page to show its details. In this case we call the Go404() method, which uses Server.Transfer() to load the 404 page (NotFound.aspx) built earlier in Chapter 4 of the book:

    // if the category or the product doesn't exist in the database, return 404
    if (categoryName == null || productName == null)
    {
      Go404();
    }
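The null check relies on a property of NameValueCollection: its string indexer returns null for a missing key instead of throwing an exception, unlike Dictionary<TKey, TValue>. The standalone sketch below demonstrates this with a cut-down copy of the sample categories. The `LookupDemo` class is hypothetical.

```csharp
using System.Collections.Specialized;

// Hypothetical demo of the lookup behavior the null check depends on.
public static class LookupDemo
{
    // A cut-down copy of the sample categories from SeoData.
    public static readonly NameValueCollection Categories = new NameValueCollection
    {
        { "2", "Bookstore" },
        { "6", "SEO Toolbox" },
    };

    // NameValueCollection[key] returns null when the key isn't present.
    public static bool Exists(string categoryId) => Categories[categoryId] != null;
}
```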

If we do have a category name and a product name, we use the link factory built in Chapter 3, "Provocative SE-Friendly URLs," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO, to obtain the standard version of the URL, which contains the product and category names and IDs:

    // obtain the standard version of the URL
    string standardUrl = LinkFactory.MakeCategoryProductUrl(categoryName, int.Parse(categoryId), productName, int.Parse(productId));

The standardUrl variable will contain a string such as http://seoasp/Products/SEO-Toolbox-C6/Link-Juice-P31.html. We compare this value with the URL requested by the visitor. If they don't match, we 301 redirect the request to the URL contained by standardUrl:
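LinkFactory comes from Chapter 3 of the book and isn't shown here, so the following is a hypothetical stand-in that reproduces the URL format in the example. The book's real implementation may prepare names differently, for example in how it handles punctuation. The site domain constant is an assumption matching the Web.config value above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-in for LinkFactory.MakeCategoryProductUrl(); not the book's code.
// Produces URLs like http://seoasp/Products/SEO-Toolbox-C6/Link-Juice-P31.html
public static class LinkFactorySketch
{
    private const string SiteDomain = "http://seoasp"; // assumed, as in Web.config

    // Replace each run of non-alphanumeric characters with a single dash.
    private static string Prepare(string name) =>
        Regex.Replace(name, "[^A-Za-z0-9]+", "-").Trim('-');

    public static string MakeCategoryProductUrl(
        string categoryName, int categoryId, string productName, int productId) =>
        $"{SiteDomain}/Products/{Prepare(categoryName)}-C{categoryId}/" +
        $"{Prepare(productName)}-P{productId}.html";
}
```

For "SEO Toolbox" (6) and "Link Juice" (31), this builds http://seoasp/Products/SEO-Toolbox-C6/Link-Juice-P31.html. A request for the New-Link-Juice-With-Vitamin-L-P31.html variant, prefixed with the site domain, doesn't equal that string, so the request gets redirected.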

    // 301 redirect to the proper URL if necessary
    if (siteDomain + requestedUrl != standardUrl)
    {
      context.Response.Status = "301 Moved Permanently";
      context.Response.AddHeader("Location", standardUrl);
      // end the request here so the page isn't processed and rendered
      context.Response.End();
    }
  }
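CheckUrl() and CheckCategoryProductUrl() depend on HttpContext, which makes them hard to test outside a running site. One option is to move the decision into a pure function and leave only the Response calls in UrlTools. The sketch below does this and assumes you are willing to pass the lookups and the URL builder in as delegates. The `UrlCheck` and `UrlCheckOutcome` names are hypothetical, and the code restates the logic above rather than replacing it.

```csharp
using System;

// Possible results of checking a request URL.
public enum UrlCheckOutcome { NotChecked, NotFound, Redirect, Ok }

// Hypothetical HttpContext-free version of the CheckUrl() /
// CheckCategoryProductUrl() decision logic.
public static class UrlCheck
{
    public static (UrlCheckOutcome Outcome, string Location) Decide(
        string siteDomain,
        string requestedUrl,
        string categoryId,
        string productId,
        Func<string, string> findCategoryName,
        Func<string, string> findProductName,
        Func<string, int, string, int, string> makeStandardUrl)
    {
        // Only category-product requests (both IDs present) are checked.
        if (categoryId == null || productId == null)
            return (UrlCheckOutcome.NotChecked, null);

        // Unknown or non-numeric IDs mean there is no content to show.
        string categoryName = findCategoryName(categoryId);
        string productName = findProductName(productId);
        if (categoryName == null || productName == null ||
            !int.TryParse(categoryId, out int catId) ||
            !int.TryParse(productId, out int prodId))
            return (UrlCheckOutcome.NotFound, null);

        string standardUrl = makeStandardUrl(categoryName, catId, productName, prodId);

        // Same comparison as UrlTools: domain + requested URL vs. standard URL.
        return siteDomain + requestedUrl == standardUrl
            ? (UrlCheckOutcome.Ok, standardUrl)
            : (UrlCheckOutcome.Redirect, standardUrl);
    }
}
```

UrlTools would then call Server.Transfer() for NotFound and set the 301 headers for Redirect. Each branch can then be tested with plain lambdas instead of a live request.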

This article is excerpted from Chapter 4, "Content Relocation and HTTP Status Codes," of the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO (Wrox, 2007, ISBN: 978-0-470-13147-3) by Cristian Darie and Jaimie Sirovich. Cristian Darie is a software engineer with experience in a wide range of modern technologies, and the author of numerous books and tutorials on AJAX, ASP.NET, PHP, SQL, and related areas. Cristian currently lives in Bucharest, Romania, studying distributed application architectures for his PhD. He's getting involved with various commercial and research projects, and when not planning to buy Google, he enjoys his bit of social life. If you want to say "Hi," you can reach Cristian through his personal Web site at http://www.cristiandarie.ro. Jaimie Sirovich is a search engine marketing consultant. He works with his clients to build them powerful online presences. Officially Jaimie is a computer programmer, but he claims to enjoy marketing much more. He graduated from Stevens Institute of Technology with a BS in Computer Science. He worked under Barry Schwartz at RustyBrick, Inc., as lead programmer on all eCommerce projects until 2005. At present, Jaimie consults for several organizations and administrates the popular search engine marketing blog, SEOEgghead.com. Their other recent wrox.com article excerpt from this book is Provocative Search Engine Friendly URLs in ASP.NET.