Amazon.com Mashups
by Francis Shanahan
January 2007, Paperback


Excerpt from Amazon.Com Mashups

A Generic Storage Solution Using Amazon S3

by Francis Shanahan

Amazon's Simple Storage Service (S3) is a storage service that allows anyone to purchase industrial-quality storage space on an as-needed basis. You pay only for what you use. This has major repercussions in terms of the business plans that it enables. A student working out of a dorm room can build a professional-quality photo storage site without requiring millions of dollars from venture capitalists up front. Budding media entrepreneurs can share content without a highly available redundant disaster-recovery site. The possibilities are endless.

This article uses Amazon S3 to store files on the Internet. You don't need to know where or how these files are stored, only that they are stored securely and without fear of being lost.

Understanding the Architecture

The sample application uses SOAP over HTTP to communicate with the S3 service. The architecture of the sample application is depicted in Figure 1.

Figure 1: The S3 application architecture

How It Works

  1. The default.aspx page is presented to the user. It contains a form with controls for creating buckets, uploading files, and listing or deleting stored items.
  2. The user creates a bucket using the first set of controls on the sample page. This bucket can be used to store objects (also known as files).
  3. The user's request is forwarded to the Amazon S3 server as a SOAP request. A bucket is created on Amazon.
  4. The user locates a file on a local computer and clicks Submit. The file is converted into a stream and uploaded to the sample application's Web server.
  5. The Web server then takes this stream and passes it on to the Amazon S3 server using SOAP over HTTP. The Amazon S3 server stores this stream as a file in its secure storage. The file is now associated with the bucket created in Steps 2 and 3.

With this sample application the user can also delete a file or the bucket, list available buckets, or list the contents of a particular bucket.

Figure 2 displays the completed application. As you can see, this is a generic utility-type application. The code is written in such a way as to clearly articulate the steps involved. Feel free to take this code and build your own applications on top of it.

Figure 2: The finished application

The next section describes S3 in terms of registration requirements and key concepts.

Registering for S3 Access

Because S3 is a pay-per-use service, you will need to sign up for explicit access to the service to use it. If you are already an Amazon customer, you can use your billing and credit card information that Amazon already has. This makes signup literally a one-minute procedure.

You will only be billed based on the storage space you use and the duration for which you use it. As someone who does this for a living, I can tell you that Amazon's rates are highly competitive, and will beat any dedicated hardware provider's estimates by a number of orders of magnitude.

Key Concepts

Amazon S3 uses a number of key concepts to organize data storage on its servers.

Buckets

Every object stored on S3 is placed in a bucket. A bucket is simply a way to group objects together and aggregate them for the purposes of usage tracking.

Bucket names have global scope. So, if a user creates a bucket named "mybucket", no one else can create a bucket of the same name.

Objects

Objects represent the files that actually get stored on the platform. These can be files of any type, and are typically associated with a set of metadata. The object consists of the metadata and the file itself.

Objects can be created or deleted, and associated with a defined set of permissions. Only users or groups with the appropriate level of permissions can access a given object.

Every object in S3 is assigned a key, which is analogous to a filename, and is what uniquely identifies the object within your bucket.

Try It Out: Setting Up the Project

To build the generic S3 application, follow these steps:

  1. Create a new project and add a class file named S3Helper.cs to it. This class should reside in the App_Code directory.
  2. Add a function named GetTimeStamp to this class. This function is implemented later in this article, in the section "Calculating the TimeStamp."
  3. Add a Web reference to the S3 SOAP service. The WSDL endpoint for the service is located here:
    http://s3.amazonaws.com/doc/2006-03-01/AmazonS3.wsdl
  4. Create a page named default.aspx and design it to look like Figure 3. The remainder of this article walks you through the specifics of some of the page elements.
Figure 3: The client application at design time

Required Parameters

To make any call to S3, you must supply a number of standard items:

  1. The first is your AWSAccessKeyId. This is assigned when you register as a developer with the Amazon AWS service.
  2. The second item is the current timestamp in Greenwich Mean Time (GMT). If the time you specify varies from the time on the Amazon servers by more than 15 minutes, the operation will be declined. This security measure prevents anyone who intercepts your request from replaying it later.
  3. The third item you need to specify is probably the most interesting. This is a string that acts as a signature. The signature is a cryptographic hash of a piece of data comprising the operation you are calling and the timestamp sent as item 2, computed using a special secret key.

These are standard items that are included in every call. The secret key is similar to the AWSAccessKeyId and is provided when you explicitly sign up for S3 access.

You must never disclose the secret key to anyone. Amazon will never ask for it, so if you are asked to disclose it, you know the requester is a fraud.
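The 15-minute rule for the timestamp can be pictured with a small sketch. This is only an illustration of the rule as stated above, not a description of how Amazon implements the check on its servers. The class name ClockSkewSketch and its members are hypothetical and are not part of the sample application.

```csharp
using System;

public static class ClockSkewSketch
{
    // The 15-minute window described in the text.
    public static readonly TimeSpan AllowedSkew = TimeSpan.FromMinutes(15);

    // Returns true when the two UTC instants differ by no more than
    // AllowedSkew, in either direction.
    public static bool IsWithinAllowedSkew(DateTime requestUtc, DateTime serverUtc)
    {
        TimeSpan difference = (requestUtc - serverUtc).Duration();
        return difference <= AllowedSkew;
    }
}
```

A request stamped 14 minutes behind the server clock passes this check, while one stamped 16 minutes ahead does not. In practice, this means the clock on your Web server needs to be reasonably accurate.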

Authenticating with S3

The folks at Amazon have really done their homework when it comes to authentication. Amazon S3 authenticates each and every method call. There is no notion of a session when working with S3. As a result, every call requires a set of parameters to be included, regardless of any other required information for the API.

The TimeStamp

There are two main instances when a timestamp is required:

  • The first is for inclusion in a typical method call, in which case, the timestamp is included as a DateTime object.
  • The second is in calculating the signature digest.

To create a valid TimeStamp value as a DateTime object, you must convert the current time to Coordinated Universal Time (UTC), which for practical purposes is the same as Greenwich Mean Time. The next section shows you how to do this.

Calculating the TimeStamp

To calculate a timestamp for use in an S3 call, the first step is to obtain the current time. This is then used in subsequent calls. It's important to store this value because time marches on, and you will not be able to recalculate a timestamp if you throw away the one you are working with.

I have provided a static helper class named S3Helper in the sample code that returns the correctly formatted timestamp. The code looks like this:

    /// <summary>
    /// Returns a new DateTime object set to the provided time, converted
    /// to UTC, with precision limited to milliseconds.
    /// </summary>
    /// <param name="myTime">The time to convert (typically DateTime.Now)</param>
    /// <returns>The UTC time, truncated to whole milliseconds</returns>
    public static DateTime GetTimeStamp(DateTime myTime)
    {
        DateTime myUniversalTime = myTime.ToUniversalTime();
        // Mark the result as UTC so that a later call to ToUniversalTime
        // (as in FormatTimeStamp) does not shift the time a second time
        DateTime myNewTime = new DateTime(myUniversalTime.Year,
            myUniversalTime.Month, myUniversalTime.Day,
            myUniversalTime.Hour, myUniversalTime.Minute,
            myUniversalTime.Second, myUniversalTime.Millisecond,
            DateTimeKind.Utc);
        return myNewTime;
    }

This code accepts a DateTime as a parameter, converts it to UTC (which for practical purposes is the same as GMT), and returns the result as the timestamp.

The precision of the DateTime is limited to milliseconds to conform to the expected format. Too much precision will break the Amazon API.
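To see what "limited to milliseconds" means in practice, here is a minimal sketch of the same truncation done with tick arithmetic instead of rebuilding the DateTime field by field. It is an alternative illustration, not the code from the sample application, and the class name TimeStampPrecisionSketch is hypothetical.

```csharp
using System;

public static class TimeStampPrecisionSketch
{
    // Converts to UTC and drops everything below one millisecond
    // by subtracting the leftover ticks (1 tick = 100 nanoseconds).
    public static DateTime TruncateToMilliseconds(DateTime time)
    {
        DateTime utc = time.ToUniversalTime();
        long wholeMillisecondTicks =
            utc.Ticks - (utc.Ticks % TimeSpan.TicksPerMillisecond);
        return new DateTime(wholeMillisecondTicks, DateTimeKind.Utc);
    }
}
```

A DateTime holds 10,000 ticks per millisecond, so a value taken from DateTime.Now usually carries extra precision that this step discards.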

The second time helper function formats the same TimeStamp value as a string. This is used later in computing the message signature. The code looks like this:

    /// <summary>
    /// Formats the provided time as a string limited to millisecond precision
    /// </summary>
    /// <param name="myTime"></param>
    /// <returns></returns>
    public static string FormatTimeStamp(DateTime myTime)
    {
        DateTime myUniversalTime = myTime.ToUniversalTime();
        return myUniversalTime.ToString("yyyy-MM-dd\\THH:mm:ss.fff\\Z",
            System.Globalization.CultureInfo.InvariantCulture);
    }

You should add both of these functions to the S3Helper class.
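As a quick check of the string format, the following sketch applies the same pattern to a fixed UTC time and parses the result back again. The class TimeStampFormatSketch and its Parse method are hypothetical additions for illustration. The sample application only needs the formatting direction.

```csharp
using System;
using System.Globalization;

public static class TimeStampFormatSketch
{
    // Same pattern as FormatTimeStamp, written as a verbatim string so the
    // backslashes that mark T and Z as literal characters need no doubling.
    private const string Pattern = @"yyyy-MM-dd\THH:mm:ss.fff\Z";

    // Formats the time as a UTC string with millisecond precision.
    public static string Format(DateTime time)
    {
        return time.ToUniversalTime().ToString(Pattern, CultureInfo.InvariantCulture);
    }

    // Parses a string produced by Format back into a UTC DateTime.
    public static DateTime Parse(string text)
    {
        return DateTime.ParseExact(text, Pattern, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
    }
}
```

For a UTC time of 10:30:45.123 on January 15, 2007, Format returns "2007-01-15T10:30:45.123Z". The trailing Z marks the value as UTC.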

Calculating the Signature

The signature is the third component in the Amazon authentication scheme. The signature proves knowledge of the user's secret key (not the AWSAccessKeyId). This is a special key that is known only to the registered developer and should never be disclosed to anyone. By creating a digest with this key as input, Amazon can verify that the operation request came from a valid source, and has not been tampered with.

What Is a Digest?

A digest (or message digest) is a string value produced by applying a mathematical algorithm to a piece of content. The algorithm is such that if even a single character in the content is modified or changed, the resulting digest value will be affected, yielding a brand new digest value. A message digest is sometimes known as a hash value.
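The sensitivity of a digest to small changes is easy to demonstrate. The sketch below uses plain SHA-1 with no key, which is simpler than the keyed algorithm that S3 expects, and the class name DigestSketch is hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DigestSketch
{
    // Returns the SHA-1 digest of the text as lowercase hexadecimal.
    public static string Sha1Hex(string text)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling Sha1Hex("abc") returns a9993e364706816aba3e25717850c26c9cd0d89d, the published test value for that input. Changing the input to "abd" produces a completely different 40-character string, even though only one character differs.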

There are many algorithms available to calculate message digests. Amazon uses the HMACSHA1 algorithm, which is implemented in .NET using the System.Security.Cryptography namespace.

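Add the following using statements to your code in S3Helper.cs, if they are not already present. The System.Text namespace is needed for the Encoding classes used later:

using System.Security.Cryptography;
using System.Text;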

Amazon expects the digest to be created based on a concatenated string of the following data elements:

  • The string "AmazonS3"
  • The operation you are invoking (for example, "PutObjectInline")
  • The string representation of the TimeStamp included in the call
HMACSHA1 constructs a hash-based message authentication code (HMAC) using a piece of data and a key input. The algorithm mixes the key with the data. It then applies a hash function to obtain a digest. The result is mixed with the key input again, and the result is hashed one more time. The result is a secure hash code that can be used to determine whether the data has been tampered with during transport.
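The two-pass mixing just described can be written out by hand. The following is a minimal sketch of the standard HMAC construction using SHA-1, intended only to show what the HMACSHA1 class does internally. The class name HmacSketch is hypothetical, and real code should call HMACSHA1 directly rather than use this version.

```csharp
using System;
using System.Security.Cryptography;

public static class HmacSketch
{
    private const int BlockSize = 64;  // SHA-1 processes 64-byte blocks
    private const int DigestSize = 20; // SHA-1 produces 20-byte digests

    public static byte[] ComputeHmacSha1(byte[] key, byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            // Keys longer than one block are hashed first;
            // shorter keys are padded with zeros.
            if (key.Length > BlockSize)
            {
                key = sha1.ComputeHash(key);
            }
            byte[] paddedKey = new byte[BlockSize];
            Array.Copy(key, paddedKey, key.Length);

            // Mix the key with the two fixed padding constants.
            byte[] innerInput = new byte[BlockSize + data.Length];
            byte[] outerInput = new byte[BlockSize + DigestSize];
            for (int i = 0; i < BlockSize; i++)
            {
                innerInput[i] = (byte)(paddedKey[i] ^ 0x36);
                outerInput[i] = (byte)(paddedKey[i] ^ 0x5C);
            }

            // First pass: hash the inner-padded key followed by the data.
            Array.Copy(data, 0, innerInput, BlockSize, data.Length);
            byte[] innerDigest = sha1.ComputeHash(innerInput);

            // Second pass: hash the outer-padded key followed by the first digest.
            Array.Copy(innerDigest, 0, outerInput, BlockSize, innerDigest.Length);
            return sha1.ComputeHash(outerInput);
        }
    }
}
```

Given the same key and data, this function should return the same 20 bytes as HMACSHA1.ComputeHash. Because the key is mixed in on both passes, someone who sees the data and the digest still cannot forge a new digest without the key.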

Add the following code, which creates an HMACSHA1 hash or digest of the given data elements:

    public static string GetSignature(string mySecretAccessKey,
        string strOperation, DateTime myTime)
    {
        // Encoding and UTF8Encoding live in the System.Text namespace
        Encoding myEncoding = new UTF8Encoding();

        // Create the source string that is used to create the digest
        string mySource = "AmazonS3" + strOperation + FormatTimeStamp(myTime);

        // Convert the source string to a UTF-8 encoded array of bytes
        byte[] myUTF8Bytes = myEncoding.GetBytes(mySource);

        // Create a new HMACSHA1 object using the
        // Secret Access Key as the key
        using (HMACSHA1 myCrypto =
            new HMACSHA1(myEncoding.GetBytes(mySecretAccessKey)))
        {
            // Calculate the digest and return it as a Base64 string
            byte[] myDigest = myCrypto.ComputeHash(myUTF8Bytes);
            return Convert.ToBase64String(myDigest);
        }
    }

Now you have generic Signature and TimeStamp functions that can be used with all the subsequent S3 operations.

Excerpted from Chapter 15, "A Generic Storage Solution Using Amazon S3," of Amazon.com Mashups (Wrox, 2007, ISBN: 978-0-470-09777-9), by Francis Shanahan. Francis Shanahan is a senior software architect with more than ten years of industry experience. He specializes in new and emerging technologies in the areas of Web services, the user interface, and digital identity. Most recent examples are the Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), and CardSpace. He has led several multimillion dollar engagements for Fortune 50 companies through the full project life cycle. He has published numerous articles in both paper and online media, including Microsoft's MSDN Web site. When he's not building prototypes and messing around aimlessly on the computer, he enjoys cutting dovetails, the pentatonic scale, crystal oscillators, breaking 100 with his Callaway, and spending time with his family. You can contact Francis or read his blog at http://www.FrancisShanahan.com.