Excerpt from Professional SharePoint 2007 Development
Creating Content Type Metadata for SharePoint 2007 Document Management Solutions
By John Holliday
The term "Document Management" has become a catch-all phrase for anything having to do with documents in an enterprise setting. It is an overly broad term that covers many different aspects of managing documents, from access control and version control to auditing, review, and approval of content. To understand what document management means in the SharePoint environment, it helps to consider the evolution of document management systems over the last decade or so. It also helps to appreciate the value that SharePoint provides as a development platform for document management solutions.
Early document management systems were focused primarily on keeping track of revisions to documents that involved multiple authors, and operated in a manner similar to source code control systems. Individual authors checked out documents, thereby locking them so that other authors could not overwrite their changes. System administrators could specify who had permission to view or edit documents, and could generate reports of document activity. Other functions included the ability to automatically number each major or minor revision and revert at any time to a specific version of the document, generating the final content from information stored within the database.
The notion of metadata became a key characteristic of legacy document management systems. Metadata is information about a document, as opposed to the document content itself. The current version number, for example, is metadata, since it is information about the document rather than part of its content. Other examples are the title, subject, comments, and keywords associated with the document.
Most document management systems store document metadata in a central database. In fact, many of the early document management systems were written as database applications. This worked well at a time when the only business process being modeled was the generic document revision cycle. It starts to break down, however, when you want to model other business processes.
This is where SharePoint emerges as a superior platform for developing document management solutions. SharePoint refines the notion of document metadata to distinguish between system, class and instance metadata. System-level metadata is maintained internally by SharePoint for all documents. Class-level metadata is stored within the SharePoint database for a given document library or content type and can be customized easily to include domain-specific information. Instance-level metadata is stored within each document instance as a set of document properties, and moves along with the physical document. This is especially important for managing documents in disconnected environments.
Defining Metadata Using Content Types
Metadata is the fuel that drives document management in SharePoint 2007, and the best way to work with document metadata is to define a content type. There are many benefits to using content types, the main one being that content types let you specify the custom fields needed to manage a document as it moves through the different stages of its lifecycle.
Solution developers are used to working with classes and objects, properties and methods, where each class defines the properties and methods for instances of that class. They then create objects to represent instances of each class and invoke methods on those objects to apply business rules that retrieve or modify the state of the properties associated with each instance. Building document management solutions will be much easier if you can map the core elements (document, metadata, repository, etc.) onto familiar abstractions such as class and object.
SharePoint 2007 content types provide just such an abstraction. The content type acts as a sort of document class, defining the columns and event receivers that comprise each instance. The columns are like properties and the event receivers are like methods. Take it one step further and say that the ItemAdding event receiver acts as a constructor and the ItemDeleting event receiver acts as a destructor for each document instance.
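The class analogy can be made concrete with a small sketch. The Python below is purely illustrative and is not SharePoint code: ContentTypeModel, add_item, delete_item, and the two handler functions are hypothetical names chosen for this example. It assumes only what the analogy states, namely that columns behave like properties and that the ItemAdding and ItemDeleting receivers behave like a constructor and a destructor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical model for illustration only; none of these names are SharePoint APIs.
Item = Dict[str, Optional[str]]
Receiver = Callable[[Item], None]


@dataclass
class ContentTypeModel:
    name: str
    columns: List[str]  # columns play the role of properties
    receivers: Dict[str, Receiver] = field(default_factory=dict)  # receivers play the role of methods

    def add_item(self, values: Item) -> Item:
        # Every declared column starts empty, then the supplied values are applied.
        item: Item = {column: None for column in self.columns}
        item.update(values)
        # "ItemAdding" acts like a constructor: it can initialize the new instance.
        handler = self.receivers.get("ItemAdding")
        if handler is not None:
            handler(item)
        return item

    def delete_item(self, item: Item) -> None:
        # "ItemDeleting" acts like a destructor: it runs before the instance goes away.
        handler = self.receivers.get("ItemDeleting")
        if handler is not None:
            handler(item)
        item.clear()


deleted_titles: List[Optional[str]] = []


def set_default_status(item: Item) -> None:
    # Constructor-style initialization of a column value.
    if item.get("Status") is None:
        item["Status"] = "Draft"


def record_deletion(item: Item) -> None:
    # Destructor-style cleanup hook; here it only records the title.
    deleted_titles.append(item.get("Title"))


PROJECT_PROPOSAL = ContentTypeModel(
    name="Project Proposal",
    columns=["Title", "Author", "Status"],
    receivers={"ItemAdding": set_default_status, "ItemDeleting": record_deletion},
)
```

Calling PROJECT_PROPOSAL.add_item({"Title": "Site redesign"}) yields an item whose Status has been filled in by the constructor-like handler, and delete_item gives the destructor-like handler a last look at the item before it is cleared.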
The first step in defining a new content type is to determine from which of the built-in content types to derive the new content type. In object-oriented terms, you are choosing the base class for the new content type. SharePoint includes a number of default content types, all derived from the System content type, which serves as the root of the content type hierarchy.
Figure 1 shows some of the default content types and their identifiers:
SharePoint employs a special numbering scheme for identifying each content type, which it uses as a shortcut for creating new content type instances. Without such a numbering scheme, content type inheritance might have been prohibitively expensive, since SharePoint would have needed to search through the database to resolve content type dependencies. With the scheme, it only needs to examine the identifier, which it reads from right to left. For example, the Picture content type identifier is 0x010102, which SharePoint reads as id 02 (Picture) derived from Document (0x0101) derived from Item (0x01) derived from System (0x).
For custom content types that you define yourself, the identifier includes a suffix, which is the GUID associated with your type, separated from the parent identifier by 00. For example, consider the content type id 0x0101004A257CD7888D4E8BAEA35AFCDFDEA58C. Again, reading from right to left, you have 4A257CD7888D4E8BAEA35AFCDFDEA58C derived from Document (0x0101) derived from Item (0x01) derived from System (0x). The 00 serves as a delimiter between the GUID and the rest of the identifier, as shown in figure 2.
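The identifier scheme described above can be sketched as a small parser. This is a minimal illustration in Python, not SharePoint code, and the function names content_type_lineage and derive_custom_id are hypothetical. It assumes only the two forms described in the text: a parent identifier followed by two hexadecimal digits, or a parent identifier followed by the 00 delimiter and a 32-digit GUID.

```python
import re
import uuid
from typing import List


def content_type_lineage(ct_id: str) -> List[str]:
    """Return the identifiers from System (0x) down to ct_id, root first."""
    if not re.fullmatch(r"0x(?:[0-9A-Fa-f]{2})*", ct_id):
        raise ValueError(f"not a content type id: {ct_id!r}")
    digits = ct_id[2:].upper()
    lineage = ["0x"]
    pos = 0
    while pos < len(digits):
        if digits[pos:pos + 2] == "00":
            # 00 delimiter followed by a 32-digit GUID (custom content type).
            step = 2 + 32
        else:
            # Two-digit suffix (the form used by the built-in types in the text).
            step = 2
        if pos + step > len(digits):
            raise ValueError(f"truncated content type id: {ct_id!r}")
        pos += step
        lineage.append("0x" + digits[:pos])
    return lineage


def derive_custom_id(parent_id: str, guid: uuid.UUID) -> str:
    """Build a custom identifier: parent id, 00 delimiter, then the GUID digits."""
    return parent_id + "00" + guid.hex.upper()


picture = content_type_lineage("0x010102")
custom = derive_custom_id("0x0101", uuid.UUID("4A257CD7-888D-4E8B-AEA3-5AFCDFDEA58C"))
```

Reading the returned list in reverse gives the right-to-left interpretation used in the text: for 0x010102 the list is 0x, 0x01, 0x0101, 0x010102, that is, Picture derived from Document derived from Item derived from System.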
Each content type references a set of columns (also called fields), which comprise the metadata associated with the type. It is important to note that content types do not declare columns directly. Instead, each content type includes column references that specify the identifiers of columns declared elsewhere within the SharePoint site. Column references are declared in XML using
Our project proposal content type is based on the built-in Document content type, which provides the following metadata fields:
- Name - The name of the file that contains the document content
- Title - The title of the document (inherited from the Item content type)
Next, you select from the built-in SharePoint fields to capture the common elements of a project proposal:
- Author (Text) - The author of the proposal
- Start Date (DateTime) - The date on which the project will start
- End Date (DateTime) - The date on which the project will end
- Status (Choice) - The current document status
- Comments (Note) - Additional comments
- Keywords (Text) - Keywords
In addition to the built-in columns, you need a few additional columns to complete the type definition:
- ProposalType (Choice) - The kind of proposal
- EstimatedCost (Currency) - The total cost of the proposed work
- BidAmount (Currency) - The proposed amount of the bid
- EstimatedHours (Number) - The total number of hours
- HourlyRate (Currency) - The proposed hourly rate
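Before declaring the type, it can help to capture the field lists above as plain data. The sketch below is a hypothetical Python representation, not a SharePoint API: the Column tuple and the helper functions are names invented for this example, and the built_in flag simply records which of the two lists each column came from.

```python
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    name: str
    field_type: str
    built_in: bool  # True for the built-in SharePoint fields, False for the custom ones


PROJECT_PROPOSAL_COLUMNS: List[Column] = [
    Column("Author", "Text", True),
    Column("Start Date", "DateTime", True),
    Column("End Date", "DateTime", True),
    Column("Status", "Choice", True),
    Column("Comments", "Note", True),
    Column("Keywords", "Text", True),
    Column("ProposalType", "Choice", False),
    Column("EstimatedCost", "Currency", False),
    Column("BidAmount", "Currency", False),
    Column("EstimatedHours", "Number", False),
    Column("HourlyRate", "Currency", False),
]


def custom_columns(columns: List[Column]) -> List[str]:
    """Names of the columns that must be created before the type can reference them."""
    return [column.name for column in columns if not column.built_in]


def columns_by_type(columns: List[Column]) -> Dict[str, List[str]]:
    """Group column names by field type, preserving declaration order."""
    grouped: Dict[str, List[str]] = {}
    for column in columns:
        grouped.setdefault(column.field_type, []).append(column.name)
    return grouped
```

A table like this makes it easy to see which columns already exist and which must be created, regardless of whether the type is eventually declared in XML or through the object model.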
SharePoint provides two methods for declaring content types: using XML or using the Windows SharePoint Services object model. In actual practice, a hybrid approach is often useful. This is because, while XML makes it easier to declare fields and other elements at a high level, it also makes it harder to work with the content type from elsewhere in your solution. Once the essential elements have been identified, the object model provides more control over how those elements are used and how they interact with one another. What you need is an easy way to declare the type while preserving your ability to add enhanced functionality through code.
In Chapter 11, "Building Document Management Solutions," of Professional SharePoint 2007 Development (Wrox, 2007, ISBN: 978-0-470-11756-9), I present sections exploring both methods. This article explores only creating content types through code.