Beginning RSS and Atom Programming
by Danny Ayers, Andrew Watt
May 2005, Paperback


Excerpt from Beginning RSS and Atom Programming

RSS 2.0: Really Simple Syndication

RSS 2.0, Really Simple Syndication, was announced in September 2002. It follows a line of inheritance from RSS 0.9x. Unlike RSS 1.0, RSS 2.0 makes no use of the Resource Description Framework (RDF). However, RSS 2.0 does move one step closer to the RSS 1.0 approach, in that it adopts XML namespaces.

In principle, any RSS 0.91 document is also a legal RSS 2.0 document, at least once the value of the version attribute is changed to 2.0. In practice, many aggregators process an information feed document without taking any account of the version attribute of the rss element. The many variations among the versions of RSS have taught authors of aggregator tools to be generous in what markup they accept for processing. Therefore, a discrepancy in the value of the version attribute is unlikely to cause a practical problem in most, if not all, aggregators.
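This lenient behavior can be sketched in Python with the standard library's xml.etree.ElementTree module. It is a minimal sketch, not code from any real aggregator: the function read_feed_version and the KNOWN_VERSIONS set are hypothetical. The version attribute is recorded, but an unexpected value does not cause the feed to be rejected.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical set of versions this sketch recognizes. Unknown values are
# reported to the caller but never cause the feed to be rejected.
KNOWN_VERSIONS = {"0.91", "0.92", "2.0"}


def read_feed_version(xml_text: str) -> Tuple[Optional[str], bool]:
    """Return (version, is_known) for an rss document, tolerating mismatches."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        raise ValueError(f"not an RSS 0.9x/2.0 document: root is {root.tag!r}")
    version = root.get("version")  # None if the attribute is missing
    return version, version in KNOWN_VERSIONS


# An RSS 0.91 document is still accepted; only the version label differs.
print(read_feed_version('<rss version="0.91"><channel/></rss>'))  # ('0.91', True)
print(read_feed_version('<rss version="2.0"><channel/></rss>'))   # ('2.0', True)
```

A stricter tool could refuse unknown versions, but the point made above is that most aggregators deliberately do not.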

What Is RSS 2.0?

RSS 2.0 is, at the time of writing, the latest version in its branch of RSS development, a line that runs through the UserLand versions of RSS 0.91 and RSS 0.92. Like RSS 0.9x, RSS 2.0 avoids the use of RDF. It seems reasonable to conclude that this avoidance of RDF is partly a result of software pioneer Dave Winer's preference for simplicity and his assumption that RSS feeds transmit information of largely transitory interest. Developers who see information feeds as conduits for disposable information see little value in using RDF in a feed, because metadata matters little for content that is soon discarded.

The changes in RSS 2.0 from RSS 0.92 are, with one exception, fairly minor: there are a few new elements and a few changes regarding what particular elements should contain. The one substantive change in RSS 2.0 is the use of XML namespaces, which opens up the possibility of extending RSS 2.0 with modules.

The RSS 2.0 specification is located at http://blogs.law.harvard.edu/tech/rss.

XML Namespaces in RSS 2.0

RSS 2.0 uses the XML namespaces technique specified in the Namespaces in XML recommendation, located at www.w3.org/TR/REC-xml-names/.

Versions 0.91 and 0.92 of RSS don't use XML namespaces, so all the elements defined by those specifications are, inevitably, in no namespace. To remain compatible with them, RSS 2.0 keeps its own elements in no namespace too. This is unlike the situation with RSS 1.0, where a namespace URI is defined for the RSS elements.

The availability of XML namespaces allows RSS 2.0 documents to use elements from other namespaces, provided that an appropriate namespace declaration has been made. For example, you can use the Dublin Core module.
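To see what "in no namespace" means to a consuming program, the following Python sketch parses a small, made-up RSS 2.0 fragment that also uses the Dublin Core creator element. ElementTree reports the RSS 2.0 elements by their plain local names, whereas the extension element is reported as {namespace-URI}local-name. The feed content is illustrative only; http://purl.org/dc/elements/1.1/ is the Dublin Core element set namespace URI.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Illustrative feed: the RSS 2.0 elements are unprefixed and in no namespace,
# while dc:creator is bound to the Dublin Core namespace by xmlns:dc.
FEED = f"""<rss version="2.0" xmlns:dc="{DC_NS}">
  <channel>
    <title>Example</title>
    <item>
      <title>Hello</title>
      <dc:creator>A. Author</dc:creator>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)
item = root.find("channel/item")

# Core RSS 2.0 elements are found by their bare names...
print(item.find("title").text)                 # Hello
# ...while an extension element must be looked up with its namespace URI.
print(item.find(f"{{{DC_NS}}}creator").text)   # A. Author
print([child.tag for child in item])
```

Because the extension element's full name includes its namespace URI, it cannot be confused with an RSS 2.0 element that happens to share the local name.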

New Elements in RSS 2.0

There are several new elements in RSS 2.0. Each is briefly described in the following list. The use of these new elements is described in more detail in the discussion on RSS 2.0 document structure.

  • author: An optional child element of the item element
  • comments: An optional child element of the item element
  • generator: An optional child element of the channel element
  • guid: An optional child element of the item element
  • pubDate: An optional child element of the item element
  • ttl: An optional child element of the channel element

The RSS 2.0 Document Structure

The RSS 2.0 document structure has many similarities to the structure of RSS 0.91 and RSS 0.92 documents. For convenience, in case you choose to implement only RSS 2.0, the structure is described in full here, so that you don't need to cross-reference the chapter on RSS 0.91 and 0.92.

The rss Element

The rss element is the document element of an RSS 2.0 document. It has a required version attribute with a value of 2.0. In principle, RSS 0.91 and 0.92 documents are legal RSS 2.0 documents, although their version attribute has a different value. In practice this works, despite the theoretically illegal version values in RSS 0.91 and 0.92 documents, because many aggregators ignore the value of the version attribute.

If you are writing an RSS 2.0 document, the start tag of the rss element should be written as follows:

<rss version="2.0">

You can also write it using single quote marks:

<rss version='2.0'>
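The two quoting styles are equivalent in XML, so a parser reports the same attribute value for both. The short Python sketch below checks this with the standard library and also shows that ElementTree, when it writes a document, happens to choose double quotes.

```python
import xml.etree.ElementTree as ET

double = ET.fromstring('<rss version="2.0"><channel/></rss>')
single = ET.fromstring("<rss version='2.0'><channel/></rss>")

# The parser reports the same value whichever quote character was used.
print(double.get("version") == single.get("version"))  # True

# When serializing, ElementTree writes attribute values in double quotes.
root = ET.Element("rss", version="2.0")
ET.SubElement(root, "channel")
print(ET.tostring(root, encoding="unicode"))  # <rss version="2.0"><channel /></rss>
```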

The rss element has a single child element, the channel element. All content of the information feed document is contained in the channel element.

Notice that there is no namespace declaration on the preceding rss start tag. For consistency with RSS 0.91 and 0.92, the elements of RSS 2.0 are in no namespace. This has the advantage of backwards compatibility but does mean there is a risk of naming collisions. In practice, the risk of naming collisions is slight because most non-RSS 2.0 elements likely to be found in an RSS 2.0 document are in a namespace, which allows the aggregator or other user agent to distinguish those elements from RSS 2.0 elements.

The channel Element

The channel element is the only permitted child element of the rss element. The channel element has no attributes. The remainder of an RSS 2.0 document consists of child elements or descendant elements of the channel element.

The following child elements of the channel element are required in all RSS 2.0 documents.

  • title: Contains the name that refers to the information feed. If the information feed refers back to a Web site or blog, the value of the title element is typically the name of that site or blog.
  • link: Contains a URL that allows linking to the Web site or blog that's associated with the information feed.
  • description: Contains a brief description of the information feed.

A minimalist RSS 2.0 document would therefore look like the following document:

<rss version="2.0">
 <channel>
  <title>Reflecting on Microsoft</title>
  <link>http://www.tfosorcim.org/blog/</link>
  <description>The Reflecting on Microsoft blog discusses issues 
  relating to specific Microsoft products as well as the 
  much larger issue of the competition between the proprietary 
  and open-source approaches to software
 development.</description>
 </channel>
</rss>

This document is of little value in an aggregator because it contains no item elements. Surprisingly, the item element is optional in RSS 2.0 although, in practice, a typical RSS 2.0 document will have several.
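The three required channel children lend themselves to a simple check. The following Python sketch (the function name missing_channel_elements is hypothetical) reports which of title, link, and description are absent. Applied to a copy of the minimalist document above, with its description shortened, it finds nothing missing and, as noted, no item elements.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHANNEL_CHILDREN = ("title", "link", "description")


def missing_channel_elements(xml_text: str) -> list:
    """Return the names of required channel children that are absent."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("rss element has no channel child")
    return [name for name in REQUIRED_CHANNEL_CHILDREN if channel.find(name) is None]


MINIMAL = """<rss version="2.0">
 <channel>
  <title>Reflecting on Microsoft</title>
  <link>http://www.tfosorcim.org/blog/</link>
  <description>The Reflecting on Microsoft blog discusses issues relating
  to specific Microsoft products.</description>
 </channel>
</rss>"""

print(missing_channel_elements(MINIMAL))  # []
# The document is complete, but an aggregator has nothing to list: no items.
print(len(ET.fromstring(MINIMAL).findall("channel/item")))  # 0
```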

The following elements are optional child elements of the channel element. Some of them, which have their own child elements or attributes, are discussed further after the list.

  • category: This element contains information about the categories of information contained in the information feed. There can be several category elements as child elements of a channel element.
  • cloud: This element has several attributes that specify how to connect to a cloud, a Web service with which clients can register so that they are notified whenever the information feed is updated.
  • copyright: This contains copyright information relating to the feed.
  • docs: This contains a URL pointing to the RSS 2.0 specification.
  • generator: This contains information about the software that was used to produce the information feed.
  • image: This contains information so an aggregator can locate an image (in GIF, JPEG, or PNG format) to display in connection with the information feed.
  • language: This contains a two-letter language code, with optional extensions. Example values are en and en-us.
  • lastBuildDate: This contains the date and time when the content of the feed last changed.
  • managingEditor: This contains the e-mail address of the contact for queries about editorial content.
  • pubDate: This contains the publication date for the feed.
  • rating: The PICS (Platform for Internet Content Selection) rating for the channel.
  • skipDays: This contains information indicating to an aggregator the days of the week when a feed is not expected to be updated.
  • skipHours: This contains information indicating to an aggregator the hours when a feed is not expected to be updated.
  • textInput: This specifies a text box that an aggregator can display so that users can enter text for processing on a server, typically the server from which the feed originates.
  • ttl: This contains the number of minutes for which the feed can be cached before the aggregator should check the source for new content.
  • webMaster: This contains the e-mail address of the contact for queries about technical issues relating to the information feed.
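An aggregator can combine ttl and skipHours to decide when to fetch a feed again. The RSS 2.0 specification expresses ttl in minutes and lists skipHours as hour elements holding GMT hours from 0 to 23. The Python sketch below is one possible policy, not one prescribed by the specification: the function next_poll_time and its 60-minute fallback when ttl is absent are hypothetical, and skipDays is ignored for brevity.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET


def next_poll_time(channel: ET.Element, last_fetch: datetime,
                   default_ttl_minutes: int = 60) -> datetime:
    """Earliest time to re-fetch a feed, honoring ttl and skipHours.

    Sketch only: assumes last_fetch is a timezone-aware UTC datetime.
    default_ttl_minutes is a hypothetical fallback used when ttl is absent.
    """
    ttl_text = channel.findtext("ttl")
    ttl = int(ttl_text) if ttl_text else default_ttl_minutes
    skip = {int(h.text) for h in channel.findall("skipHours/hour") if h.text}

    candidate = last_fetch + timedelta(minutes=ttl)
    # While the candidate falls in a skipped hour, move to the next full hour.
    for _ in range(24):
        if candidate.hour not in skip:
            return candidate
        candidate = candidate.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return candidate  # every hour is skipped: poll anyway after a day


channel = ET.fromstring(
    "<channel><ttl>60</ttl>"
    "<skipHours><hour>0</hour><hour>1</hour></skipHours></channel>")
last = datetime(2004, 11, 9, 23, 30, tzinfo=timezone.utc)
print(next_poll_time(channel, last))  # 2004-11-10 02:00:00+00:00
```

Here the ttl alone would allow a fetch at 00:30 GMT, but hours 0 and 1 are skipped, so the next fetch moves to 02:00 GMT.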

The image Element

The image element specifies an image that can be displayed along with the channel in an aggregator or other user agent. The image element has the following required child elements:

  • link: The value of this element is a URL representing the feed or Web site.
  • title: This describes the image. If the feed is being rendered as HTML, the content of the title element may be used as the value of the alt attribute of the img element in HTML/XHTML.
  • url: The content of this element is a URL that specifies the location from which the image can be retrieved.

The link element, a child of the image element, is required, although it seems simply to duplicate the content of the link element child of the channel element. The RSS 2.0 specification is not clear about the consequences should these two link elements contain different URLs.

There are three optional child elements of the image element:

  • description: This contains a short description of the image. The specification suggests that it be used in the title attribute of the link in the corresponding HTML.
  • height: This contains the height of the image in pixels.
  • width: This contains the width of the image in pixels.
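As a sketch of how an aggregator might render the image element as HTML, the following Python function (image_to_html is a hypothetical name) links the image to the URL in link, uses title as the alt text, and puts the optional description in the title attribute of the link. The fallback width and height of 88 and 31 pixels are the defaults stated in the RSS 2.0 specification; values are escaped with html.escape.

```python
import html
import xml.etree.ElementTree as ET

DEFAULT_WIDTH, DEFAULT_HEIGHT = 88, 31  # defaults given in the RSS 2.0 spec


def image_to_html(image: ET.Element) -> str:
    """Render an RSS 2.0 image element as a linked HTML img tag."""
    url = image.findtext("url", "")
    title = image.findtext("title", "")
    link = image.findtext("link", "")
    width = int(image.findtext("width") or DEFAULT_WIDTH)
    height = int(image.findtext("height") or DEFAULT_HEIGHT)
    description = image.findtext("description")
    # The optional description becomes the title attribute of the link.
    title_attr = f' title="{html.escape(description)}"' if description else ""
    return (f'<a href="{html.escape(link)}"{title_attr}>'
            f'<img src="{html.escape(url)}" alt="{html.escape(title)}" '
            f'width="{width}" height="{height}"></a>')


image = ET.fromstring(
    "<image><url>http://p2p.wrox.com/images/p2p/wrox_rss_logo.gif</url>"
    "<title>Wrox P2P Blogs - Andrew Watt</title>"
    "<link>http://p2p.wrox.com/blogs_author.asp?AUTHOR_ID=22322</link>"
    "<width>36</width><height>31</height></image>")
print(image_to_html(image))
```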

The cloud Element

The cloud element is a child element of the channel element. Its attributes specify a Web service that implements the rssCloud interface. A useful way to look at a cloud is as a Web application that acts as a central coordinator for subscriptions to an information feed. Instead of an aggregator polling a server at specified intervals (often hourly), the cloud informs subscribed clients when a change has taken place.

A cloud element would appear similar to the following markup:

<cloud domain="rpc.sys.com" port="80" path="/RPC2"
 registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />
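Before registering with a cloud, a client has to turn these attributes into an address and a procedure name. The Python sketch below does only that; the function cloud_endpoint is hypothetical, it treats all five attributes as mandatory, and it assumes the service is reached over plain HTTP at domain:port/path. The arguments that the registration procedure itself expects are defined by the rssCloud interface and are not shown here.

```python
import xml.etree.ElementTree as ET


def cloud_endpoint(cloud: ET.Element) -> tuple:
    """Return (endpoint URL, register procedure, protocol) from a cloud element.

    Sketch assumptions: every attribute is present (a missing one raises
    KeyError) and the service is reached over plain HTTP.
    """
    attrs = cloud.attrib
    endpoint = f"http://{attrs['domain']}:{int(attrs['port'])}{attrs['path']}"
    return endpoint, attrs["registerProcedure"], attrs["protocol"]


cloud = ET.fromstring(
    '<cloud domain="rpc.sys.com" port="80" path="/RPC2" '
    'registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />')
print(cloud_endpoint(cloud))
# ('http://rpc.sys.com:80/RPC2', 'myCloud.rssPleaseNotify', 'xml-rpc')
```

For the xml-rpc protocol, the endpoint URL could then be passed to Python's xmlrpc.client.ServerProxy to make the call.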

The textInput Element

The textInput element allows a user to enter text to be sent to a server-side process, such as a CGI script. Some people question the appropriateness of the textInput element, arguing that such functionality belongs in an individual Web page rather than in an information feed.

The textInput element has the following four required child elements:

  • description: This contains a short description of the text input area.
  • link: This contains a URL which specifies a server-side process, for example a CGI script, to which the text entered by the user is sent.
  • name: This contains a name for the text in the text input area.
  • title: This contains the label for the submit button associated with the text input functionality.
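The following Python sketch shows how a client might turn a textInput element and the user's text into a request URL. The RSS 2.0 specification spells the element textInput, and since XML names are case-sensitive the sketch uses that spelling. The function text_input_url and the example.com search script are hypothetical, and the sketch assumes the server-side process accepts a GET request with the text passed as a query parameter named after the name element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def text_input_url(text_input: ET.Element, user_text: str) -> str:
    """Build a GET URL that submits user_text to the textInput's process.

    Assumption: the process reads the text from a query parameter whose
    name is the content of the textInput's name element.
    """
    link = text_input.findtext("link")
    name = text_input.findtext("name")
    separator = "&" if "?" in link else "?"
    return link + separator + urlencode({name: user_text})


text_input = ET.fromstring(
    "<textInput><title>Search</title>"
    "<description>Search this site</description>"
    "<name>q</name>"
    "<link>http://www.example.com/cgi-bin/search</link></textInput>")
print(text_input_url(text_input, "RSS 2.0"))
# http://www.example.com/cgi-bin/search?q=RSS+2.0
```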

The item Element

The item element may occur any number of times in an RSS 2.0 information feed document. Its child elements are described in the following list. The specification makes every child element of an item optional, with one condition: an item must contain at least one of the title and description elements. In practice, you can use the child elements you want and omit those you don't; real-world items with a title and at least some content satisfy the condition automatically.

  • author: This contains an e-mail address for a person with responsibility for authoring the content of the item.
  • category: An item element can have multiple category element children. The content of each category element identifies a category to which the item can be assigned. Each category element has an optional domain attribute, the value of which identifies the taxonomy in which that category is defined. For example, an item about XML might have a category element with the content "markup languages", with the domain attribute naming the taxonomy that the category comes from.
  • comments: This contains a URL of a Web page where a user can enter comments about the item.
  • description: This contains a summary of the item or, in the case of items with a relatively small amount of text, might contain the full text of the item.
  • enclosure: This contains information specifying a media object associated with the item. This is an empty element with three attributes. The url attribute contains a URL from which the media object can be retrieved. The length attribute specifies the size of the serialized object in bytes. The type attribute specifies the media type of the object.
  • guid: This contains a value that uniquely identifies the item. The RSS 2.0 specification does not lay down rules for achieving uniqueness. One typical approach is to use a URL from which the item can be retrieved; in that situation the guid element can have an isPermaLink attribute with a value of true. Because true is the default value of isPermaLink, a guid that is not such a URL should have isPermaLink set to false.
  • link: This contains a URL that can be used to retrieve the full text of the item. Like the other child elements of item it is optional, but it matters most when the description element contains only a summary.
  • pubDate: This contains the date and time when the item was published, written in the date-time format of RFC 822, for example Tue, 9 Nov 2004 12:01:11 GMT.
  • source: This contains information about the channel (perhaps on another site) that the item originally came from. It has a url attribute that contains the URL for the source information feed. The content of the source element is, typically, the title of the feed.
  • title: This contains a title for the item.
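The rules for items can be summarized in code. The Python sketch below (summarize_item is a hypothetical name, and the podcast-style item is made up) enforces the specification's requirement that an item contain at least a title or a description, applies the default value true for isPermaLink, and reads the three enclosure attributes, converting length to a number of bytes.

```python
import xml.etree.ElementTree as ET


def summarize_item(item: ET.Element) -> dict:
    """Extract a few RSS 2.0 item fields, applying the spec's defaults."""
    title = item.findtext("title")
    description = item.findtext("description")
    if title is None and description is None:
        raise ValueError("an item needs at least a title or a description")

    result = {"title": title, "link": item.findtext("link")}

    guid = item.find("guid")
    if guid is not None:
        # isPermaLink defaults to true when the attribute is absent.
        is_permalink = guid.get("isPermaLink", "true").lower() == "true"
        result["guid"] = (guid.text, is_permalink)

    enclosure = item.find("enclosure")
    if enclosure is not None:
        # All three attributes are expected; a missing one raises KeyError.
        result["enclosure"] = {
            "url": enclosure.attrib["url"],
            "length": int(enclosure.attrib["length"]),  # size in bytes
            "type": enclosure.attrib["type"],           # media (MIME) type
        }
    return result


item = ET.fromstring(
    "<item><title>Episode 1</title>"
    '<guid isPermaLink="false">example-episode-0001</guid>'
    '<enclosure url="http://www.example.com/ep1.mp3" '
    'length="12345678" type="audio/mpeg"/></item>')
print(summarize_item(item))
```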

An example item element is shown in the following example RSS 2.0 document.

An Example RSS 2.0 Document

Having looked at the individual parts of the document structure of an RSS 2.0 document, you can now take a look at a sample RSS 2.0 document that happens to contain my first author blog post on Wrox.com.

<?xml version="1.0" ?> 
<rss version="2.0">
<channel>
<title>Wrox P2P Blogs - Andrew Watt</title> 
<ttl>60</ttl> 
<description>Wrox.com P2P Community Blogs</description> 
<link>http://p2p.wrox.com/blogs_author.asp?AUTHOR_ID=22322</link> 
<copyright>Copyright © 2000-2004 by John Wiley &amp; Sons, Inc. or 
related companies.
 All rights reserved.</copyright> 
<language>en</language> 
<image>
 <url>http://p2p.wrox.com/images/p2p/wrox_rss_logo.gif</url> 
 <title>Wrox P2P Blogs - Andrew Watt</title> 
 <link>http://p2p.wrox.com/blogs_author.asp?AUTHOR_ID=22322</link> 
 <width>36</width> 
 <height>31</height> 
</image>
<item>
<title>Firefox 1.0 is available</title> 
<description>Firefox 1.0 is available now for download from &lt;a 
href="http://www.mozilla.org" 
target="_blank"&gt;http://www.mozilla.org&lt;/a&gt;.
&lt;br /&gt;&lt;br /&gt;It downloaded quickly 
for me, although that could change as the servers get busier, and it installed 
smoothly. &lt;br /&gt;&lt;br /&gt;If you haven't already spotted 
the new functionality to add a live RSS or Atom feed to your 
Firefox bookmarks using the button at the extreme 
bottom right of the Firefox window, give it a go....</description> 
<pubDate>Tue, 9 Nov 2004 12:01:11 GMT</pubDate> 
<link>http://p2p.wrox.com/blog.asp?BLOG_ID=37</link> 
<comments>http://p2p.wrox.com/blogs_comments.asp?BLOG_ID=37</comments> 
</item>
</channel>
</rss>

The example document contains only one item, and it uses few of the many optional elements that the RSS 2.0 specification allows. It should nonetheless give you an impression of what a simple RSS 2.0 document looks like.
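To pull the pieces together, the following Python sketch parses a shortened copy of the example document and prints its items. The description has been trimmed, and its HTML is entity-encoded (&lt; and &gt;), which is how HTML inside an RSS 2.0 description is normally written. The pubDate value uses the RFC 822 date-time format, which the standard library function email.utils.parsedate_to_datetime can parse.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A shortened copy of the example feed (description trimmed, HTML escaped).
FEED = """<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>Wrox P2P Blogs - Andrew Watt</title>
<ttl>60</ttl>
<description>Wrox.com P2P Community Blogs</description>
<link>http://p2p.wrox.com/blogs_author.asp?AUTHOR_ID=22322</link>
<item>
<title>Firefox 1.0 is available</title>
<description>Firefox 1.0 is available now for download from
&lt;a href="http://www.mozilla.org"&gt;http://www.mozilla.org&lt;/a&gt;.</description>
<pubDate>Tue, 9 Nov 2004 12:01:11 GMT</pubDate>
<link>http://p2p.wrox.com/blog.asp?BLOG_ID=37</link>
</item>
</channel>
</rss>"""

root = ET.fromstring(FEED)
channel = root.find("channel")
print(channel.findtext("title"), "| ttl:", int(channel.findtext("ttl")), "minutes")

for item in channel.findall("item"):
    # pubDate is an RFC 822 date-time; "GMT" yields a UTC-aware datetime.
    published = parsedate_to_datetime(item.findtext("pubDate"))
    print(published.isoformat(), item.findtext("title"))
    # The parser turns the entity-encoded HTML back into ordinary markup text.
    print(item.findtext("description"))
```

The aggregator then decides how to render that HTML, which is why the markup is escaped in the feed rather than embedded as XML elements.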