Flickr Mashups
by David A. Wilkinson
January 2007, Paperback


Excerpt from Flickr Mashups

Flickr Mashups: Visualizing the News

by David A. Wilkinson

Flickr contains tens of millions of photographs taken by members all over the world, which by now must cover practically every subject imaginable. So, in theory at least, for pretty much any subject you find discussed on the web, you ought to be able to find photos on Flickr that could illustrate it. In this mashup, you will try to do exactly that: take news feeds from some of the major news sites, such as the BBC and CNN, and then automatically illustrate the news stories in those feeds with photos taken from Flickr.

The RSS Format

Flickr offers RSS as one of the formats for its feeds, but there's much more to RSS than simply being a way to represent Flickr photostreams. RSS is an XML-based data feed format widely used for syndicating web site content. Originally devised by Netscape in 1999, RSS has a long and complex history and comes in many different versions. Confusingly, every version is abbreviated to "RSS": you might hear RSS described as Really Simple Syndication, Rich Site Summary, or RDF Site Summary, depending on which version is being referred to. Whichever version you use, RSS provides the same basic set of information: a feed consists of a channel and a series of items. The channel provides basic information about the feed, while each item is an individual piece of content.

The examples here use the feeds provided by BBC News. Remember that these feeds provide up-to-the-minute information on current news stories, so their contents change frequently; the structure, however, stays the same. The main BBC News RSS feed is found at:

http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml

Let's take a look at how an organization such as the BBC uses RSS to represent news stories — here's a typical extract from a news feed:

<rss version="2.0">
  <channel>
    <title>BBC News | News Front Page | World Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/2/hi/default.stm</link>
    <description>Visit BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news.</description>
    <language>en-gb</language>
    <lastBuildDate>Sat, 17 Jun 2006 14:24:42 GMT</lastBuildDate>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/2/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <docs>http://www.bbc.co.uk/syndication/</docs>
    <ttl>15</ttl>
    <image>
      <title>BBC News</title>
      <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
      <link>http://news.bbc.co.uk/go/rss/-/2/hi/default.stm</link>
    </image>
    <item>
      <title>Gates to end daily Microsoft role</title>
      <description>Bill Gates says he will end his day-to-day role as head of software giant Microsoft by 2008, to focus on his charity.</description>
      <link>http://news.bbc.co.uk/go/rss/-/2/hi/business/5085444.stm</link>
      <guid isPermaLink="false">http://news.bbc.co.uk/2/hi/business/5085444.stm</guid>
      <pubDate>Fri, 16 Jun 2006 13:00:21 GMT</pubDate>
      <category>Business</category>
    </item>
  </channel>
</rss>

As you can see from the opening tag, this is RSS version 2.0. Other RSS versions use different formats, but you will still find the same key pieces of information within them. The <channel> element contains elements that define the title of the feed, its description, and the URL of the feed's web site. The BBC offers a number of different feeds, such as world news, entertainment news, and technology news; each of these feeds is essentially a different RSS channel. Within a channel there are a number of items, each with its own title, description, and link.
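To make the channel/item structure concrete, here is a minimal sketch that reads a cut-down copy of the extract above using PHP's built-in SimpleXML extension rather than Magpie. The function name summarizeRss2 is hypothetical, and the sketch assumes the RSS 2.0 layout shown above; other RSS versions nest their elements differently, so this is not a general-purpose feed parser.

```php
<?php
// Minimal sketch (hypothetical helper, not part of Magpie): pull the channel
// title and each item's title, description and link out of an RSS 2.0
// document using PHP's built-in SimpleXML extension.
// Assumes the RSS 2.0 layout <rss><channel><item>...</item></channel></rss>.
function summarizeRss2(string $xml): array
{
    $rss = simplexml_load_string($xml);
    if ($rss === false || !isset($rss->channel)) {
        throw new InvalidArgumentException('Not an RSS 2.0 document');
    }
    $channel = $rss->channel;
    $items = [];
    // Every <item> element inside the channel is one piece of content.
    foreach ($channel->item as $item) {
        $items[] = [
            'title'       => (string) $item->title,
            'description' => (string) $item->description,
            'link'        => (string) $item->link,
        ];
    }
    return ['title' => (string) $channel->title, 'items' => $items];
}

// A cut-down version of the BBC extract.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>BBC News | News Front Page | World Edition</title>
    <item>
      <title>Gates to end daily Microsoft role</title>
      <description>Bill Gates says he will end his day-to-day role as head of software giant Microsoft by 2008, to focus on his charity.</description>
      <link>http://news.bbc.co.uk/go/rss/-/2/hi/business/5085444.stm</link>
    </item>
  </channel>
</rss>
XML;

$summary = summarizeRss2($sample);
echo $summary['title'], "\n";
foreach ($summary['items'] as $item) {
    echo '- ', $item['title'], "\n";
}
```

Run from the command line, it prints the channel title followed by one line per item.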

Magpie

Magpie is an open-source PHP library that greatly simplifies the process of extracting data from an RSS feed. It is very simple to use: you pass it the URL of a feed, and it converts the feed data into a convenient PHP structure for you to use. It handles the different versions of RSS for you, so you don't need to worry about the actual format of the feed itself. You can download Magpie from http://magpierss.sourceforge.net/.

Installing Magpie

Once you have downloaded it, installing Magpie is straightforward. In the lib directory of your mashups site on your web server, create a new directory called magpie. Into this lib/magpie directory, copy all of the .inc files included in the Magpie distribution: rss_cache.inc, rss_fetch.inc, rss_parse.inc, and rss_utils.inc. The distribution also contains an extlib directory, which you should copy into lib/magpie as well; this subdirectory contains just one file, Snoopy.class.inc.

Try It Out Retrieving a Feed with Magpie

In this exercise, you will create a script to retrieve a feed from BBC News using Magpie.

1. Create a new directory called visualizing-news in the top level of your mashups site. In this directory, create a new PHP file called example-1.php:
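<?php
// Load Magpie's feed-fetching functions from lib/magpie.
require_once(dirname(__FILE__) . '/../lib/magpie/rss_fetch.inc');

// The BBC News technology feed.
$feed = 'http://newsrss.bbc.co.uk/rss/newsonline_world_edition/technology/rss.xml';

// Fetch and parse the feed; stop with Magpie's error message if it fails.
$rss = fetch_rss($feed);
if (!$rss) {
  die('Could not retrieve the feed: ' . magpie_error());
}
$items = $rss->items;
?>
<h1><?php echo $rss->channel['title'] ?></h1>
<p><?php echo $rss->channel['description'] ?></p>
<ul>
<?php
// Output each news item's title and description as a list entry.
foreach ($items as $item)
{
?>
  <li><strong><?php echo $item['title'] ?></strong><br/>
  <?php echo $item['description'] ?>
  </li>
<?php 
}
?>
</ul>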

2. Load this page into your browser. You will see the news feed, with the news items displayed as a simple list.

How It Works

The require_once line at the top of the script reads in the Magpie library. The line that does all of the real work is:

$rss = fetch_rss($feed);

This calls Magpie's fetch_rss function, passing in the URL of the RSS feed as a parameter. Magpie returns a MagpieRSS object containing all the information from the feed. The two properties of most interest here are the channel and items arrays: channel is an associative array of general information about the feed (title, description, link), and items is an array of the news items themselves, each an associative array with keys such as title and description.

Finding Images on Flickr

Now that you have the news items from the RSS feed in PHP structures, you can see that the title and description fields contain the text that describes each news story. If you choose some of the words in those fields, you should be able to pass them to Flickr and find photos that use those words as tags. But which words should you use?

In the title and description of the Microsoft news story, you can see the following text:

Gates to end daily Microsoft role
Bill Gates says he will end his day-to-day role as head of software giant Microsoft by 2008, to focus on his charity.

A very simplistic approach to choosing keywords is to pick out all the words that are names of people, places, or companies. As a rough approximation of the proper nouns, just select every word that begins with a capital letter. If you do that, keeping each word once and in the order in which it first appears, you get:

Gates, Microsoft, Bill
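The capital-letter heuristic takes only a few lines of PHP. The following is a minimal sketch; extractCapitalizedKeywords is a hypothetical helper rather than part of Magpie or phpFlickr, and it assumes plain-text input in which capitals are the ASCII letters A to Z.

```php
<?php
// Hypothetical helper (not part of Magpie or phpFlickr): approximate the
// proper nouns in a piece of text by collecting every word that starts with
// an ASCII capital letter, keeping only the first occurrence of each, in order.
function extractCapitalizedKeywords(string $text): array
{
    // \b[A-Z][A-Za-z]* matches a word beginning with a capital letter.
    preg_match_all('/\b[A-Z][A-Za-z]*/', $text, $matches);
    // array_unique keeps the first occurrence; array_values re-indexes 0..n-1.
    return array_values(array_unique($matches[0]));
}

$title = 'Gates to end daily Microsoft role';
$description = 'Bill Gates says he will end his day-to-day role as head of '
    . 'software giant Microsoft by 2008, to focus on his charity.';

// Title first, then description, so words keep their order of appearance.
$keywords = extractCapitalizedKeywords($title . ' ' . $description);
echo implode(', ', $keywords), "\n"; // Gates, Microsoft, Bill
```

Being this simplistic, it also picks up ordinary words that happen to start a sentence, such as "The".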

A tag search on Flickr for photos that have all of those words as tags should retrieve images with a reasonable chance of actually being pictures of Bill Gates. Having only three keywords makes this a nice, easy example, but what if the news story were instead about Wall Street's reaction to Bill Gates' announcement? By picking out capitalized words as keywords, you would likely end up with a list that looked something like this:

Gates, Microsoft, Bill, Wall, Street, New, York, NASDAQ

The probability of there being a photo on Flickr that happens to carry all of those tags is relatively low. Clearly, the search should look for photos that contain some, but not necessarily all, of those tags; in fact, the more the better. You can do this with the flickr.photos.search API method. If you set tag_mode to any and sort to relevance, you get exactly what you want: the search finds photos that have one or more of the keywords as tags, and sorting by relevance means that photos matching more of the tags appear higher up the results list.
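To see why "any" plus relevance sorting suits a long keyword list, the following toy model ranks a handful of made-up photos by how many of the keywords they carry as tags. This is a simplified local illustration under that assumption, not Flickr's actual ranking algorithm; rankByTagOverlap and the sample data are hypothetical.

```php
<?php
// Simplified local model, NOT Flickr's real ranking: with tag_mode=any a
// photo qualifies if it has at least one requested tag, and here qualifying
// photos are ordered by how many of the requested tags they carry.
function rankByTagOverlap(array $photos, array $keywords): array
{
    $wanted = array_map('strtolower', $keywords);
    $scored = [];
    foreach ($photos as $id => $tags) {
        // Number of requested keywords that appear among this photo's tags.
        $score = count(array_intersect($wanted, array_map('strtolower', $tags)));
        if ($score > 0) { // "any" mode: at least one tag must match
            $scored[$id] = $score;
        }
    }
    arsort($scored); // highest overlap first, keys preserved
    return array_keys($scored);
}

// Made-up photos and tags, purely for illustration.
$photos = [
    'skyline' => ['new', 'york', 'skyline'],
    'keynote' => ['bill', 'gates', 'microsoft', 'keynote'],
    'campus'  => ['microsoft', 'redmond'],
    'cat'     => ['cat', 'sofa'],
];
$keywords = ['Gates', 'Microsoft', 'Bill', 'Wall', 'Street', 'New', 'York', 'NASDAQ'];

echo implode(', ', rankByTagOverlap($photos, $keywords)), "\n"; // keynote, skyline, campus
```

No photo matches all eight keywords, yet the one matching three of them still comes out on top, while the photo matching none is dropped.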

phpFlickr

To allow you to easily send and receive requests to and from Flickr, you will use the phpFlickr library created by Dan Coulter. phpFlickr is distributed under the GNU General Public License (GPL) and can be downloaded from SourceForge at http://sourceforge.net/projects/phpflickr, whilst the main phpFlickr web site is at http://www.phpflickr.com/. Before continuing with this example, you should download and unpack phpFlickr. Place the phpFlickr files in a lib/phpFlickr directory on your web server.

Try It Out Retrieving Relevant Images

In this exercise, you'll create a script to retrieve images from Flickr based on keywords, sorted by relevance.

1. Create a new PHP file called example-2.php in the visualizing-news directory:

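<?php
// Load the phpFlickr library from lib/phpFlickr.
require_once(dirname(__FILE__) . '/../lib/phpFlickr/phpFlickr.php');
$flickr = new phpFlickr('YOUR-API-KEY', NULL);

// Keywords to search for; each one is treated as a Flickr tag.
$keywords = array('bill', 'gates', 'microsoft');
$args = array(
  'tags' => implode(',', $keywords), // comma-separated list of tags
  'tag_mode' => 'any',               // photos with one or more of the tags
  'sort' => 'relevance',             // best matches first
  'page' => 1, 
  'per_page' => 6,
);

// Call flickr.photos.search and stop if no result came back.
$photos = $flickr->photos_search($args);
if (!$photos) {
  die('The Flickr search failed.');
}
$photoList = $photos['photo'];
?>
<ul>
<?php
foreach ($photoList as $photo)
{
  // Build the URL of the small square ("_s") version of each photo.
  $squarePhoto = 'http://static.flickr.com/' . $photo['server'] 
    . '/' . $photo['id'] . '_' . $photo['secret'] . '_s.jpg';
?>
  <li><img src="<?php echo $squarePhoto ?>" alt=""></li>
<?php 
}
?>
</ul>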

Make sure you put your Flickr API key where it says YOUR-API-KEY.

2. If you look at that page in your browser, you should see a list of six images — hopefully, some of them will be of Bill Gates.

How It Works

This PHP script calls the flickr.photos.search method to retrieve a list of six photos that use any of the keywords as tags. The search results are sorted so that the most relevant matches are returned first.
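phpFlickr takes care of sending the request, but it can help to see roughly what a flickr.photos.search request with these arguments looks like. The sketch below only builds a URL for Flickr's REST endpoint (assumed here to be https://api.flickr.com/services/rest/) and sends nothing; buildPhotoSearchUrl is a hypothetical helper, and YOUR-API-KEY is a placeholder as before.

```php
<?php
// Sketch: build (but do not send) a flickr.photos.search request URL with
// the same arguments as example-2.php. Hypothetical helper; the endpoint
// URL is an assumption to check against Flickr's API documentation.
function buildPhotoSearchUrl(string $apiKey, array $keywords, int $perPage = 6): string
{
    $params = [
        'method'   => 'flickr.photos.search',
        'api_key'  => $apiKey,
        'tags'     => implode(',', $keywords), // comma-separated tag list
        'tag_mode' => 'any',                    // match one or more of the tags
        'sort'     => 'relevance',              // best matches first
        'page'     => 1,
        'per_page' => $perPage,
    ];
    // http_build_query URL-encodes each value (the commas become %2C).
    return 'https://api.flickr.com/services/rest/?' . http_build_query($params);
}

echo buildPhotoSearchUrl('YOUR-API-KEY', ['bill', 'gates', 'microsoft']), "\n";
```

Note that the comma-separated tag list travels as a single tags parameter; the separate tag_mode and sort parameters control how Flickr interprets and orders the matches.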

The list of tags is built by joining the supplied list of keywords into a comma-separated string. tag_mode is set to any so that photos with one or more of the requested tags are returned. Finally, setting sort to relevance ensures that Flickr returns the best matches first.
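The one part of the script not covered above is the image URL built inside the loop. Each photo returned by the search carries server, id, and secret values, and the script combines them with a size suffix (_s selects the 75x75 pixel square thumbnail, _t a slightly larger thumbnail). The sketch below factors that into a hypothetical helper, flickrPhotoUrl, and keeps the static.flickr.com host used in the example; Flickr has changed its image hosts since then, so check the host against Flickr's current photo URL documentation before relying on it.

```php
<?php
// Hypothetical helper mirroring the URL built inside example-2.php's loop.
// server, id and secret come from each photo in the search results; the
// suffix picks the size ('s' is the small square thumbnail used above).
function flickrPhotoUrl(array $photo, string $sizeSuffix = 's'): string
{
    return 'http://static.flickr.com/' . $photo['server'] . '/'
        . $photo['id'] . '_' . $photo['secret'] . '_' . $sizeSuffix . '.jpg';
}

// Made-up values, only to show the shape of the URL.
$photo = ['id' => '12345', 'server' => '67', 'secret' => 'abcdef'];
echo flickrPhotoUrl($photo), "\n";      // http://static.flickr.com/67/12345_abcdef_s.jpg
echo flickrPhotoUrl($photo, 't'), "\n"; // http://static.flickr.com/67/12345_abcdef_t.jpg
```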