XML (Extensible Markup Language) is a buzzword you will see everywhere on the Internet, but it’s also a rapidly maturing technology with powerful real-world applications, particularly for the management, display, and organization of data. Together with its many related technologies, which are covered in later chapters, XML is an essential technology for anyone working with data, whether publicly on the web or privately within your own organization. This chapter introduces you to some XML basics and begins to show you why learning about it is so important.
This chapter covers the following:
The two major categories of computer file types (binary files and text files) and the advantages and disadvantages of each
The history behind XML, including other markup languages such as SGML and HTML
How XML documents are structured as hierarchies of information
A brief introduction to some of the other technologies surrounding XML, which you will work with throughout the book
A quick look at some areas where XML is useful
While there are some short examples of XML in this chapter, you aren’t expected to understand what’s going on just yet. The idea is simply to introduce the important concepts behind the language so that throughout the book you can see not only how to use XML, but also why it works the way it does.
Chapter 1 discussed some of the reasons why XML makes sense for communicating data, so now it’s time to get your hands dirty and learn how to create your own XML documents. This chapter covers all you need to know to create well-formed XML. Well-formed XML is XML that meets certain syntactical rules outlined in the XML 1.0 recommendation.
This chapter includes the following:
How to create XML elements using start-tags and end-tags
How to further describe elements with attributes
How to declare your document as being XML
How to send instructions to applications that are processing the XML document
Which characters aren’t allowed in XML, and how to use them in your documents anyway!
Because the syntax rules for XML and HTML are so similar, and because you may already be familiar with HTML, we’ll be making comparisons between the two languages in this chapter. However, if you don’t have any knowledge of HTML, you shouldn’t find it hard to follow along.
If you have Microsoft Internet Explorer 5 or later, you may find it useful to save some of the examples in this chapter on your hard drive and view the results in the browser. One nice advantage of doing this is that the browser will indicate whether you have made a syntax mistake. I do this quite often, to ensure I haven’t mistyped anything. If you don’t have IE5 or later, some of the examples include screenshots to show what the results look like.
The examples given in this chapter are also available for download from the Wrox website, at www.wrox.com; just find the entry for this title and click the Download Code link. If you wish to save yourself some typing, you can download the code from there, but typing these examples manually (and occasionally making mistakes!) will help you to learn and understand things better.
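If you would rather check well-formedness from a script than from a browser, the same idea can be sketched in Python with the standard library's xml.etree.ElementTree parser. This is only an illustration, not the method used in the chapter; the function name is_well_formed and the sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        # The parser reports the line and column of the first syntax error.
        return False, str(err)


# Hypothetical sample documents.
good = "<name><first>John</first><last>Doe</last></name>"
bad = "<name><first>John</last></name>"  # end-tag does not match start-tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this parser checks only well-formedness; it says nothing about whether the document follows a particular vocabulary.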
You have seen why XML provides some benefits over binary formats and can now create well-formed XML documents. At some point, however, your applications will become more complex, and you will need to combine elements from various document types into one XML document.
Unfortunately, two document types often have elements with the same name, but with different meanings and semantics. This chapter introduces XML namespaces, the means by which you can differentiate elements and attributes of different XML document types from each other when combining them into other documents, or even when processing multiple documents simultaneously.
In this chapter, you will learn the following:
Why you need namespaces
What namespaces are, conceptually, and how they solve the problem of naming clashes
The syntax for using namespaces in XML documents
What URIs, URLs, and URNs are
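To make the naming-clash problem concrete, here is a minimal Python sketch in which two vocabularies both define a title element, and namespaces keep them apart. The namespace URIs, prefixes, and element names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both use an element called "title".
doc = """
<order xmlns:p="http://example.com/person" xmlns:b="http://example.com/book">
  <p:title>Dr.</p:title>
  <b:title>Beginning XML</b:title>
</order>
"""

root = ET.fromstring(doc)

# The prefixes used for lookup are local to this code; only the URIs must match.
ns = {"person": "http://example.com/person", "book": "http://example.com/book"}

person_title = root.find("person:title", ns).text
book_title = root.find("book:title", ns).text

print(person_title)  # the person's title
print(book_title)    # the book's title
print(root[0].tag)   # the parser stores the name qualified by its namespace URI
```

The prefix in the document (p, b) and the prefix in the code (person, book) differ on purpose: what identifies an element is the namespace URI, not the prefix.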
As you’ve seen in the first few chapters, the rules for XML are straightforward. It doesn’t take much to create well-formed XML documents to describe any information that you want. When you create XML documents, you can categorize them into groups of similar document types based on the elements and attributes they contain. You learned that the elements and attributes that make up a document type are known as the document’s vocabulary. In Chapter 3, you learned how to use multiple vocabularies within a single document using namespaces. By this time, you may be wondering how to define your own types of documents and check whether certain documents follow the rules of your vocabulary.
Suppose you are developing an application that uses the <name> sample from Chapter 1. In the <name> sample, you created a simple XML document that allowed you to enter the first, middle, and last name of a person. In the sample, you used the name John Fitzgerald Johansen Doe. Now suppose that users of your application input information that does not match the vocabulary you developed. How could you verify that the content within the XML document is valid? You could write some code within your web application to check whether each of the elements is correct and in the correct order, but what if you want to modify the type of documents you can accept? You would have to update your application code, possibly in many places. This isn’t much of an improvement from the text documents discussed in Chapter 1.
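The hand-written checking described above might look something like the following Python sketch. It assumes the vocabulary uses child elements named first, middle, and last, in that order; those element names and the check_name function are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed vocabulary: <name> must contain exactly these children, in this order.
EXPECTED_CHILDREN = ["first", "middle", "last"]


def check_name(text):
    """Return True if the document matches the hard-coded <name> structure."""
    root = ET.fromstring(text)
    if root.tag != "name":
        return False
    return [child.tag for child in root] == EXPECTED_CHILDREN


valid = (
    "<name><first>John</first>"
    "<middle>Fitzgerald Johansen</middle>"
    "<last>Doe</last></name>"
)
invalid = "<name><last>Doe</last><first>John</first></name>"

print(check_name(valid))
print(check_name(invalid))
```

The rules live inside the application code, so any change to the vocabulary means editing and redeploying that code, which is exactly the maintenance problem the paragraph describes.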
The need to validate documents against a vocabulary is common in markup languages. In fact, it is so common that the creators of XML included a method for checking validity in the XML Recommendation. An XML document is valid if its content matches its definition of allowable elements, attributes, and other document pieces. By using special Document Type Definitions, or DTDs, you can check the content of a document type with special parsers. The XML Recommendation separates parsers into two categories: validating and nonvalidating. Validating parsers, according to the recommendation, must implement validity checking using DTDs. Using a validating parser, you can remove the content-checking code from the application and depend on the parser to verify the content of the XML document against the DTD.
Although you will learn everything you need to know about DTDs in this chapter, you might like to see the XML Recommendation and its discussion of DTDs for yourself. If so, you can look it up at http://www.w3.org/TR/REC-xml#dt-doctype.
In this chapter, you will learn how to do the following:
Create DTDs
Validate an XML document against a DTD
Use DTDs to create XML documents from multiple files
In the last chapter, you learned that you can use Document Type Definitions (DTDs) to validate your XML documents. This avoids the need to write application-specific code to check whether your documents are valid. You also saw some of the limitations of DTDs. Since the inception of XML, several new formats have been developed that enable you to define the content of your vocabulary.
In 1999, the W3C began to develop XML Schemas in response to the growing need for a more advanced format for describing XML documents. Work had already begun previously on several efforts that were intended to better model the types of document being created by XML developers. The W3C’s effort took the best of these early technologies and added more features. During development, several members of the W3C designed simpler schema languages with fewer features outside of the W3C. Perhaps the most important effort is RELAX NG, covered in depth in Chapter 6.
Today, XML Schemas are a mature technology used in a variety of XML applications. Apart from their use in validation, XML Schemas are used in XQuery, covered in Chapter 9. XML Schemas can also be used in conjunction with web services and SOAP, as shown in Chapters 14 and 15, respectively.
A schema is any type of model document that defines the structure of something, such as database structures or documents. In this case, the something is an XML document. In fact, DTDs are a type of schema. Throughout this book, we have been using the term vocabulary where we could have used the word schema. So, what is an XML Schema? This is where it gets confusing. The term XML Schema is used to refer to the specific W3C XML Schema technology. W3C XML Schemas, much like DTDs, enable you to describe the structure of an XML document. When referring to W3C XML Schemas, the “S” in “Schema” should be capitalized. XML Schema definitions are also commonly referred to as XSDs.
The benefits of XML Schemas
How to create and use XML Schemas
How to document your XML Schemas
RELAX NG is a very powerful, yet easy to understand schema technology that can be used to validate XML instance documents. Like W3C XML Schemas, covered in the previous chapter, RELAX NG is grammar-based. It is possible for many XML instance documents to be valid according to a single RELAX NG schema document. Alternatively, it is possible for a single XML instance document to be valid with respect to multiple RELAX NG schema documents.
Here are some of the key features of RELAX NG:
It’s simple and easy to learn.
It uses pattern-based grammar with a strong mathematical foundation.
It has two different syntaxes: XML syntax and compact syntax.
It supports XML Schema datatypes.
It supports user-defined datatypes.
It supports XML namespaces.
It’s highly composable.
Elements and attributes are treated the same.
RELAX NG is a normalized grammar based on James Clark’s Tree Regular Expression for XML (TREX), and Makoto Murata’s Regular Language description for XML (RELAX). Because RELAX NG was created after DTDs and XML Schemas, the authors were able to address many of the problems in the earlier schema languages. They were able to remove the complexity associated with W3C XML Schemas while embracing some of its features. Additionally, the authors based RELAX NG on strong mathematical models. Having such models simplifies validator development and enables schema authors to make mathematical assertions about their schemas. XML Schema is the most widely supported validation technology, but RELAX NG is considered to be the simplest technology, and it is often favored when support is available. RELAX NG takes a different approach to validating XML documents, when compared to XML Schemas. RELAX NG schemas are based on patterns, whereas XML Schemas are based on types. In fact, the power of RELAX NG centers on its use of patterns. RELAX NG schemas can use pattern composition and named patterns to create reusable sections of schema documents.
Though RELAX NG does not have the type hierarchy of XML Schemas and does not support type inheritance, datatyping is supported. RELAX NG supports the datatypes provided by the W3C XML Schema Part 2: Datatypes Recommendation. For example, RELAX NG schemas have full use of XML Schema datatypes, such as xs:int, xs:double, and xs:decimal, as well as the XML Schema facets previously discussed. In fact, RELAX NG was designed with pluggable datatypes in mind. That is, users can invent their own type system, and RELAX NG schemas can be built using user-defined types, instead of, or in addition to, using the XML Schema datatypes.
RELAX NG syntaxes
RELAX NG patterns, which are the building blocks of RELAX NG schemas
Composing and combining patterns into higher-level components for reuse, as well as full schema grammars
The remaining features of RELAX NG, including namespaces, name-classes, datatyping, and common design patterns
When writing code to process XML, you often want to select specific parts of an XML document to process in a particular way. For example, you might want to select some invoices that fit a date range of interest. Similarly, you may want to specifically exclude some part(s) of an XML document from processing. For example, if you make basic human resources data available on your corporate intranet, you probably want to be sure not to display confidential information such as salary for an employee. To achieve those basic needs, it is essential to have an understanding of a technology that allows you to select a part or parts of an XML document to process. The XML Path Language, XPath, is designed to allow the developer to select specific parts of an XML document.
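As a rough taste of selecting parts of a document, the following Python sketch uses the path expressions supported by the standard library's xml.etree.ElementTree, which implements only a limited subset of XPath 1.0 (the chapter itself works with XSLT). The employee data, element names, and attribute names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical human resources data.
doc = """
<employees>
  <employee dept="sales"><name>Ann</name><salary>50000</salary></employee>
  <employee dept="hr"><name>Bob</name><salary>45000</salary></employee>
</employees>
"""

root = ET.fromstring(doc)

# Select only the names of employees in the sales department.
sales_names = [e.text for e in root.findall("./employee[@dept='sales']/name")]

# Select every name at any depth, leaving the salary elements untouched.
all_names = [e.text for e in root.findall(".//name")]

print(sales_names)
print(all_names)
```

Selecting the name elements explicitly is one way to keep confidential values such as salary out of the output.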
The latest incarnation of XPath to be given candidate recommendation status by the W3C is version 2.0. The specification can be viewed at www.w3.org/TR/xpath20/. Because the version is still not a recommendation and only appeared in June 2006, and is vastly larger than version 1.0, there are still only a few processors supporting it. The current champion is Saxon, which provides a Java and a .NET version and is available in free and paid-for versions, the latter implementing some of the more advanced, and optional, features. You can read how to install and configure Saxon in Chapter 8, which is devoted to XSLT. XPath was designed specifically for use with Extensible Stylesheet Language Transformations (XSLT), and with XML Pointer (XPointer), which is not discussed in detail in this book. More recently, XForms 1.0 makes use of XPath 1.0, too. The use of XForms, which includes XPath expressions that bind a form control to the instance data of an XForms document, is discussed in Chapter 20. XPath is also used in XQuery, covered in Chapter 9, and most XML DOM parsers support using it to locate nodes (for more on the XML DOM, see Chapter 11).
This chapter concentrates on version 1.0 features but also notes where things have changed. Later in the chapter we will look at some of the newer functions and syntax of version 2.0.
XPointer was intended for use with the XML Link Language, XLink. XLink, which became a W3C recommendation in 2001, has seen limited adoption to date. As a result, XPointer is currently also not widely used. Therefore, XPath in this chapter is described primarily in the context of how it is used with XSLT, and the code examples in the chapter use XSLT. To run XSLT code using the Saxon XSLT processor, see the information provided in Chapter 8.
Ways of looking at an XML document, including the XPath data model
How to visualize XPath and how the component parts of XPath syntax fit together to enable you to navigate around the XPath data model
The XPath axes: the “directions” that are available to navigate around the XPath data model
XPath 1.0 functions
XPath 2.0 new functions and features
To understand what XPath is and how it is used, we will first consider ways in which an XML document can be represented.
XSLT, Extensible Stylesheet Language Transformations, is a very important XML application in many XML workflows. In many business situations, data is either stored as XML or can be made available from a database as XML. XSLT is important because, typically, the way in which XML is stored needs to be changed before it is used. Wherever the data comes from, the XML might need to be presented to end-users or be shared with business partners in a format that is convenient for them. XSLT plays a key role in converting XML to its presentation formats and restructuring XML to fit the structures useful to business partners.
How XSLT can be used to convert XML for presentation or restructure XML for business-to-business data interchange
How XSLT differs from conventional procedural languages
An XSLT transformation is described in terms of a source document and a result document. However, under the hood, the transformation taking place is a source tree (which uses the XPath data model) to a result tree (which also uses the XPath data model).
How the elements that make up an XSLT stylesheet are used. For example, you look at how to use the xsl:value-of element to retrieve values from the source tree being transformed. In addition, you look at the xsl:copy and xsl:copy-of elements, which, respectively, shallow copy and deep copy nodes from the source tree.
How to use XSLT variables and parameters
The new features of XSLT 2.0 and how they make transformations easier
XSLT 2.0 reached W3C Recommendation status as of January 23, 2007.
Large amounts of information are now being stored as XML or can be made available as XML from relational and other databases with XML functionality. As the volume of XML-based information increases, the need for a query language to efficiently query and make use of that XML data is obvious. At the time of writing, the W3C, the World Wide Web Consortium, is developing an XML query language called XQuery. This chapter introduces you to using XQuery and walks you through several working examples using XQuery’s features.
XQuery is likely to become as important in the XML world as SQL has become in the relational database world. In the near future, any self-respecting developer who uses XML will be expected to have at least a basic understanding of XQuery and the skill to use it to carry out frequently used queries. Those who work routinely with large volumes of XML data will be expected to have significant expertise in using XQuery as they create programmatic solutions to XML data-handling business issues.
In this chapter you will learn the following:
Why XQuery was created to complement languages such as SQL and XSLT
How to get started with XQuery using the XQuery tools that are already available
How to query an XML document using XQuery and how to create new elements in the result using element constructors
About the XQuery data model and how to use the different types of expression in XQuery, including the important FLWOR (for, let, where, order by, return) expressions
How to use some XQuery functions
What further developments are likely in future versions of XQuery, including full-text searching and update functionality
At the time of writing, the specification of XQuery is not yet finalized at the W3C. However, much of the XQuery language is now stable and at the Proposed Recommendation stage. General XQuery information is located at http://www.w3.org/XML/Query, including links to each of the several XQuery specification documents.
The volume of XML used by businesses is increasing as enterprises send increasing numbers of messages as XML. Many websites use XML as a data store, which is transformed into HTML or XHTML for online display. The diversity of sources of XML data is increasing, too. For example, a new generation of forms products and technologies, such as Microsoft’s InfoPath and W3C XForms, is also beginning to supply XML data directly to data stores such as Microsoft Access or SQL Server from forms filled in by a variety of information workers.
To monitor business activity, you need to be able to store or exchange possibly huge amounts of data as XML and to recognize the benefits of XML’s flexibility to reflect the structure of business data and to process or interchange it further. In addition, XML is being used increasingly for business-critical data, some of which is particularly confidential and needs to be secured from unauthorized eyes. This raises many issues that need to be considered when storing XML in a production setting. It isn’t enough that data is available as XML; other issues such as security and scalability enter the picture, too.
In Chapter 9 you looked at XQuery, the XML query language under development at the W3C. This chapter covers broader issues that relate to the use of XML with databases. These issues are illustrated with examples that use XML with a native XML database and two different XML-enabled SQL databases.
Use cases for XML-enabled database systems
How to perform foundational tasks using eXist, an Open Source native XML database
How to use some of the XML functionality in Microsoft SQL Server and MySQL, two major relational databases with XML functionalities
This chapter explores the XML Document Object Model, often called the XML DOM or simply the DOM, and how it can be manipulated in various ways. The XML DOM is primarily used by programmers as a way to manipulate the content of an XML document. The XML DOM is useful for tasks as diverse as manipulating data from an RSS feed to animating part of an SVG graphic.
Although many XML programmers refer to the XML DOM simply as the DOM, the term DOM can also be used to refer to the HTML Document Object Model, the XML Document Object Model, or both. In this chapter the focus is on the XML DOM.
The purpose of the XML Document Object Model
How the DOM specification was developed at the W3C
Important XML DOM interfaces and objects, such as the Node and Document interfaces
How to add and delete elements and attributes from a DOM and manipulate a DOM tree in other ways
How the XML DOM is used “under the covers” in Microsoft InfoPath 2003
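Adding and deleting elements and attributes through the DOM can be sketched in Python with the standard library's xml.dom.minidom, which implements a subset of the W3C DOM interfaces. The feed document and its element names are assumptions made for this illustration.

```python
from xml.dom import minidom

# Hypothetical starting document.
doc = minidom.parseString(
    "<feed><item id='1'>First</item><item id='2'>Second</item></feed>"
)
root = doc.documentElement

# Add a new element with an attribute and a text node.
new_item = doc.createElement("item")
new_item.setAttribute("id", "3")
new_item.appendChild(doc.createTextNode("Third"))
root.appendChild(new_item)

# Delete the first <item> element from the tree.
first = root.getElementsByTagName("item")[0]
root.removeChild(first)

# Delete an attribute from the element that was just added.
new_item.removeAttribute("id")

texts = [n.firstChild.data for n in root.getElementsByTagName("item")]
print(texts)
print(root.toxml())
```

Every change goes through the Document and Node interfaces: the document creates nodes, and parent nodes attach or detach them.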
In the last chapter, you learned about the Document Object Model (DOM) and how it can be used to work with your XML documents. The DOM is great when you work with relatively small documents that can easily fit into memory, but what do you do when you need to read an XML file that is several megabytes or even several gigabytes in size? Loading this kind of data into memory can be very slow, and in many cases is not possible. Luckily, you have another way to get the data out of an XML document: SAX.
What is SAX?
Where to download SAX and how to set it up
How and when to use the primary SAX interfaces
Because SAX is an application programming interface (API), you need to learn some in-depth programming concepts within this chapter. As in the last chapter, you will learn it step by step, but you need to have some programming experience under your belt. In order to work through the many examples, this chapter explains how to download and install the Java Development Kit (JDK). If you do not plan to program applications for XML, but rather plan to use XML for its design- and document-driven nature, you may want to skip this chapter.
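Although the chapter works in Java, the event-driven style of SAX can be previewed with a short Python sketch that uses the standard library's xml.sax module. The handler class name and the sample document are assumptions for this illustration; the point is that the parser calls your code as it meets each start-tag, without building the whole document in memory.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name appears in the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start-tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()

# Hypothetical input; a real application would stream a large file instead.
xml.sax.parseString(b"<orders><order/><order/><note/></orders>", handler)

print(handler.counts)
```

For a large file you would pass a filename or file object to xml.sax.parse with the same handler, so memory use stays roughly constant regardless of document size.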
One of the interesting characteristics of the Web is the way that certain ideas seem to arise spontaneously, without any centralized direction. Content syndication technologies definitely fall into this category, and they have emerged as a direct consequence of the linked structure of the Web and general standardization regarding the use of XML.
This chapter focuses on a number of aspects of content syndication, including the RSS and Atom formats and their role in such areas as blogs, news services, and the like. There is no doubt these technologies will play a major role in the next logical leap in the connectedness of the Web, so it’s useful to understand them not just from an XML-format standpoint but also in terms of how they are shaping the future Internet.
Concepts and technologies of content syndication and metadata
A brief look at the history of RSS, Atom, and related languages
What the feed languages have in common and how they differ
How to implement a simple newsreader/aggregator using Python
Examples of XSLT used to generate and display newsfeeds
There is a lot more to RSS, Atom, and content syndication than can be covered in a single chapter, so the aim here is to give you a good grounding in the basic ideas, and then provide a taste of how XML tools such as SAX and XSLT can be used in this rapidly expanding field.
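As a small preview of what a newsreader has to do, the following Python sketch pulls the title and link out of each item in an RSS 2.0 document. It is not the chapter's aggregator; the feed content and the read_items function are invented for this example, and a real reader would also fetch the feed over HTTP and cope with the other feed formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, held in a string for simplicity.
rss = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def read_items(feed_text):
    """Return a list of (title, link) pairs, one per <item> in the channel."""
    channel = ET.fromstring(feed_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


items = read_items(rss)
for title, link in items:
    print(title, link)
```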
So far, we’ve covered what XML is and how to create well-formed and valid XML documents, and you’ve even seen a couple of programmatic interfaces into XML documents in the form of DOM and SAX. We also discussed the fact that XML isn’t really a language on its own; it’s a metalanguage, to be used in the creation of other languages.
This chapter takes a slightly different turn. Rather than discuss XML itself, it covers an application of XML: web services, which enable objects on one computer to call and make use of objects on other computers. In other words, web services are a means of performing distributed computing.
What a remote procedure call (RPC) is, and what RPC protocols exist currently
Why web services can provide more flexibility than previous RPC protocols
How XML-RPC works
Why most web services implementations should use HTTP as a transport protocol, and how HTTP works under the hood
How the specifications that surround web services fit together
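To see what an XML-RPC call looks like on the wire, you can serialize one with Python's standard xmlrpc.client module and then decode it again. The method name math.add and its arguments are hypothetical; no server is contacted in this sketch.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote method "math.add" with two integers.
request_xml = xmlrpc.client.dumps((2, 3), methodname="math.add")
print(request_xml)

# Decode the same XML back into the parameters and the method name,
# which is what a server does when the request arrives.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

In practice the XML document produced here would be sent as the body of an HTTP POST request, which is why the chapter spends time on how HTTP works.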
In the last chapter, you learned about web services and how they work toward enabling disparate systems to communicate. Of course, if everyone just chose their own formats in which to send messages back and forth, that wouldn’t do much good in the interoperability area, so a standard format is a must. XML-RPC is good for remote procedure calls, but otherwise limited. SOAP overcomes that problem by enabling rich XML documents to be transferred easily between systems, even allowing for the possibility of attachments. Of course, this flexibility means that you need a way to describe your SOAP messages, and that’s where Web Services Description Language (WSDL) comes in. WSDL provides a standard way to describe where and how to make requests to a SOAP-based service.
SOAP originally stood for Simple Object Access Protocol, but because most people found it anything but simple, it is now officially a name rather than an acronym and no longer stands for anything.
The previous chapter covered creating a simple web service using a method called REST. In this chapter you’ll take it a step further and expand your horizons by creating a SOAP service and accessing it via SOAP messages, then describing it using WSDL so that other developers can make use of it if desired.
In this chapter you’ll learn the following:
Why SOAP can provide more flexibility than previous RPC protocols
How to format SOAP messages
When to use GET versus POST in an HTTP request
What SOAP intermediaries are
How to describe a service using WSDL
The difference between SOAP styles
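The basic shape of a SOAP message, an Envelope containing a Body that carries the application's own XML, can be sketched in Python with the standard library. The envelope namespace below is the one defined for SOAP 1.1; the application namespace, the GetOrderStatus element, and the orderId value are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/orders"  # hypothetical application namespace

# Ask the serializer to use the conventional "soap" prefix.
ET.register_namespace("soap", SOAP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

# The payload inside the Body belongs to the application's own vocabulary.
request = ET.SubElement(body, f"{{{APP_NS}}}GetOrderStatus")
ET.SubElement(request, f"{{{APP_NS}}}orderId").text = "12345"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

An optional Header element, placed before the Body, is where information for SOAP intermediaries would go; it is omitted here to keep the sketch short.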
The term Ajax was first used in early 2005 by Jesse James Garrett as an acronym for Asynchronous JavaScript and XML. The term is a little misleading because (a) the technique is not always asynchronous; (b) XML is not necessarily used; and (c) if you’re happy for your code to be Internet Explorer–specific, then you can replace JavaScript with VBScript.
The crux of Ajax, though, is making requests behind the scenes in a web application and incorporating any data returned in the page without reloading the entire HTML. The normal way that this is carried out is by using an HTTP Request controlled by client-side scripting. Data is passed in the request, often as XML, and a response, also commonly XML, is received. The information contained in this response is then incorporated in the page using dynamic HTML.
These techniques have made possible the kind of responsiveness and functionality that previously were only to be found in desktop applications. Two of the most famous Ajax uses both originate from Google. Google Suggest, www.google.com/webhp?complete=1&hl=en, enables a textbox to suggest suitable entries chosen from a drop-down list; and Gmail, mail.google.com/mail/help/intl/en/about.html, is a web e-mail client with almost as much functionality as a traditional desktop application such as Microsoft’s Outlook Express.
This chapter first describes previous endeavors to improve the user experience before the Ajax technique was formalized. You will learn the differences between the two main browser camps, IE and Mozilla, and how this led to the use of a cross-browser library to simplify development. You will also see how background requests are passed using the XMLHttpRequest object, and how the two main options for data formats, XML and JSON, compare. Two Ajax applications are examined in detail: a simple web service that validates credit card numbers and a more complex AutoSuggest textbox. The chapter finishes with an explanation of how the same-origin policy limits your use of third-party web services, and how to overcome this using a server-side proxy.
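The comparison between XML and JSON as response formats can be previewed with a short sketch. The chapter's own code runs as JavaScript in the browser; Python is used here only to show the same data in both formats, and the payloads (a list of suggestions such as an AutoSuggest service might return) are invented for this example.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical response expressed in each format.
xml_payload = "<suggestions><item>apple</item><item>apricot</item></suggestions>"
json_payload = '{"suggestions": ["apple", "apricot"]}'

# XML: parse into a tree, then walk the elements to extract the values.
from_xml = [item.text for item in ET.fromstring(xml_payload).findall("item")]

# JSON: parsing yields native lists and dictionaries directly.
from_json = json.loads(json_payload)["suggestions"]

print(from_xml)
print(from_json)
```

Both routes produce the same list; the difference lies in how much navigation code sits between the raw response and the values you want.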
This chapter examines Cascading Style Sheets (CSS) as a means of styling XML documents for use on the Web. You may well have already used CSS with HTML or XHTML. Dealing with other XML document types, however, requires some different techniques, which are covered in this chapter.
You’ll see XHTML in the next chapter, but for the time being you only need to know that XHTML documents can be styled like HTML documents. Styling XHTML and HTML documents is so similar that this chapter uses the term “(X)HTML” to represent “HTML or XHTML.”
Even when you remove stylistic markup from an (X)HTML document, the browser still knows how to display elements such as tables, lists, and different levels of headings. In other XML vocabularies, you won’t even have this most rudimentary help with layout. After all, a <table> element in your XML vocabulary might be used to describe a piece of wood with four legs. Considering that a browser won’t know how any of the elements in your XML vocabulary need to be displayed, you have a lot more work to do when styling XML documents with CSS than (X)HTML documents.
If you know that your XML documents will be displayed on the Web, some of the points you learn in this chapter might even affect the way in which you write your vocabulary or schema. For example, by the end of the chapter, you will understand why CSS is much better suited to displaying element content than attribute values.
In this chapter you will learn the following:
How CSS relies upon a box model for styling documents, whereby the content of each element inhabits a box
How to use CSS to style (X)HTML documents, rather than relying on stylistic markup
How to give your XML documents a visual structure so that they can look like (X)HTML documents with features such as tables, lists, links, and images, even though the browser does not know how to present any of the elements
Before looking at CSS, however, it is important to reiterate the reasons why you need stylesheets.
This chapter uses Internet Explorer 6 and Firefox 1.5 or later, or other Gecko-based browsers such as Mozilla and Netscape 6 or later. Most features described and demonstrated here are available in recent versions of other browsers such as Opera and Safari.
When people say XHTML is the new HTML, it is not in the sense that fashion pundits might say brown is the new black; it is the W3C’s replacement for HTML. Rather than create HTML 5, the W3C made XHTML, which is akin to Macromedia creating Flash MX instead of Flash 6, or Microsoft releasing Windows XP instead of Windows 2001. XHTML is actually the reformulation of HTML 4 written in XML, so you have a few new rules to learn, the first of which is that it shall be XML-compliant.
The good news is that the elements and attributes available to you in XHTML are almost identical to those in HTML 4 (after all, XHTML 1.0 is a version of HTML 4 written in XML), so you won’t need to learn a new vocabulary in this chapter. There are, however, a few changes related to how you construct documents, which is what you will learn in this chapter.
While XML is finding its way into many aspects of programming, data storage, and document authoring, it was primarily designed for use on the Web. It isn’t surprising, therefore, that the W3C wanted to make these changes to HTML (the most widely used language on the Web) to make it an application of XML.
Why do you need to learn a new version of HTML? After all, existing browsers will continue to support HTML, as we know it, for the foreseeable future (and many sites on the Internet may never be upgraded). In fact, this chapter covers several reasons for upgrading old HTML pages, including the following:
It can make your page size smaller and your code clearer to read.
It can make your pages more accessible to readers with disabilities, including search engine crawlers!
Your code can be used with all XML-aware processors (from authoring tools and validators to XSLT, DOM, and SAX processors).
It addresses issues regarding creating web pages so that they can be viewed on all the new devices that can now access the Internet, from phones to fridges, without each type of device requiring its own different language.
Some new browsers and devices are being written to only support XHTML.
As a reformulation of HTML 4 in XML, XHTML doesn’t add new features to HTML. Code conciseness and accessibility can be achieved in HTML. However, the other points mentioned in the preceding list are specific to XML. Furthermore, XHTML is probably the most popular application of XML today; and if you are familiar with HTML, you will be writing XHTML pages in no time at all.
Covered in this chapter are two versions of XHTML: XHTML 1.0 and XHTML 1.1. The W3C is currently working on XHTML 2.0, which is a complete refactoring of XHTML (and HTML). XHTML 2.0 is still a work in progress and is briefly covered at the end of this chapter.
Before you even look at XHTML, however, you should be aware that in HTML 4.01 all stylistic markup (such as the <font> element and bgcolor attribute, which are used to indicate how a document should appear) was marked as deprecated, meaning it would be phased out in future versions of the specifications. It is therefore essential to address removal of the stylistic markup before starting on XHTML.
This chapter covers the following:
How to keep style and content separate, and the benefits of doing so
The different versions and document types of XHTML
How to write XHTML 1.0 documents
What modularized XHTML is and how it enables you to write pages for many different devices
This chapter assumes you have a basic knowledge of HTML. If you don’t, plenty of free tutorials are available on the Web, including the following:
www.w3.org/MarkUp/Guide/
www.w3schools.com/html/html_intro.asp
www.webreference.com/html/tutorials/
This chapter describes Scalable Vector Graphics (SVG), an extremely versatile 2-D graphics format designed primarily for the Web. Its specification is defined and maintained by the World Wide Web Consortium (W3C), and it offers an open alternative to proprietary graphics systems.
Here you learn about the core concepts and some of the most commonly used features of SVG, along with corresponding practical code. The SVG specification is brimming with features (far too many to describe in a single chapter), but to come to grips with the language, you need to know how to write practical code and have a general idea of the kind of things SVG can do.
This chapter is divided into four sections:
An overview of SVG, including the kind of things it’s good for and what tools are available to you, the developer
A hands-on section that demonstrates some of the basics of SVG in code examples
A simple but complete browser-based SVG application constructed using XHTML and SVG, as well as a script manipulating the XML DOM
A section-by-section summary of the contents of the SVG specification
The information in this chapter is quite densely packed, but once you start playing with SVG yourself, you will discover that not only is it easier to work with than it looks on the printed page, but it’s also a lot of fun.
XForms is an XML-based forms technology specified by the World Wide Web Consortium (W3C). XForms’ initial intent was to replace HTML forms, which are now at least a decade old. The power and flexibility of XForms goes well beyond this initial goal, and XForms is well suited to be used as a general-purpose tool for designing user interfaces for Web applications.
In several earlier chapters, you learned how to manipulate XML using technologies such as XPath, XSLT, XQuery, and the XML DOM, but you have yet to discover how to collect data to form part of an XML-based workflow. XForms is an important tool in the XML developer’s toolbox, because XForms submits data from forms as well-formed XML documents.
Forms are an integral part of day-to-day business activity. Filling in paper forms or electronic forms is almost inescapable for anyone who is an information worker. As XML-based workflows become more prevalent in large enterprises and progressively trickle down into smaller businesses, the advantages of submitting XML data will become more widely appreciated.
XForms isn’t the only XML-based forms tool, and although the main focus of this chapter is XForms, other proprietary solutions to XML-based forms are described briefly toward the end of the chapter.
This chapter covers the following:
How XForms improves on existing HTML forms technology
The state of the main XForms implementations
How the XForms model is created, including a discussion and examples of using the xforms:model, xforms:instance, xforms:submission, and xforms:bind elements
How the W3C XML Schema, XPath, and XML namespaces are used in XForms
How to use XForms form controls
Alternatives to XForms
Throughout this book, you have learned how XML can be used to construct and validate documents and for communications between systems, and you now know how to use several important XML display formats. Sometimes it can be difficult seeing how all of these fit together without a real-world business case. This case study demonstrates how you can build an online home loan calculator using a public web service, a .NET web application, JavaScript, and several of the XML technologies described in this book.
Specifically, this chapter describes how to do the following:
Create a web page to enter loan information
Call a web service to calculate the payments using SOAP
Display the results using Ajax (Asynchronous JavaScript and XML) and SVG
Throughout this book, you have learned how XML can be used to construct and validate documents and how it is used for communications between systems. You have also learned how to use several important XML display formats. Sometimes it can be difficult seeing how all of these technologies fit together without a real-world business case. This case study demonstrates how you can build an online home loan calculator using a public web service, a Ruby on Rails web application, JavaScript, and several of the XML technologies you have learned.
In this chapter, you will:
Create a Ruby on Rails application.
Create a web page to enter loan information.
Call a web service to calculate the payments using SOAP.
Display the results using Ajax (Asynchronous JavaScript and XML) and SVG.
This appendix contains some suggested solutions to the exercise questions posed at the end of most of the chapters throughout the book.
XPath is a well-established W3C specification that describes a non-XML syntax for selecting a set of nodes from the in-memory model of an XML document. XPath version 1.0 reached W3C Recommendation status on November 16, 1999. The specification documents for XPath 2.0, which is a subset of XQuery 1.0, were in late Working Draft stage at the time of this writing. XPath, both 1.0 and 2.0, is an essential part of the corresponding XSLT specification. This appendix focuses on XPath 1.0.
An XPath location path contains one or more location steps, separated by forward slashes (/). Each location step has the following form:
axis-name::node-test[predicate]*
In plain English, this is an axis name, followed by two colons, a node test, and, finally, zero or more predicates, each contained in square brackets. A predicate can contain literal values (for example, 4 or ‘hello’), operators (+, -, =, and so on), and other XPath expressions. XPath also defines a set of functions that can be used in predicates.
An XPath axis defines how to select a part of the model of an XML document from the perspective of a starting point called the context node. The node test then filters the nodes on the specified axis. Adding predicates filters the selection further: each predicate is evaluated against the nodes already selected by the axis and node-test parts of the expression. If the expression in the predicate returns true, the node remains in the selected node set; otherwise, it is removed.
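The filtering sequence described above can be sketched with Python's standard xml.etree.ElementTree module. This is only an approximation: ElementTree supports a limited subset of XPath and only the abbreviated syntax, so the path book[@lang='en'] stands in for the unabbreviated XPath 1.0 form child::book[attribute::lang='en']. The sample document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """<library>
  <book lang="en"><title>XML Basics</title></book>
  <book lang="fr"><title>XML Avance</title></book>
  <magazine lang="en"><title>Markup Monthly</title></magazine>
</library>"""

root = ET.fromstring(DOC)  # the <library> element acts as the context node

# Axis + node test: the child axis is implied, and "book" is the node test,
# so <magazine> is filtered out.
books = root.findall("book")

# Axis + node test + predicate: keep only the books whose lang attribute is "en".
english_books = root.findall("book[@lang='en']")
titles = [book.findtext("title") for book in english_books]

print(len(books), titles)
```

Each stage narrows the node set: the axis picks the children of the context node, the node test keeps the book elements, and the predicate keeps only those for which the test is true.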
This reference lists the XPath axes, node tests, and functions. Each entry includes whether it is implemented in version 1.0 of the specification. At one time, there were significant variations among implementations; for example, Microsoft XML Core Services (MSXML) lacked full XPath 1.0 compliance. The situation has now improved to the point that any XPath implementation is likely to be essentially fully XPath 1.0–compliant. MSXML versions 3.0 and later have full XPath implementations. Versions of MSXML before version 3.0 are not suitable for XPath 1.0 processing.
Other implementations, such as Xalan and Saxon, essentially fully implement XPath 1.0.
This appendix provides a reference to the elements and functions that are part of XSLT 1.0. A reference to XPath 1.0 constructs, including functions that can also be used with XSLT, is in Appendix C.
The XSLT 1.0 specification became a W3C Recommendation on November 16, 1999. As this book goes to press, XSLT 2.0 has just been awarded W3C Recommendation status. XSLT 2.0 and XPath 2.0 go hand-in-hand, are inseparable, and have to be studied together. The development pace for both will be the same. In addition, be aware that XSLT, XPath, and XQuery are so dependent upon one another that you need to have all three skill sets or you will have serious problems. The good news is that they are becoming increasingly similar, meaning that once you master one you can master them all quickly.
XSLT 1.0 processors may or may not come with a description of the conformance to the XSLT 1.0 specification. However, most XSLT processors can be assumed to be close to 100 percent conformant to the W3C XSLT 1.0 specification. Some experimental XSLT processors, such as recent versions of Saxon, include a conformant XSLT 1.0 implementation, which was used in Chapter 8, and an experimental XSLT 2.0 processor. This new emerging XSLT 2.0 processor only works with XPath 2.0. You cannot mix XSLT 2.0 and XPath 1.0, nor the other way around. This is not to say that XSLT 1.0 features are obsolete, but that XSLT 2.0 will be different in philosophy, syntax, and construct.
Both the attributes on XSLT 1.0 elements and the parameters of XSLT 1.0 functions can be of several types. The end of this appendix contains a list of the types used in the elements and functions of XSLT.
For more information on the meaning of the element or function types, see the “Types” table at the end of this appendix.
XSLT stands for XSL Transformations and plays a major role in XSL. XSLT can transform an XML document into another document type, such as HTML or XHTML. Normally, XSLT does this by transforming each XML element into an (X)HTML element.
With XSLT you can add/remove elements and attributes to/from the output file. You can also rearrange and sort elements, perform tests, and make decisions about which elements to hide and display, and a lot more. Think of XSLT as a transformation tool that when paired with XPath functionality can be used to process information selectively and quickly.
In the transformation process, XSLT uses XPath to define parts of the source document that should match one or more predefined templates, such as attribute templates. Attribute templates can be predefined to standardize and simplify simple string substitutions used to process files known as result documents. When a match is found, XSLT will transform the matching part of the source document into the result document reformatted in the new syntax.
This appendix lists the interfaces in the Document Object Model (DOM) Level 3. Examples showing how to use some of these interfaces appear in Chapter 11.
Unfortunately, the DOM Working Group defines too many “modules” for DOM functionality to be covered in this one appendix. In fact, at the time of writing, the W3C’s site listed the following seven Technical Reports for different types of DOM activities:
The Core interfaces, which are the base set of interfaces used for working with HTML and XML documents
The Load and Save interfaces, which are used to load XML documents into a DOM (from a file, URI, stream, etc.) or save an XML document from a DOM (to a file, URI, stream, etc.)
The Validation interfaces, which are used to ensure that an XML document is valid, per its schema document(s)
The XPath interfaces, for accessing a DOM tree using XPath syntax
The Views and Formatting interfaces, which can be used to dynamically access and modify a document’s structure, style, and contents
The Events interfaces, which allow for event handlers
The Abstract Schemas interfaces, which allow an interface to schema documents (DTD and XML Schema)
In addition, there is another Technical Report on “DOM Requirements,” which doesn’t specifically list interfaces.
At the time of writing, only the Core, Load and Save, and Validation modules were full W3C Recommendations, so these are the modules covered in this appendix. Luckily, these are the ones that you are most likely to access in day-to-day work with the DOM.
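To give a feel for the Core interfaces, here is a minimal sketch using Python's standard xml.dom.minidom module. Note the assumption: minidom is a minimal DOM implementation rather than a complete Level 3 one, so it illustrates only the basic Document, Element, and Node operations. The sample document and attribute names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
doc = parseString("<order><item sku='A1'/><item sku='B2'/></order>")
root = doc.documentElement  # Document.documentElement

# Create a new element through the Document and attach it to the tree.
new_item = doc.createElement("item")
new_item.setAttribute("sku", "C3")
root.appendChild(new_item)

# Read the tree back through the Core interfaces.
skus = [node.getAttribute("sku") for node in doc.getElementsByTagName("item")]
print(skus)
```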
This appendix provides a handy guide to the DOM interfaces, but if you’d like further information, you can always go to the W3C’s website to view the actual recommendations:
Core: http://www.w3.org/TR/DOM-Level-3-Core
Load and Save: http://www.w3.org/TR/DOM-Level-3-LS
Validation: http://www.w3.org/TR/DOM-Level-3-Val
The interfaces are illustrated in the figures in the following section.
This appendix provides a full listing of all elements within the XML Schema Structures Recommendation (found at http://www.w3.org/TR/xmlschema-1/). The elements appear in alphabetical order. Each element is described with examples and a table detailing all the attributes used in the element. When attributes are required, it is noted in the attribute listings.
The end of this appendix presents a table of the attributes in the XML Schema Instance namespace that can be used in instance documents.
This appendix provides a quick reference to the W3C Recommendation for XML Schemas, Part 2: Datatypes. Datatypes were separated into a specification in their own right so that XML Schemas as well as other XML-related technologies (for example, RELAX NG) can use them.
The XML Schema defines several datatypes that can be used to validate the content of attributes and text-only elements. These datatypes enable you to specify that the content must be formatted as a date, a Boolean, a floating-point number, and so on. The second part of the XML Schema Recommendation defines two sorts of datatype:
Built-in types, which are available to all XML Schema authors, and should be implemented by a conforming processor
User-derived types, which are defined in individual schema documents, and are particular to that schema (although it is possible to import and reuse these definitions in other XML Schemas). These types are based on the existing built-in types.
Built-in types include two subgroups:
Built-in primitive types, which are types in their own right. They are not defined in terms of other datatypes. Primitive types are also known as base types because they are the basis from which all other types are built.
Built-in derived types, which are built from definitions of other primitive and derived datatypes
The first part of this appendix provides a quick overview of all the XML built-in datatypes, both primitive and derived. The second part provides details about all of the constraining facets, or characteristics, of these datatypes. Facets can be used to restrict the allowed set of values for a datatype. Also provided in this appendix are tables that illustrate which of these constraining facets can be applied to which datatype.
This appendix contains the specification of the SAX interface, version 2.0.2, some of which is explained in Chapter 12. It is taken largely verbatim from the definitive specification to be found at www.saxproject.org, with editorial comments added in italics. The classes and interfaces are described in alphabetical order and include the primary SAX interfaces and classes and SAX extensions. Deprecated classes and helper classes that are distributed with SAX 2.0.2 are not covered.
The SAX specification is in the public domain. (See the website mentioned previously for a statement of policy on copyright.) Essentially, the policy says do what you like with it, copy it as you wish, but no one accepts any liability for errors or omissions.
SAX 2.0.2 contains complete namespace support, which is available by default from any XMLReader object. An XML reader can also optionally supply raw XML 1.0 names. An XML reader is fully configurable: It is possible to attempt to query or change the current value of any feature or property. Features and properties are identified by fully qualified URIs, and parties are free to invent their own names for new extensions.
The ContentHandler and Attributes interfaces are similar to the deprecated DocumentHandler and AttributeList interfaces, but they add support for namespace-related information. ContentHandler also adds a callback for skipped entities, and the Attributes interface adds the capability to look up an attribute’s index by name.
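The namespace support and URI-identified features described above can be sketched with Python's standard xml.sax package, which is a Python adaptation of SAX 2 rather than the Java API this appendix documents. The NameCollector class and the urn:example:a namespace are hypothetical names introduced for this illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Hypothetical handler that records (namespace URI, local name) pairs."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) tuple.
        self.seen.append(name)


parser = xml.sax.make_parser()
# Features are identified by URIs; feature_namespaces is the constant
# for "http://xml.org/sax/features/namespaces".
parser.setFeature(feature_namespaces, True)

handler = NameCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b'<a:doc xmlns:a="urn:example:a"><a:item/><plain/></a:doc>'))

print(handler.seen)
```

Elements outside any namespace are reported with None in place of the namespace URI.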
The following interfaces were included in SAX 1.0 but have been deprecated:
org.xml.sax.Parser
org.xml.sax.DocumentHandler
org.xml.sax.AttributeList
org.xml.sax.HandlerBase
These interfaces are not covered in this appendix, as their use is not widespread.