This chapter is designed to put XSLT in context. It’s about the purpose of XSLT and the task it was designed to perform. It’s about what kind of language it is, how it came to be that way, and how it has changed in version 2.0; and it’s about how XSLT fits in with all the other technologies that you are likely to use in a typical Web-based application (including, of course, XPath, which forms a vital component of XSLT). I won’t be saying very much in this chapter about what an XSLT stylesheet actually looks like or how it works: that will come later, in Chapters 2 and 3.
The chapter starts by describing the task that XSLT is designed to perform (transformation) and why there is the need to transform XML documents. I’ll then present a trivial example of a transformation in order to explain what this means in practice.
Next, I discuss the relationship of XSLT to other standards in the growing XML family, to put its function into context and explain how it complements the other standards.
I’ll describe what kind of language XSLT is, and delve a little into the history of how it came to be like that. If you’re impatient you may want to skip the history and get on with using the language, but sooner or later you will ask “why on earth did they design it like that?” and at that stage I hope you will go back and read about the process by which XSLT came into being.
This chapter takes a bird’s-eye view of what an XSLT processor does. We start by looking at a system overview: what are the inputs and outputs of the processor?
Then we look in some detail at the data model, in particular the structure of the tree representation of XML documents. An important message here is that XSLT transformations do not operate on XML documents as text; they operate on the abstract tree-like information structure represented by the markup.
Having established the data model, I will describe the processing sequence that occurs when a source document and a stylesheet are brought together. XSLT is not a conventional procedural language; it consists of a collection of template rules defining output that is produced when particular patterns are matched in the input. As seen in Chapter 1, this rule-based processing structure is one of the distinguishing features of the XSLT language.
Finally, we look at the way in which variables and expressions can be used in an XSLT stylesheet, and also look at the various data types available.
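The rule-based processing structure described above can be sketched roughly as follows. This is only an illustration: the source vocabulary (<doc>, <title>, <para>) is hypothetical and not taken from the book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed input shape: <doc><title>...</title><para>...</para></doc> -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rule for the outermost element: emit a wrapper, then let the
       processor find matching rules for the children -->
  <xsl:template match="doc">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule that fires whenever a <title> element is processed -->
  <xsl:template match="title">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

  <!-- Rule that fires whenever a <para> element is processed -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Note that nothing in the stylesheet says in what order the rules run; that is driven by the structure of the source tree.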
This chapter describes the overall structure of a stylesheet. In the previous chapter we looked at the processing model for XSLT and the data model for its source and result trees. In this chapter we will look in more detail at the different kinds of construct found in a stylesheet such as declarations and instructions, literal result elements, and attribute value templates.
Some of the concepts explained in this chapter are tricky; they are areas that often cause confusion, which is why I have tried to explain them in some detail. However, it’s not necessary to master everything in this chapter before you can write your first stylesheet, so use it as a reference, coming back to topics as and when you need to understand them more deeply.
The topics covered in this chapter are as follows:
Stylesheet modules. We will discuss how a stylesheet program can be made up of one or more stylesheet modules, linked together with <xsl:import> and <xsl:include> elements.
The <xsl:stylesheet> (or <xsl:transform>) element. This is the outermost element of most stylesheet modules, and it defines various attributes that control how other constructs in the module behave.
The <?xml-stylesheet?> processing instruction. This links a source document to its associated stylesheet, and allows stylesheets to be embedded directly in the source document whose style they define.
A brief description of the declarations found in the stylesheet, that is, the immediate children of the <xsl:stylesheet> or <xsl:transform> element. The full specifications are in Chapter 6.
A brief description of each instruction that can be used in a stylesheet. In the previous chapter, I introduced the idea of a sequence constructor as a sequence of instructions that can be evaluated to produce a sequence of items, which will usually be nodes to be written to the result tree. This section provides a list of the instructions that can be used, with a quick summary of the function of each one. Full specifications of each instruction can be found in Chapter 6.
Simplified stylesheets, in which the <xsl:stylesheet> and <xsl:template match="/"> elements are omitted, to make an XSLT stylesheet look more like the simple template languages that some users may be familiar with.
Attribute value templates. These define variable attributes not only of literal result elements but of certain XSLT elements as well.
Facilities allowing the specification to be extended, both by vendors and by W3C itself, without adversely affecting the portability of stylesheets.
Handling of whitespace in the source document, in the stylesheet itself, and in the result tree.
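As a small sketch of two of the topics listed above, here is a simplified stylesheet that also uses an attribute value template. The source vocabulary (a <links> element containing <link url="..." label="..."/> children) is an assumption made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified stylesheet: the literal result element <ul> is the whole
     stylesheet, and the xsl:version attribute takes the place of the
     <xsl:stylesheet> and <xsl:template match="/"> wrappers. -->
<ul xsl:version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="/links/link">
    <li>
      <!-- Attribute value template: the expression inside {...} is
           evaluated for each <link>, and its string value becomes the
           value of the href attribute. -->
      <a href="{@url}">
        <xsl:value-of select="@label"/>
      </a>
    </li>
  </xsl:for-each>
</ul>
```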
One of the most important innovations in XSLT 2.0 is that stylesheets can take advantage of the schemas you have defined for your input and output documents. This chapter explores how this works.
This feature is an optional part of XSLT 2.0, in two significant ways:
Firstly, an XSLT 2.0 processor isn’t required to implement this part of the standard. A processor that offers schema support is called a schema-aware processor; one that does not is referred to as a basic processor.
Secondly, even if the XSLT 2.0 processor you are using is a schema-aware processor, you can still process input documents, and produce output documents, for which there is no schema available.
There is no space in this book for a complete description of XML Schema. If you want to start writing schemas, I would recommend you read XML Schema by Eric van der Vlist (O’Reilly & Associates, 2002) or Definitive XML Schema by Priscilla Walmsley (Prentice Hall, 2002). XML Schema is a large and complicated specification, certainly as large as XSLT itself. However, it’s possible that you are not writing your own schemas, but writing stylesheets designed to work with a schema that someone else has already written. If this is the case, I hope you will find the short overview of XML Schema in this chapter a useful introduction.
This chapter looks in some detail at the XPath type system; that is, the types of the values that can be manipulated by XPath expressions and XSLT instructions.
XPath is an expression language. Every expression takes one or more values as its inputs, and produces a value as its output. The purpose of this chapter is to explain exactly what these values can be.
Chapter 2 presented the XDM tree model with its seven node kinds. That’s part of the picture, because XPath expressions will often be handling nodes in a tree. The other half of the picture is concerned with atomic values (strings, numbers, booleans, and the like), and it’s these values that we’ll be studying in this chapter.
One of the things an expression language tries to achieve is that wherever you can use a value, you can replace it with an expression that is evaluated to produce that value. So if «2+2» is a valid expression, then «(6-4)+(1+1)» should also be a valid expression. This property is called composability: expressions can be used anywhere that values are permitted. One of the important features that make a language composable is that the possible results of an expression are the same as the possible inputs. This feature is called closure: every expression produces a result that is in the same space of possible values as the space from which the inputs are drawn.
The role of the data model is to describe this space of possible values, and the role of the type system is to define the rules for manipulating these values.
This chapter provides an alphabetical list of reference entries, one for each of the XSLT elements. Each entry gives:
A short description of the purpose of the element
Changes in 2.0: A quick summary of changes to this element since XSLT 1.0
Format: A pro forma summary of the format, defining where the element may appear in the stylesheet, what its permitted attributes are, and what its content (child elements) may be
Effect: A definition of the formal rules defining how this element behaves
Usage: A section giving usage advice on how to exploit this XSLT element
Examples: Coding examples of the element, showing the context in which it might be used. (Where appropriate, the Usage and Examples sections are merged into one.)
See also: Cross-references to other related constructs
The Format section for each element includes a syntax skeleton designed to provide a quick reminder of the names and types of the attributes and any constraints on the context. The format of this is designed to be intuitive: it only gives a summary of the rules, because you will find these in full in the Position, Attributes, and Content sections that follow.
There are a number of specialized terms used in this chapter, and it is worth becoming familiar with them before you get in too deeply. There are fuller explanations in Chapters 2 and 3, and the descriptions in the following table are really intended just as a quick memory-jogger.
This chapter defines some fundamental features of the XPath language. The first half of the chapter describes the basic syntactic and lexical conventions of the language, and the second half describes the important notion of context: this establishes the way in which XPath expressions interact with the environment in which they are used, which for our purposes primarily means the containing XSLT stylesheet.
The complete grammar of the language is summarized in Appendix A.
This chapter defines the simple operators available for use in XPath expressions. This is inevitably a rather arbitrary category, but these operators seem to have enough in common to justify putting them together in one chapter. All these operators return single items (as distinct from sequences); in fact, all of them except the arithmetic operators in the first section return a boolean result.
More specifically, this chapter describes the following families of operators:
Arithmetic operators, «+», «-», «*», «div», and «mod»
Value comparison operators «eq», «ne», «lt», «le», «gt», «ge»
General comparison operators «=», «!=», «<», «<=», «>», «>=»
Node comparison operators «<<», «is», and «>>»
Boolean operators «and» and «or»
Many of these operators behave in much the same way as similar operators in other languages. There are some surprises, though, because of the way XPath handles sequences, and because of the way it mixes typed and untyped data. So don’t skip this chapter just because you imagine that everything about these operators can be guessed.
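To give a flavor of the surprises mentioned above, the following sketch evaluates a few of these operators from within a stylesheet. It is only an illustration under the assumption that the transformation is run against any well-formed source document; the expressions themselves use literal values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Arithmetic: «div» is division and «mod» gives the remainder -->
    <xsl:value-of select="7 div 2, 7 mod 2"/>

    <!-- Value comparison: each operand must be a single item -->
    <xsl:value-of select="3 eq 3"/>

    <!-- General comparison: true if ANY pair of items satisfies the
         test, so both of the following are true for the same operands -->
    <xsl:value-of select="(1, 2, 3) = 3"/>
    <xsl:value-of select="(1, 2, 3) != 3"/>

    <!-- Inside an XML attribute, «<» has to be written as «&lt;» -->
    <xsl:value-of select="2 &lt; 3 and 3 le 4"/>
  </xsl:template>

</xsl:stylesheet>
```

By contrast, an expression such as «(1, 2, 3) eq 3» is a type error, because a value comparison does not accept a sequence of more than one item.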
This chapter defines the syntax and meaning of path expressions. Path expressions are the most distinctive feature of the XPath language, the construct that gives the language its name. The chapter also describes other constructs in the language that are closely associated with path expressions, in particular steps and axes and the «union», «intersect», and «except» operators.
Path expressions are used to select nodes in a tree, by means of a series of steps. Each step takes as its starting point a node, and from this starting point, selects other nodes.
Each step is defined in terms of:
An axis, which defines the relationship to be followed in the tree (for example, it can select child nodes, ancestor nodes, or attributes)
A node test, which defines what kind of nodes are required, and can also specify the name or schema-defined type of the nodes
Zero or more predicates, which provide the ability to filter the nodes according to arbitrary selection criteria
Because they are closely associated with processing the results of path expressions, this chapter also describes the operators used to combine two sets of nodes by taking their union, intersection, or difference.
Although I’ve chosen Path Expressions as the title for this chapter, the term is actually a slippery one. Because of the way W3C defines the XPath grammar, all sorts of unlikely constructs such as «2» or «count($x)» are technically path expressions. The things I will actually cover in this chapter are:
The binary «/» operator as applied to nodes. This is used in expressions like «$chap/title». There's another use of the «/» operator that applies to atomic values, in what I call a simple mapping expression, and I will cover that in Chapter 10.
Axis steps, for example «ancestor::x» or «following-sibling::y[1]», including abbreviated axis steps such as «x» (short for «child::x») and «@y» (short for «attribute::y»). Axis steps are expressions in their own right, but they are often used before or after the «/» operator.
Variants on the «/» operator that can be used to write abbreviated path expressions, notably «/» as a freestanding expression, «/» at the start of a path expression, and the «//» pseudo-operator.
What I call the Venn operators: union, intersect, and except. These are often used to combine the results of several path expressions, or to form a step of a path expression.
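The constructs listed above might appear in a stylesheet along the following lines. This is a sketch only: the element names (book, chapter, section, appendix, para) and the id attribute are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="chapter">
    <xsl:variable name="chap" select="."/>

    <!-- Binary «/» applied to nodes: <title> children of the node in $chap -->
    <xsl:value-of select="$chap/title"/>

    <!-- Abbreviated axis steps, followed by the same path written in full -->
    <xsl:value-of select="section/@id"/>
    <xsl:value-of select="child::section/attribute::id"/>

    <!-- A step on a reverse axis, followed by a step on the child axis -->
    <xsl:value-of select="ancestor::book/title"/>

    <!-- «//» at the start of a path: all <para> elements in the document -->
    <xsl:value-of select="count(//para)"/>

    <!-- Venn operators combining the results of path expressions -->
    <xsl:value-of select="count(section union appendix)"/>
    <xsl:value-of select="count(* except title)"/>
    <xsl:value-of select="count(*[@id] intersect section)"/>
  </xsl:template>

</xsl:stylesheet>
```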
One of the most notable innovations in XPath 2.0 is the ability to construct and manipulate sequences. This chapter is devoted to an explanation of the constructs in the language that help achieve this.
Sequences can consist either of nodes, or of atomic values, or of a mixture of the two. Sequences containing nodes only are a generalization of the node-sets offered by XPath 1.0. In the previous chapter we looked at the XPath 2.0 operators for manipulating sets of nodes, in particular, path expressions, and the operators «union», «intersect», and «except».
In this chapter we look at constructs that can manipulate any sequence, whether it contains nodes, atomic values, or both. Specifically, the chapter covers the following constructs:
Sequence concatenation operator: «,»
Numeric range operator: «to»
Filter expressions: «a[b]»
Mapping expressions: «for»
Simple mapping expressions: «/» applied to atomic values
Quantified expressions: «some» and «every»
First, some general remarks about sequences.
Sequences (unlike nodes) do not have any concept of identity. Given two values that are both sequences, you can ask (in various ways) whether they have the same contents, but you cannot ask whether they are the same sequence.
Sequences are immutable. This is part of what it means for a language to be free of side effects. You can write expressions that take sequences as input and produce new sequences as output, but you can never modify an existing sequence in place.
Sequences cannot be nested. If you want to construct trees, build them as XML trees using nodes rather than atomic values.
A single item is a sequence of length one, so any operation that applies to sequences also applies to single items.
Sequences do not have any kind of type label that is separate from the type labels attached to the items in the sequence. As we will see in Chapter 11, you can ask whether a sequence is an instance of a particular sequence type, but the question can be answered simply by looking at the number of items in the sequence, and at the type labels attached to each item. It follows that there is no such thing as (say) an “empty sequence of integers” as distinct from an “empty sequence of strings”. If the sequence has no items in it, then it also carries no type label. This has some real practical consequences, for example, the sum() function, when applied to an expression that can only ever return a sequence of xs:duration values, will return the integer 0 (not the zero-length duration) when the sequence is empty, because there is no way at runtime of knowing that if the sequence hadn’t been empty, its items would have been durations.
Functions and operators that attach position numbers to the items in a sequence always identify the first item as number 1 (one), not zero. (Although programming with a base of zero tends to be more convenient, Joe Public has not yet been educated into thinking of the first paragraph in a chapter as paragraph zero, and the numbering convention was chosen with this in mind.)
This chapter covers the language constructs that handle general sequences, but there are also a number of useful functions available for manipulating sequences, and these are described in Chapter 13. Relevant functions include: count(), deep-equal(), distinct-values(), empty(), exists(), index-of(), insert-before(), remove(), subsequence(), and unordered().
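The general remarks above, and the constructs this chapter covers, can be tried out with a stylesheet along these lines. It is a minimal sketch using literal values only; the variable name $s is chosen just for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Sequences cannot be nested: «,» flattens, so $s has four items -->
    <xsl:variable name="s" select="(1, (2, 3), 4)"/>
    <xsl:value-of select="count($s)"/>

    <!-- Numeric range operator -->
    <xsl:value-of select="1 to 5"/>

    <!-- Filter expressions; position numbers start at 1 -->
    <xsl:value-of select="$s[1]"/>
    <xsl:value-of select="$s[. gt 2]"/>

    <!-- Mapping expression -->
    <xsl:value-of select="for $i in $s return $i * $i"/>

    <!-- Quantified expressions -->
    <xsl:value-of select="some $i in $s satisfies $i gt 3"/>
    <xsl:value-of select="every $i in $s satisfies $i gt 3"/>

    <!-- sum() over an empty sequence gives the integer 0 unless a
         second argument supplies the value to return in that case -->
    <xsl:value-of select="sum(())"/>
    <xsl:value-of select="sum((), xs:dayTimeDuration('PT0S'))"/>
  </xsl:template>

</xsl:stylesheet>
```

The last instruction shows one way to work around the missing type label on an empty sequence: the caller states explicitly what a zero total should look like.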
This chapter is concerned with XPath expressions that involve types. This includes operations to convert a value of one type to a value of another type (which is called casting), and operations to test whether a value belongs to a particular type.
The type system for XPath was fully explained in Chapter 5. Recall in particular that there are two separate but related sets of types we are concerned with:
Every value in XPath (that is, the result of every expression) is an instance of a sequence type. This reflects the fact that every XPath value is a sequence. A sequence type in general defines an item type that each of the items in the sequence must conform to, and a cardinality that constrains the number of items in the sequence. The items may be either nodes or atomic values, so item types divide into those that permit nodes and those that permit atomic values. There are also two special item types, the type item(), which permits anything, and the type empty-sequence(), which permits nothing.
Every element and attribute node conforms to a type definition contained in a schema, or a built-in type definition that is implicit in every schema. To distinguish these clearly from sequence types, I will refer to these types as schema types. A schema type may be either a simple type or (for elements only) a complex type. A simple type may be either a list type, a union type, or an atomic type. A type definition constrains the contents of a node (that is, the value of an attribute, or the attributes and children of an element); it does not constrain the name of the node.
We need to use careful language to avoid confusing these two views of the type system. When we have an XPath value that is a node, we will speak of the node being an instance of a sequence type; for example, every element is an instance of the sequence type element(). At the same time, the node is annotated with a schema type; for example, an element node may be annotated as an mf:invoice (which will be the name of a complex type defined in some schema).
These two sets of types (sequence types and schema types) overlap: in particular, atomic types such as xs:integer belong to both sets. However, list types, union types, and complex types are never used as item types or sequence types; they are used only to annotate nodes. Equally, item types such as comment() are only used in sequence types; they are never used to annotate nodes. This idea is illustrated in Figure 11-1.
The first part of this chapter is concerned with conversion of values from one type to another. These types are always atomic types; no conversions are defined for any types other than atomic types. The process of atomization, which extracts the typed value of a node, could be regarded as a conversion, but we won’t treat it as such for our present purposes.
Atomic types can be referred to by the name given to them in the schema. A schema can define anonymous atomic types, but because these have no name, they can’t be referenced in an XPath expression. Named atomic types are always defined by a top-level <xs:simpleType> element in a schema (more specifically, by an <xs:simpleType> element that is a child of either an <xs:schema> element or an <xs:redefine> element), and these elements always have a name attribute.
The final part of this chapter deals with two operators («instance of» and «treat as») that take as their “operands” an arbitrary XPath value (that is, a sequence), and a sequence type. (I’ve written “operands” in quotes, because a true operand is always a value, and in the XPath view of the world, types are not values). These two constructs require a special syntax for describing sequence types. For example, «attribute(*, xs:date)?» describes a sequence type whose item type matches any attribute node annotated as an xs:date, and whose cardinality allows the sequence to contain zero or one values. I will refer to such a construct as a sequence type descriptor, because the construct seems to need a name, and the XPath specification doesn’t give it one.
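The kinds of expression this chapter deals with look roughly like the following. This is a sketch using literal values and built-in atomic types only, so it should not depend on a schema-aware processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Casting, written first as a constructor function and then
         with the «cast as» operator -->
    <xsl:value-of select="xs:integer('42') + 1"/>
    <xsl:value-of select="'2004-02-29' cast as xs:date"/>

    <!-- Testing in advance whether a cast would succeed -->
    <xsl:value-of select="'abc' castable as xs:integer"/>

    <!-- «instance of» checks both the item type and the cardinality -->
    <xsl:value-of select="42 instance of xs:integer"/>
    <xsl:value-of select="(1, 2) instance of xs:integer?"/>
    <xsl:value-of select="(1, 2) instance of xs:integer+"/>
    <xsl:value-of select="() instance of empty-sequence()"/>

    <!-- «treat as» asserts a type without converting the value -->
    <xsl:value-of select="(42 treat as xs:integer) + 1"/>
  </xsl:template>

</xsl:stylesheet>
```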
A pattern is used in XSLT to define a condition that a node must satisfy in order to be selected. The most common use of patterns is in the match attribute of <xsl:template>, where the pattern says which nodes the template rule applies to. For example, <xsl:template match="abstract"> introduces a template rule that matches every <abstract> element. This chapter defines the syntax and meaning of XSLT patterns.
Patterns (sometimes called match patterns) are used in just six places in an XSLT stylesheet:
In the match attribute of <xsl:template>, to define the nodes in a source document to which a template applies
In the match attribute of <xsl:key>, to define the nodes in a source document to which a key definition applies
In the count and from attributes of <xsl:number>, to define which nodes are counted when generating numbers
In the group-starting-with and group-ending-with attributes of <xsl:for-each-group>, to identify a node that acts as the initial or final node in a group of related nodes
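The six attributes listed above can be seen together in the following sketch. The source vocabulary (abstract, section, figure, chapter, body, h1, pages, pagebreak) and the key name are hypothetical, chosen only to give each pattern something to match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- 1. match on xsl:template -->
  <xsl:template match="abstract">
    <p class="abstract"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- 2. match on xsl:key -->
  <xsl:key name="section-by-id" match="section" use="@id"/>

  <!-- 3 and 4. count and from on xsl:number: number the figures,
       restarting the count at each chapter -->
  <xsl:template match="figure">
    <xsl:number count="figure" from="chapter" level="any"/>
  </xsl:template>

  <!-- 5. group-starting-with: each h1 opens a new group -->
  <xsl:template match="body">
    <xsl:for-each-group select="*" group-starting-with="h1">
      <section><xsl:copy-of select="current-group()"/></section>
    </xsl:for-each-group>
  </xsl:template>

  <!-- 6. group-ending-with: each pagebreak closes the current group -->
  <xsl:template match="pages">
    <xsl:for-each-group select="*" group-ending-with="pagebreak">
      <page><xsl:copy-of select="current-group()"/></page>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```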
This chapter describes all the standard functions included in the XSLT 2.0 and XPath 2.0 specifications for use in XPath expressions. Most of these functions are defined in the W3C specification XPath 2.0 and XQuery 1.0 Functions and Operators, and these should be available in all XPath 2.0 implementations. Others, marked as XSLT-only, are defined in the XSLT 2.0 specification, and are available only in XPath expressions used within an XSLT stylesheet.
For each function, I give its name, a brief description of its purpose, a list of the arguments it expects and the value it returns, the formal rules defining what the function does, and finally usage advice and examples.
These are not the only functions you can call from an XPath expression:
So-called constructor functions are available, corresponding to built-in and user-defined atomic types. For example, there is a function called xs:float() to create values of type xs:float, xs:date() to create values of type xs:date, and so on. These functions are also available for user-defined atomic types. They are described in Chapter 11.
User-defined functions can be created using the XSLT <xsl:function> declaration; these functions are available for calling from XPath expressions in the stylesheet.
Vendor-defined functions may be available. These will be in a namespace controlled by the vendor of the particular product.
It may be possible to call functions written in external languages such as Java, JavaScript, or C#. See Chapter 16 for details.
When XPath is used within another host language, additional functions may be defined. For example, the XForms specification uses XPath and defines a number of XForms-specific functions for use in that environment.
The syntax of a function call is described in Chapter 7. This defines where a function call can be used in an expression, and where it can’t. You can use a function call anywhere that an expression or value can be used, provided that the type of value it returns is appropriate to the context where it is used. (Unlike XPath 1.0, this includes the ability to use a function call as a step in a path expression.) Within a function call, the values supplied as arguments can be any XPath expression, subject only to the rules on types (for example, some functions require an argument that is a sequence of nodes). So a function call such as «count(..)», though it looks strange, is perfectly legal: «..» is a valid XPath expression that returns the parent of the context node (it’s described in Chapter 9).
I’ve arranged the functions in alphabetical order (combining the XPath-defined and XSLT-defined functions into a single sequence), so you can find a function quickly if you know what you’re looking for. However, in case you only know the general area you are interested in, you may find the classification that follows in the section Functions by Category useful. This is followed by a section called Notation, which describes the notation used for function specifications in this chapter. The rest of the chapter is taken up with the functions themselves, in alphabetical order.
This chapter defines the regular expression syntax accepted by the XPath functions matches(), replace(), and tokenize(), which were described in the previous chapter, as well as the <xsl:analyze-string> instruction described in Chapter 6.
This regular expression syntax is based on the definition in XML Schema, which in turn is based on the definition in the Perl language, which is generally taken as the definitive reference for regular expressions. However, all dialects of regular expression syntax have minor variations. Within Perl itself there are features that are deprecated, there are features that differ between Perl versions, and there are features that don’t apply when all characters are Unicode.
XML Schema defines a subset of the Perl regular expression syntax; it chose this subset based on the requirements of a language that only does validation (that is, testing whether or not a string matches the pattern) and that only deals with Unicode strings. The requirements of the matches() function in XPath are similar, but XPath also uses regular expressions for tokenizing strings and for replacing substrings. These are more complex requirements, so some of Perl’s regular expression constructs that XML Schema left out have been added back in for XPath.
In the grammar productions in this chapter, as elsewhere in the book, I generally enclose characters of the target language (that is, the regex language) in chevrons, for example «|». I have avoided using the more concise notation «[abcd]» because I think it is confusing to use regular expressions when defining regular expressions. If a character is not enclosed in chevrons, then it is either the name of another non-terminal symbol in the grammar, or a symbol that has a special meaning in the grammar.
The description of the syntax of regular expressions closely follows the description given in the XML Schema Recommendation. You can find this in Appendix F of Schema Part 2. The second edition corrects numerous errors in the original. The latest version of the Recommendation can be found at http://www.w3.org/TR/xmlschema-2.
Remember that the syntax rules given here apply to the regular expression after it has been preprocessed by the host language.
If a regular expression is used within an XML document (for example, an XSLT stylesheet), then special characters such as «&» must be escaped using XML entity or character references such as «&amp;». If it appears within an XSLT attribute value template (for example, in the regex attribute of <xsl:analyze-string>), then curly braces must be doubled. If it appears within an XPath string literal, then any apostrophe or quotation mark that matches the string delimiters must be doubled.
On the other hand, if your XPath expression is written as a string literal within a host language such as Java or C#, then a backslash will need to be written as «\\» (which means that a regular expression to match a single backslash character becomes «\\\\»).
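The three escaping rules that apply inside a stylesheet (XML references, doubled curly braces in attribute value templates, and doubled string delimiters) can be illustrated with a sketch like this one. The <date> result element and the variable names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- XML escaping: the regex seen by matches() is R&D -->
  <xsl:variable name="has-amp" select="matches('R&amp;D', 'R&amp;D')"/>

  <!-- XPath string literal: the apostrophe is doubled because the
       literal is delimited by apostrophes; the regex seen is it's -->
  <xsl:variable name="has-apos" select="matches('it''s', 'it''s')"/>

  <!-- The regex attribute is an attribute value template, so the
       quantifier braces are doubled; the regex actually applied
       is \d{4}-\d{2}-\d{2} -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="\d{{4}}-\d{{2}}-\d{{2}}">
      <xsl:matching-substring>
        <date><xsl:value-of select="."/></date>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```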
Serialization in an XSLT context means the process of taking a result tree (the output of a transformation) and converting it into lexical XML, usually as a file in filestore. XSLT also allows serialization into other formats, including HTML and text files.
As mentioned in Chapter 2, although serialization is not part of the core function of an XSLT processor, the language provides constructs such as <xsl:output> that enable you to control the process from within a stylesheet. Many products may also allow you to invoke the serializer as a separate component. With XSLT 2.0, the specification of serialization has been moved into a separate W3C Recommendation, to allow reuse of the facilities from within other XML processing languages such as XQuery and XProc. You can find the W3C specification at http://www.w3.org/TR/xslt-xquery-serialization/.
Serialization is controlled by a set of parameters, each of which has a name and a value. The most important parameter is «method», which takes one of the values «xml», «html», «xhtml», or «text». This determines which serialization method is used (user-defined or vendor-defined serialization methods are also allowed, but are outside the scope of this book). When serialization is invoked from XSLT, the serialization parameters are generally controlled using the attributes of the <xsl:output> or <xsl:result-document> instructions described in Chapter 6. It is often possible, however, to set further parameters from the invoking application, or as options on the command line.
In this chapter, we will start by examining each of the four output methods in turn: XML, HTML, XHTML, and TEXT. Then we’ll look at other serialization capabilities in the XSLT specification, notably character maps and disable-output-escaping.
Details of the syntax of elements such as <xsl:output>, <xsl:result-document>, and <xsl:character-map> are found in the appropriate alphabetical sections in Chapter 6.
Previous chapters have discussed standard features of the XSLT language. This chapter discusses what happens when you need to stray beyond the XSLT 2.0 language specification. It’s concerned with questions such as:
What extensions are vendors allowed to provide?
How much are implementations allowed to vary from each other?
How can you write your own extensions?
How can you write stylesheets that will run on more than one vendor’s XSLT processor?
There is some interesting history here. XSLT 1.0 allowed stylesheets to call user-written extension functions but provided no standard way of writing them. The draft XSLT 1.1 specification defined a general mechanism for creating extension functions written in any language and then defined detailed interfaces for Java and JavaScript (or ECMAScript, to give it its vendor-neutral name). This specification was published as a working draft but was subsequently withdrawn. There were a number of reasons for this, one of which was simply that it was overtaken by the more ambitious XSLT 2.0 initiative. But part of the reason was that the proposals for standardizing extension function interfaces attracted heavy public criticism (see http://xml.coverpages.org/withdraw-xslScript.html). It’s difficult in retrospect to summarize the arguments that were made against the idea, but they probably fell into three categories: some people thought extension functions were a bad idea in principle and should not be encouraged; some people disapproved of singling out two languages (Java and JavaScript) for special treatment; and some people felt that the W3C shouldn’t be putting language bindings into the core XSLT specification, and that the job should be done in separate specifications, preferably produced by a different organization.
The result of this minor furor is that there is no defined interface for writing extension functions, either in XSLT 1.0 or in XSLT 2.0. However, conventions have emerged at least for XSLT 1.0 (the draft 1.1 specification was influenced by these conventions, and in turn exerted its own influence on the products, despite being abandoned), and it is worth giving these some space.
At the time of writing this edition, only a limited number of XSLT 2.0 processors are available, and it is difficult to see trends emerging as to what capabilities vendors will choose to provide. However, there’s no reason to believe that this will be significantly different from the capabilities often found in XSLT 1.0 processors. Some of the examples in this chapter therefore relate to XSLT 1.0 processors such as MSXML from Microsoft and Xalan-J from Apache.
This chapter looks at four common design patterns for XSLT stylesheets.
The concept of design patterns was introduced by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides in their classic book Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Publishing, 1995). Their idea was that there was a repertoire of techniques that were useful again and again. They presented 23 different design patterns for object-oriented programming, claiming not that this was a complete list but that the vast majority of programs written by experienced designers fell into one or more of these patterns.
For XSLT stylesheets, the vast majority of stylesheets I have seen fall into one of four design patterns. These are as follows:
Fill-in-the-blanks stylesheets
Navigational stylesheets
Rule-based stylesheets
Computational stylesheets
Again, this doesn’t mean that these are the only ways you can write stylesheets, nor does it mean that any stylesheet you write must follow one of these four patterns to the exclusion of the other three. It just means that a great many stylesheets actually written by experienced people follow one of these four patterns, and if you become familiar with these patterns, you will have a good repertoire of techniques that you can apply to solving any given problem.
I describe the first three design patterns rather briefly, because they are not really very difficult. The fourth, the computational design pattern, is explored in much greater depth, not because it is encountered more often, but because it requires a different way of thinking about algorithms than you use with conventional procedural programming languages.
This is the first of a group of three chapters that aim to show how all the facilities of the XSLT language can work together to solve real XML processing problems of significant complexity. Most of the code is presented in these chapters, but the complete stylesheets, and specimen data files, can be downloaded from the Wrox Web site at http://www.wrox.com/.
As I described in the previous chapter, XSLT has a broad range of applications, and in these three chapters I have tried to cover a representative selection of problems. The three examples I have chosen are as follows:
The first example is a stylesheet for rendering sequential documents: specifically, the stylesheet used for rendering W3C specifications such as the XML and XSLT Recommendations. This is a classic example of the rule-based design pattern described on page 620 in Chapter 17.
The second example, in Chapter 19, is concerned with presenting structured data. I chose a complex data structure with many cross-references to illustrate how a navigational stylesheet can find its way around the source tree: the chosen example is a data file containing the family tree of the Kennedys. This example is particularly suitable for demonstrating how stylesheets and schemas can work together.
The final example stylesheet, in Chapter 20, is quite unrealistic but fun. It shows how XSLT can be used to calculate a knight’s tour of the chessboard, in which the knight visits every square without ever landing on the same square twice. This is not the sort of problem XSLT was designed to solve, but by showing that it can be done I hope it will convince you that XSLT has the computational power and flexibility to deal with the many more modest algorithmic challenges that arise in routine day-to-day formatting applications. New features in XSLT 2.0 make this kind of application much easier to write, which means that the stylesheet is almost a total rewrite of the XSLT 1.0 version.
The stylesheet presented in this chapter was written for a practical purpose, not to serve as an example of good programming practice. I wrote in an earlier edition of this book that the stylesheet was originally written by Eduardo Gutentag and subsequently modified by James Clark. The stylesheet at that time was around 750 lines long. The current version has grown to over 3000 lines in three different stylesheet modules, and claims as its authors Norman Walsh, Chris Maden, Ben Trafford, Eve Maler, and Henry S. Thompson. No doubt others have contributed too, and I am grateful to W3C and to these individuals for placing the stylesheet in the public domain. Because the stylesheet has grown so much, and because many of the template rules are quite repetitive, I have omitted much of the detail from this chapter, selecting only those rules where there is something useful to say. But I haven’t tried to polish the code for publication: I am presenting the stylesheet as it actually is, warts and all, because this provides many opportunities to discuss the realities of XSLT programming. It gives the opportunity to analyze the code as written and to consider possible ways in which it can be improved. To the individuals whose code I am criticizing, I apologize if this causes them any embarrassment. I do it because I know that all good software engineers value criticism, and these people are all top-class software engineers.
Before embarking on this chapter, I did wonder whether there was any value in presenting in a book about XSLT 2.0 a stylesheet that is written almost entirely using XSLT 1.0. As the chapter progressed, I found that it actually provided a good opportunity to identify those places where XSLT 2.0 can simplify the code that needs to be written. I hope that it will therefore serve not only as a case study in the use of XSLT 1.0 but also as an introduction to the opportunities offered by the new features in 2.0.
This chapter presents our second case study. Whereas the XML in the previous example fell firmly into the category of narrative (or document-oriented) XML, this chapter deals largely with data. However, as with many data-oriented XML applications, it is not rigid tabular data; rather, it is data with a very flexible structure with many complex linkages and with many fields allowed to repeat an arbitrary number of times. The data can also include structured text (document-oriented XML) in some of its elements.
The chosen application is to display a family tree, and the sample data we will use represents a small selection of information about the family of John F. Kennedy, president of the United States.
Because genealogy is for most people a hobby rather than a business, you may feel this example is a little frivolous. I think it would be a mistake to dismiss it that way, for several reasons:
Genealogy is one of the most popular ways of using the Web for millions of people. Collaborative Internet-based genealogy in particular is growing rapidly, as witnessed by the popularity of software such as PhpGedView (phpgedview.net), and can be seen as a classic example of the phenomenon sometimes called “Web 2.0”. Catering to the information needs of consumers is a very serious business indeed, and whether consumers are interested in playing games, watching sport, making travel plans, or researching their family trees, the Web is in the business of helping them to do so. Genealogy is also one of the few areas where Web sites have built financial success by asking consumers to pay for content.
Genealogical information presents some complex challenges in terms of managing richly structured data, and these same problems arise in many other disciplines such as geographic information systems, criminal investigation, epidemiology, and molecular biology. Data that fits neatly into rows and columns, to my mind, isn’t interesting enough to be worth studying, and what’s more, it’s likely that the only reason it fits neatly into rows and columns is that a lot of important information has been thrown away in order to achieve that fit. With XML, we can do better.
To write the application shown in this chapter, we have to tackle the problems of converting from non-XML legacy data formats to XML formats, and from one XML data model to another, which are absolutely typical of the data conversion problems encountered in every real-world application.
I could have used an example with invoices and requisitions and purchase orders. I believe that the techniques used in this worked example are equally applicable to many practical commercial problems, but that you will find a little excursion into the world of genealogy a pleasant relief from the day job.
This chapter contains the third (and last) of the XSLT case studies. It shows how XSLT can be used to calculate a knight’s tour of the chessboard, in which the knight visits every square without ever landing on the same square twice.
Note: New features in XSLT 2.0 make this kind of application much easier to write, which means that the stylesheet is almost a total rewrite of the XSLT 1.0 version.
Readers of previous editions of this book have reacted differently to this case study. Some have suggested that I should be less frivolous, and stick to examples that involve the processing of invoices and purchase orders, and the formatting of product catalogs. Others have welcomed the example as light relief from the comparatively boring programming tasks they are asked to do in their day job. A third group has told me that this example is absolutely typical of the challenges they face in building real Web sites. The Web, after all, does not exist only (or even primarily) to oil the wheels of big business. It is also there to provide entertainment.
Whatever your feelings about the choice of problem, I hope that by showing that it can be done I will convince you that XSLT has the computational power and flexibility to tackle any XML formatting and transformation challenge, and that as you study it, you will discover ideas that you can use in a wide range of tasks that are more typical of your own programming assignments.
This appendix summarizes the entire XPath 2.0 grammar. The tables in this appendix also act as an index: they identify the page where each construct is defined.
The way that the XPath grammar is presented in the W3C specification is influenced by the need to support the much richer grammar of XQuery. In this book, I have tried to avoid these complications.
The grammar is presented here for the benefit of users, not for implementors writing a parser (the W3C spec adopted the same approach in its final drafts). So there is no attempt to write the syntax rules in such a way that expressions can be parsed without lookahead or backtracking.
An interesting feature of the XPath grammar is that there are no reserved words. Words that have a special meaning in the language, because they are used as keywords («if», «for»), as operators («and», «except»), or as function names («not», «count») can also be used as the name of an element in a path expression. This means that the interpretation of a name depends on its context. The language uses several techniques to distinguish different roles for the same name:
Operators such as «and» are distinguished from names used as element names or function names in a path expression by virtue of the token that precedes the name. In essence, if a word follows a token that marks the end of an expression, then the word must be an operator; otherwise, it must be some other kind of name.
As an exception to the first rule, if a name follows «/», it is taken as an element name, not as an operator. So in «/ union /*», if you want «union» treated as an operator, you must write the first operand in parentheses: «(/) union /*». Alternatively, write «/. union /*».
Some operators such as «instance of» use a pair of keywords. This technique was adopted in XQuery for use at the start of a construct such as «declare function», but it’s not actually needed for infix operators.
Function names, together with the «if» keyword, are recognized by virtue of the following «(» token.
Axis names are recognized by the following «::» token.
The keywords «for», «some», and «every» are recognized by the following «$» token.
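The rules above can be tried out with a small sketch. It assumes the XPath engine built into the JDK, which implements XPath 1.0 rather than 2.0, so it covers only operator names and function names, not keywords such as «if» or «for»; the preceding-token rule for operators is already present in XPath 1.0. The class name NoReservedWordsDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: elements are deliberately named after
// two operators ("and", "or", "div") and a function ("count").
class NoReservedWordsDemo {

    static final String DOC =
        "<and><or>1</or><div>6</div><count>3</count></and>";

    static Object eval(String expression, QName returnType) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression,
                                  new InputSource(new StringReader(DOC)),
                                  returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "and" and "or" after "/" are element names; the middle "and"
        // follows a complete step, so it is read as the operator.
        System.out.println(eval("/and/or and /and/div", XPathConstants.BOOLEAN));

        // The first "div" follows "/" and is an element name; the second
        // follows a complete step and is the division operator (6 div 2).
        System.out.println(eval("/and/div div 2", XPathConstants.NUMBER));

        // "count" followed by "(" is a function call; "count" after "/"
        // is an element name.
        System.out.println(eval("count(/and/count)", XPathConstants.NUMBER));
    }
}
```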
The XSLT and XPath specifications associate error codes with each error condition. There is an implicit assumption here that although the W3C specification defines no API for invoking XPath expressions, there will be such APIs defined elsewhere, and they will need some way of notifying the application what kind of error has occurred. The error codes may also appear in error messages output by an XSLT processor, though there is no guarantee of this.
Technically, error codes are QNames whose namespace is http://www.w3.org/2005/xqt-errors. The 8-character code that you usually see, such as XPTY0004, is the local part of the QName. This mechanism allows additional error codes defined by a product vendor or application writer to be allocated in a different namespace. If you detect an error at application level, you can call the error() function (see Chapter 13) to force an error to be raised, specifying the error code to be allocated.
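To make the QName structure concrete, here is a minimal sketch using the standard javax.xml.namespace.QName class. The class name ErrorCodeDemo, the vendor namespace http://example.com/errors, and the code MYER0001 are hypothetical; the prefixes are arbitrary labels, and «err» is used here only as a conventional choice.

```java
import javax.xml.namespace.QName;

// Hypothetical demo class: models error codes as namespace-qualified names.
class ErrorCodeDemo {

    // Namespace for error codes defined by the W3C specifications.
    static final String W3C_ERROR_NS = "http://www.w3.org/2005/xqt-errors";

    // Hypothetical namespace for codes defined by a vendor or application.
    static final String VENDOR_ERROR_NS = "http://example.com/errors";

    static QName standardError(String code) {
        return new QName(W3C_ERROR_NS, code, "err");
    }

    static QName vendorError(String code) {
        return new QName(VENDOR_ERROR_NS, code, "my");
    }

    public static void main(String[] args) {
        QName typeError = standardError("XPTY0004");
        System.out.println(typeError.getLocalPart());     // prints XPTY0004
        System.out.println(typeError.getNamespaceURI());  // prints http://www.w3.org/2005/xqt-errors

        // The same local part in a different namespace is a different code.
        System.out.println(typeError.equals(vendorError("XPTY0004")));  // prints false
        System.out.println(vendorError("MYER0001"));
    }
}
```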
There is no normative error message text associated with each error code, either in the specification or in this appendix: hopefully, real products will give error messages that are much more helpful than those in the specification, including an indication of where the error occurred. For each error, this appendix gives first a short description, then an explanation of possible causes. For the errors defined in the XPath and Functions and Operators specifications the short description is usually taken straight from the spec; for XSLT errors, the description in the spec is often quite long and technical, so the description given here is a gloss.
Experience with the XSLT and XQuery test suites suggests that different products will often report the same error in different ways, and for many error conditions there’s more than one code listed that could describe it. However, understanding error messages when things go wrong can be one of the most baffling experiences while learning a new language, so I thought that listing the codes and trying to explain them would be a worthwhile use of the space.
The designers of XSLT 2.0 and XPath 2.0 took a great deal of care to ensure that existing code should continue to work unchanged as far as possible, and in my experience, moving forward to 2.0 rarely causes any compatibility problems. However, because there are so many new features, and particularly because of the changes in the type system, a few incompatibilities were inevitable. This appendix summarizes the areas where you are most likely to encounter problems. It's not a completely comprehensive list; for that, you should go to the relevant appendices of the W3C specifications for XSLT 2.0, XPath 2.0, and Functions and Operators. However, many of the incompatibilities described in those appendices are such obscure edge cases that you are very unlikely to encounter them in practice.
You can think of the transition from XSLT 1.0 to XSLT 2.0 as happening in three stages, though you may choose to do all three at once:
The first stage takes the stylesheet unchanged, still specifying «version="1.0"», and runs it under an XSLT 2.0 processor instead of an XSLT 1.0 processor.
The next stage is to change the stylesheet to specify «version="2.0"». This has the effect of switching off backward-compatibility mode.
The final stage is to modify the stylesheet to take advantage of new facilities introduced in XSLT 2.0 and XPath 2.0; notably, the ability to validate the source documents against a schema.
There is potential for transition problems to occur at each of these three stages. The focus in this appendix, however, is on the first two stages, because once you start changing your stylesheet or your application, it’s very much under your own control whether existing code keeps working.
In this appendix we’ll treat the XSLT changes and the XPath changes together.
It’s important to remember that we can only talk here about changes in the W3C language specification. The W3C specifications leave many options open to implementors, so there may be incompatible changes to products that are not described here. Some of these may be triggered by the change in language specification: for example, an API for passing parameters to a stylesheet may change to accommodate the larger number of data types allowed, or vendors may have dropped support for extension functions that duplicate XSLT 2.0 functionality. But that’s entirely a matter for product vendors to sort out.
This appendix contains summary information about Microsoft’s XSLT processors.
At the time of writing, Microsoft does not yet have an XSLT 2.0 processor, so the information in this appendix all relates to its XSLT 1.0 products. In view of this, I am not including a comprehensive specification of Microsoft’s APIs, merely an outline of their structure. The reference information can be found in Microsoft’s own documentation, or in books that concentrate on XSLT 1.0 processing.
The best information available on Microsoft’s future plans comes in a blog posting released just after XSLT 2.0 was finalized, at http://blogs.msdn.com/xmlteam/archive/2007/01/29/xslt-2-0.aspx. All this really does is to confirm that Microsoft has a development team in place to work on an implementation, but this represents a significant turnaround given that two years earlier Microsoft was saying it thought XQuery would meet all the requirements, and more recently that the way forward was its proprietary Linq to XML language. It’s likely to be 2009 at the earliest before we see a full product release, though hopefully there will be previews earlier than this.
The announcement suggests Saxon on .NET as an interim solution. A recommendation from Microsoft is not something I would have dared to hope for when I started out on the project!
Microsoft offers two families of products, with completely different APIs. The XSLT processor in the MSXML family comes as standard with Internet Explorer, though it is also available as a freestanding component and is delivered as part of the Office suite. In the current Microsoft jargon, this runs “on the native stack”, that is, it is compiled into machine code and calls the Windows APIs, rather than relying on the .NET platform. More recently, the System.Xml.Xsl package has become available as part of the .NET framework. This appendix gives a brief outline of both these product families.
In the early days it was frequently reported that MSXML was faster than the .NET processor, and that it conformed more closely to the W3C specifications. As far as I can tell, neither of these criticisms is now valid. Both processors offer excellent performance, and few serious conformance issues are reported for either product (the main one being MSXML’s cavalier attitude toward whitespace).
JAXP is a Java API for controlling various aspects of XML processing, including parsing, validation, and XSLT transformation. This appendix concentrates on the transformation API. During its development this was known as TrAX (Transformation API for XML); you will still see this term used occasionally.
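For orientation, a minimal sketch of the transformation API might look like the following. It assumes the XSLT processor bundled with the JDK, which implements XSLT 1.0, so the stylesheet declares «version="1.0"». The class name JaxpTransformDemo, the stylesheet, and the source document are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: compiles a stylesheet once and applies it
// to a source document using the JAXP transformation API.
class JaxpTransformDemo {

    // A tiny XSLT 1.0 stylesheet that produces plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "Hello, <xsl:value-of select='greeting/@to'/>!"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String sourceXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Templates is the compiled, reusable form of the stylesheet.
            Templates templates =
                factory.newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            // A Transformer holds the state of a single transformation.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting to='world'/>"));  // prints Hello, world!
    }
}
```

Separating Templates from Transformer is a design choice worth noting: the compiled stylesheet can be shared, while a new Transformer is created for each run.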
Saxon is an implementation of XSLT 2.0 produced by the author of this book, Michael Kay. Saxon also includes XQuery and XML Schema processors. The product runs on two platforms, Java and .NET, and it exists in two versions: an open source product Saxon-B, which implements the basic conformance level of the XSLT specification, and a commercial product Saxon-SA, which adds schema-aware processing. All versions can be obtained by following links from http://saxon.sf.net/.
There is also an older version of Saxon available, version 6.5, which implements XSLT 1.0. This appendix is concerned only with the XSLT 2.0 processor.
The Java version of Saxon requires JDK 1.4 or a later Java release, and there are no other dependencies. The .NET version is produced by cross-compiling the Java code into the Intermediate Language (IL) used by the .NET platform, using the IKVMC cross-compiler produced by Jeroen Frijters (http://www.ikvm.net). This runs on .NET version 1.1 or 2.0.
Altova is the company that produces the popular XMLSpy toolkit. Among its many capabilities this includes an XSLT 2.0 processor, which can be used either as part of XMLSpy or on its own from the command line or via one of a number of application programming interfaces. XMLSpy is commercial software that can be purchased from www.altova.com.
Altova’s XSLT 2.0 processor is available as a free (but not open source) download from the same site; it is part of a package called AltovaXML that also includes an XML validating parser, an XML Schema processor, an XQuery engine, and an XSLT 1.0 processor. The XQuery and XSLT 2.0 processors are both schema-aware. Although the product is internally a COM component, APIs are offered for COM, Java, and .NET.
As well as the XSLT 2.0 processor itself, XMLSpy also includes an interactive XSLT debugger and a profiler for performance analysis.
Both products run on Windows only. The version described in this chapter is the 2008 edition.
This glossary gathers together some of the more common technical terms used in this book. Most of these terms are defined in the XSLT or XPath specifications, but some of them are borrowed from XML or other standards in the XML family, and one or two have been invented for the purposes of this book. So for each definition, I also tell you where the term comes from.
The definitions in all cases are my own; in some cases, the original specifications have a much more formal definition, but in other cases they are surprisingly vague.
Where a definition contains references to other terms defined in the glossary, these terms are written in italic.