How do I use graphics in XML?
Graphics have traditionally just been links which happen
to have a picture file at the end rather than another
piece of text. They can therefore be implemented in any
way supported by the XLink and XPointer specifications
(see question C.18, ‘How will XML affect my document
links?’), including using similar syntax to existing
HTML images. They can also be referenced using XML's
built-in NOTATION and ENTITY mechanism in a similar way
to standard SGML, as external unparsed entities.
However, the SVG specification (see the tip below, by
Peter Murray-Rust) lets you use XML markup to draw
vector graphics objects directly in your XML file. This
provides enormous power for the inclusion of portable
graphics, especially interactive or animated sequences,
and it is now slowly becoming supported in browsers.
The XML linking specifications for external images give
you much better control over the traversal and
activation of links, so an author can specify, for
example, whether or not to have an image appear when the
page is loaded, or on a click from the user, or in a
separate window, without having to resort to scripting.
XML itself doesn't prescribe or restrict graphic file
formats: GIF, JPG, TIFF, PNG, CGM, EPS, and SVG at a
minimum would seem to make sense; however, vector
formats (EPS, SVG) are normally essential for
non-photographic images (diagrams).
You cannot embed a raw binary graphics file (or any
other binary [non-text] data) directly into an XML file
because any bytes happening to resemble markup would get
misinterpreted: you must refer to it by linking (see
below). It is, however, possible to include a
text-encoded transformation of a binary file as a CDATA
Marked Section, using something like UUencode with the
markup characters ], & and > removed from the map so
that they could not occur as an erroneous CDATA
termination sequence and be misinterpreted. You could
even use simple hexadecimal encoding as used in
PostScript. For vector graphics, however, the solution
is to use SVG (see the tip below, by Peter Murray-Rust).
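As a minimal sketch of the hexadecimal approach mentioned above, the following Python encodes a few bytes (standing in for the contents of a picture file) as hex text, stores them in an element, and decodes them again after parsing. The element name `image` and the attribute `encoding` are assumptions made for this illustration, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of a binary graphics file (the PNG signature bytes).
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Hex text uses only 0-9 and a-f, so it can never contain markup characters.
hex_text = raw.hex()

# 'image' and 'encoding' are hypothetical names chosen for this sketch.
elem = ET.Element("image", {"encoding": "hex"})
elem.text = hex_text
xml_text = ET.tostring(elem, encoding="unicode")

# Reading it back: parse the XML, then reverse the hex encoding.
parsed = ET.fromstring(xml_text)
decoded = bytes.fromhex(parsed.text)
```

Because the encoded text contains no markup characters at all, it does not even need a CDATA Marked Section; the cost is that the data doubles in size.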
Sound files are binary objects in the same way that
external graphics are, so they can only be referenced
externally (using the same techniques as for graphics).
Music files written in MusiXML or an XML variant of SMDL
could however be embedded in the same way as for SVG.
The point about using entities to manage your graphics
is that you can keep the list of entity declarations
separate from the rest of the document, so you can
re-use the names if an image is needed more than once,
but only store the physical file specification in a
single place. This is available only when using a DTD,
not a Schema.
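The entity approach can be sketched as follows, using the expat parser from Python's standard library to read the declarations. The document type `report`, the element `figure`, its attribute `file`, and the file name `images/logo.png` are all hypothetical; the point is only that the image is used twice by name while its location is stored once.

```python
from xml.parsers import expat

# A hypothetical document: one NOTATION, one unparsed ENTITY, two uses of it.
doc = """<?xml version="1.0"?>
<!DOCTYPE report [
<!NOTATION png SYSTEM "image/png">
<!ENTITY logo SYSTEM "images/logo.png" NDATA png>
<!ELEMENT report (figure+)>
<!ELEMENT figure EMPTY>
<!ATTLIST figure file ENTITY #REQUIRED>
]>
<report>
  <figure file="logo"/>
  <figure file="logo"/>
</report>
"""

entities = {}   # entity name -> (system identifier, notation name)
uses = []       # entity names referenced by figure elements

def unparsed_entity(name, base, system_id, public_id, notation):
    # Called once per unparsed (NDATA) entity declaration.
    entities[name] = (system_id, notation)

def start_element(name, attrs):
    if name == "figure":
        uses.append(attrs["file"])

parser = expat.ParserCreate()
parser.UnparsedEntityDeclHandler = unparsed_entity
parser.StartElementHandler = start_element
parser.Parse(doc, True)

# Resolve each use of the name to the single stored file specification.
files = [entities[name][0] for name in uses]
```

Changing the location of the image then means editing one ENTITY declaration, not every place the image appears.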
How do I include one XML file in another?
This works exactly the same as for SGML. First you
declare the entity you want to include, and then you
reference it by name:
<!DOCTYPE novel SYSTEM "/dtd/novel.dtd" [
<!ENTITY chap1 SYSTEM "mydocs/chapter1.xml">
<!ENTITY chap2 SYSTEM "mydocs/chapter2.xml">
<!ENTITY chap3 SYSTEM "mydocs/chapter3.xml">
<!ENTITY chap4 SYSTEM "mydocs/chapter4.xml">
<!ENTITY chap5 SYSTEM "mydocs/chapter5.xml">
]>
<novel>
&chap1; &chap2; &chap3; &chap4; &chap5;
</novel>
The difference between this method and the one used for
including a DTD fragment (see question D.15, ‘How do I
include one DTD (or fragment) in another?’) is that this
uses an external general (file) entity which is
referenced in the same way as for a character entity
(with an ampersand).
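How a parser follows such references can be sketched with the expat module in Python's standard library, which hands each external entity reference to a handler that you supply. This is an illustration under assumptions, not a full implementation: the file names and the tiny two-chapter novel are invented for the example, there is no external DTD, and only one level of inclusion is handled.

```python
import os
import tempfile
from xml.parsers import expat

def element_names(main_path):
    """Return element names in document order, following external entities."""
    names = []
    base_dir = os.path.dirname(os.path.abspath(main_path))
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        names.append(name)

    def external_entity(context, base, system_id, public_id):
        # Parse the referenced file as if its content stood at the reference.
        child = parser.ExternalEntityParserCreate(context)
        with open(os.path.join(base_dir, system_id), "rb") as f:
            child.ParseFile(f)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity
    with open(main_path, "rb") as f:
        parser.ParseFile(f)
    return names

# Build a tiny two-chapter novel in a temporary directory (hypothetical files).
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2):
        with open(os.path.join(tmp, f"chapter{n}.xml"), "w") as f:
            f.write(f"<chapter><title>Chapter {n}</title></chapter>")
    main = os.path.join(tmp, "novel.xml")
    with open(main, "w") as f:
        f.write('<!DOCTYPE novel [\n'
                '<!ENTITY chap1 SYSTEM "chapter1.xml">\n'
                '<!ENTITY chap2 SYSTEM "chapter2.xml">\n'
                ']>\n<novel>&chap1;&chap2;</novel>\n')
    names = element_names(main)
```

The elements from the chapter files appear inside `novel` exactly as if they had been typed there.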
The one thing to make sure of is that the included file
must not have an XML or DOCTYPE Declaration on it. If
you've been using one for editing the fragment, remove
it before using the file in this way. Yes, this is a
pain in the butt, but if you have lots of inclusions
like this, write a script to strip off the declaration
(and paste it back on again for editing).
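One possible shape for such a script is sketched below in Python. It assumes the declarations sit at the very start of the file with nothing but whitespace between them, and that the DOCTYPE Declaration contains no `>` or `[` inside its quoted identifiers; the function names are invented for this example.

```python
import re

# An XML Declaration, if present, comes first in the file.
XML_DECL = re.compile(r"\A\s*<\?xml\s[^?]*\?>\s*")
# A DOCTYPE Declaration, with an optional internal subset in [ ... ].
DOCTYPE = re.compile(r"\A\s*<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>\s*")

def strip_declarations(text):
    """Return (body, removed): the fragment without its leading
    declarations, and the removed text so it can be pasted back."""
    removed = ""
    for pattern in (XML_DECL, DOCTYPE):
        match = pattern.match(text)
        if match:
            removed += match.group(0)
            text = text[match.end():]
    return text, removed

def restore_declarations(body, removed):
    """Put the saved declarations back on the front for editing."""
    return removed + body

sample = ('<?xml version="1.0"?>\n'
          '<!DOCTYPE chapter SYSTEM "/dtd/novel.dtd">\n'
          '<chapter><title>One</title></chapter>\n')
body, removed = strip_declarations(sample)
```

Keeping the removed text means the fragment can be restored unchanged when you want to edit it on its own again.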
What is parsing and how do I do it in XML?
Parsing is the act of splitting up information into its
component parts (schools used to teach this in language
classes until the teaching profession collectively
caught the anti-grammar disease).
‘Mary feeds Spot’ parses as
1. Subject = Mary, proper noun, nominative case
2. Verb = feeds, transitive, third person singular, present tense
3. Object = Spot, proper noun, accusative case
In computing, a parser is a program (or a piece of code
or API that you can reference inside your own programs)
which analyses files to identify the component parts.
All applications that read input have a parser of some
kind, otherwise they'd never be able to figure out what
the information means. Microsoft Word contains a parser
which runs when you open a .doc file and checks that it
can identify all the hidden codes. Give it a corrupted
file and you'll get an error message.
XML applications are just the same: they contain a
parser which reads XML and identifies the function of
each of the pieces of the document, and it then makes that
information available in memory to the rest of the program.
While reading an XML file, a parser checks the syntax
(pointy brackets, matching quotes, etc) for well-formedness,
and reports any violations (reportable errors). The XML
Specification lists what these are.
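A well-formedness check of this kind can be sketched with the ElementTree module in Python's standard library, which raises an error when the underlying parser meets a violation. The function name and the two sample strings are invented for this example.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the violation.
        return str(err)
    return None

good = check_well_formed("<note><to>Spot</to></note>")
bad = check_well_formed("<note><to>Spot</note>")  # 'to' is never closed
```

Here `good` is `None`, while `bad` holds the parser's description of the mismatched end-tag.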
Validation is another stage beyond parsing. As the
component parts of the document are identified, a
validating parser can compare them with the pattern laid
down by a DTD or a Schema, to check that they conform.
In the process, default values and datatypes (if
specified) can be added to the in-memory result of the
validation that the validating parser gives to the application.
<person corpid="abc123" birth="1960-02-31" gender="female">
  <name>
    <forename>Judy</forename>
    <surname>O'Grady</surname>
  </name>
</person>
The example above parses as:
1. Element person identified with Attribute corpid
containing abc123 and Attribute birth containing
1960-02-31 and Attribute gender containing female
containing ...
2. Element name containing ...
3. Element forename containing text ‘Judy’ followed by
4. Element surname containing text ‘O'Grady’
(and lots of other stuff too).
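The breakdown above can be reproduced, roughly, with a few lines of Python using the standard ElementTree module. This is only a sketch: the `describe` function and its output format are invented here, and a real parser reports a good deal more than element names, attributes and text.

```python
import xml.etree.ElementTree as ET

doc = """<person corpid="abc123" birth="1960-02-31" gender="female">
  <name>
    <forename>Judy</forename>
    <surname>O'Grady</surname>
  </name>
</person>"""

def describe(element, depth=0):
    """Return one line per element: name, attributes, and any text content."""
    text = (element.text or "").strip()
    line = "  " * depth + f"Element {element.tag}"
    if element.attrib:
        line += f" attributes={dict(element.attrib)}"
    if text:
        line += f" text={text!r}"
    lines = [line]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(doc)
report = describe(root)
print("\n".join(report))
```

Note that this non-validating parse accepts the impossible date in the birth attribute without complaint: to the parser it is just a string, and rejecting it would need validation against a Schema that declares a date datatype.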
As well as built-in parsers, there are also stand-alone
parser-validators, which read an XML file and tell you
if they find an error (like missing angle-brackets or
quotes, or misplaced markup). This is essential for
testing files in isolation before doing something else
with them, especially if they have been created by hand
without an XML editor, or by an API which may be too
deeply embedded elsewhere to allow easy testing.