Normalising and signing XML is a well-known pain in the neck. You don't have to look far to find expressions of horror and dismay or even the outright rejection of the idea that XML is signable other than as a text string (Peter Gutmann: ‘XML is an inherently unstable and therefore unsignable data format.’).
XML DigSig is hard because XML Canonicalization is hard; and that's hard because, I think, it's happening at the wrong level (lexical rather than structural).
I believe it's possible to define an alternative canonicalization which accepts a larger range of input documents as equivalent, and which has fewer bells and whistles (and gongs and bugles), but which is much simpler both to define and to implement. I describe that in a preprint at arXiv (arXiv:1505.04437, which I aim to submit to Software: Practice and Experience when the library has had a little more exposure), and it's illustratively implemented in both C and Java in a library called Xoxa.
Why is this so hard?!
A large part of the problem is that XML canonicalization is tricky. That is partly because parsing XML has more intricate edge cases than are apparent at first sight, and a mistake in any one of them can wreck a canonicalization, and thus a signature based on it.
The process of XML canonicalization starts with an XML document D, and defines a document C which, when it's parsed, has the same XML Information Set as the document you started with. That is, a system has to parse D, then serialise the result in a special way to produce C, such that a later parse of C will produce the correct results. That's conceptually three passes, in two directions, through the considerable intricacies of the XML spec.
What's the point, again?
However, the important bit of all this to-and-fro is the post-parse contents of the InfoSet – you're not parsing this document just for fun, but because you want to do something with it. It's therefore at least surprising that the thing being signed is not this, but a new XML document derived from it. Peter Gutmann's argument (paraphrased) is that signature algorithms work on bags of bytes; XML documents are emphatically not just bags of bytes; thus there is a mismatch between the tool and the materials which it is barely feasible to resolve. I think Gutmann wastes some energy here kicking straw men, but the core argument is completely persuasive (also, he can't be too much reproached for this, since at least some XML straw men have been painstakingly erected by XML true believers themselves – yes, I'm looking at you, WS-*).
Xoxa
The Xoxa normalization works by deriving a byte-stream from the parsed XML rather than a serialization, and by using as a concrete parse-tree, not the InfoSet, but the information available in an API such as SAX, Expat, or their analogues in other languages.
What does this normalization look like?
The Xoxa normalization turns the XML
<doc>
<p class='foo'>Hello</p>
<p> there
chum
</p>
</doc>
into the normalized form:
(doc
Aclass foo
(p
-Hello
)p
(p
- there chum
)p
)doc
The bytes comprising this normalized form can then be signed, and the signature reinserted into the original XML, or else made available as the parsed XML is passed downstream.
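As a rough illustration of signing those bytes, here is a minimal Python sketch. It makes several assumptions that are not taken from the Xoxa specification: the normalized lines are joined with LF and encoded as UTF-8, and a keyed HMAC-SHA256 from the standard library stands in for a real public-key signature. The names `NORMALIZED`, `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac

# The normalized form shown above. Assumption: LF line endings and UTF-8;
# the real byte layout is whatever the Xoxa definition prescribes.
NORMALIZED = (
    "(doc\n"
    "Aclass foo\n"
    "(p\n"
    "-Hello\n"
    ")p\n"
    "(p\n"
    "- there chum\n"
    ")p\n"
    ")doc\n"
).encode("utf-8")


def sign(normalized_bytes, key):
    """Return a hex MAC over the normalized bytes (stand-in for a signature)."""
    return hmac.new(key, normalized_bytes, hashlib.sha256).hexdigest()


def verify(normalized_bytes, key, signature_hex):
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(sign(normalized_bytes, key), signature_hex)
```

The point of the sketch is only that the signing step sees a plain byte string; nothing about XML syntax reaches it.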
SAX, Expat, and friends
Key point: That normalization can be generated purely from the information available in the SAX ContentHandler interface, or the various XML_SetXXXHandler functions of Expat, or the various xmlparser.XXXHandler methods in Python's xml.parsers.expat package, or the various other Expat wrappers, or the callbacks in Perl's XML::SAX parser (which isn't a wrapper), or indeed just about any non-trivial XML parser.
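To make that concrete, here is a minimal sketch using Python's standard xml.sax module. It is not the Xoxa implementation. The class `EventLineHandler` and the function `event_lines` are hypothetical names, and the details are guesses made for illustration: attributes are sorted by name and listed before the element opens, whitespace runs in text are collapsed and trimmed, and whitespace-only text is dropped. The output may therefore differ in detail from the real Xoxa form.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLineHandler(ContentHandler):
    """Collect one line per SAX event (hypothetical helper, not Xoxa's API)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []

    def _flush_text(self):
        # Assumption: collapse whitespace runs, trim, drop empty text.
        text = " ".join("".join(self._text).split())
        self._text = []
        if text:
            self.lines.append("-" + text)

    def startElement(self, name, attrs):
        self._flush_text()
        # Assumption: attributes sorted by name, emitted before the element.
        for attr_name in sorted(attrs.getNames()):
            self.lines.append("A%s %s" % (attr_name, attrs.getValue(attr_name)))
        self.lines.append("(" + name)

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        # The parser may deliver text in several pieces, so buffer it.
        self._text.append(content)


def event_lines(xml_bytes):
    """Parse xml_bytes and return the event lines as one LF-terminated string."""
    handler = EventLineHandler()
    xml.sax.parseString(xml_bytes, handler)
    return "\n".join(handler.lines) + "\n"
```

Calling `event_lines(b"<doc><p class='foo'>Hello</p></doc>")` gives the six lines `(doc`, `Aclass foo`, `(p`, `-Hello`, `)p`, `)doc`. Only three callbacks are needed, which is the sense in which the form falls out of the parser API.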
Exploiting an API
Deriving the normalization from an API means that a good deal of boring normalization (processing whitespace, incorporating entities, worrying about encodings) is, in effect, done for free by the underlying XML parser; it need not be re-specified, re-implemented, or ever remembered about.
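A small Python sketch of what "done for free" means in practice: two documents that differ in quoting, character references, encoding and empty-element syntax produce the same sequence of SAX events. The recorder class and the two sample documents are made up for this illustration and are not part of Xoxa.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Record parse events as tuples (hypothetical helper for this demo)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush(self):
        # Join buffered text so that chunk boundaries do not matter.
        if self._text:
            self.events.append(("text", "".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name, sorted(attrs.items())))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))

    def characters(self, content):
        self._text.append(content)


def events(xml_bytes):
    recorder = EventRecorder()
    xml.sax.parseString(xml_bytes, recorder)
    return recorder.events


# UTF-8, single quotes, a named entity, empty-element tag.
DOC_A = "<doc><p class='foo'>caf\u00e9 &amp; tea</p><br/></doc>".encode("utf-8")

# Latin-1, double quotes with spaces round '=', a numeric character
# reference, and an explicit start/end tag pair.
DOC_B = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<doc><p class = "foo">caf\u00e9 &#38; tea</p><br></br></doc>'
).encode("latin-1")

# The byte streams differ, but the parser reports the same events.
assert DOC_A != DOC_B
assert events(DOC_A) == events(DOC_B)
```

None of the differences between the two inputs had to be handled by the recorder; the parser had already resolved them before any callback fired.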
Also (and this is both an efficiency and a simplicity consideration) in each case this canonicalization, and the following signature, can be generated without completely reserializing the document, and can be done en passant as part of normal XML processing.
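The en passant idea can be sketched in Python with xml.parsers.expat and an incremental hash. This is an illustration under assumptions, not the Xoxa code: the line format and whitespace rule are the same guesses as elsewhere (sorted attributes before the element, collapsed and trimmed text), SHA-256 is an arbitrary choice of digest, and `digest_while_parsing` and its `on_start` callback are hypothetical names. The document arrives in chunks, the digest is updated as each event fires, and the events are still passed on to the application.

```python
import hashlib
import xml.parsers.expat


def digest_while_parsing(chunks, on_start=None):
    """Feed byte chunks to Expat, hashing event lines as they occur.

    on_start stands in for the application's own downstream processing.
    """
    digest = hashlib.sha256()
    text = []

    def emit(line):
        digest.update(line.encode("utf-8") + b"\n")

    def flush_text():
        # Assumption: collapse whitespace runs, trim, drop empty text.
        data = " ".join("".join(text).split())
        text.clear()
        if data:
            emit("-" + data)

    def start(name, attrs):
        flush_text()
        for key in sorted(attrs):
            emit("A%s %s" % (key, attrs[key]))
        emit("(" + name)
        if on_start is not None:
            on_start(name, attrs)

    def end(name):
        flush_text()
        emit(")" + name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text.append
    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal end of input
    return digest.hexdigest()
```

No second serialization of the document is ever built in memory here; the only state kept is the running digest and the text buffered since the last element boundary.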
Xoxa
There are more details of the process and the library, including Java and C API docs, at the project's web page. Comments are most welcome.