http://blog.davber.com/2006/09/17/xpath-with-namespaces-in-java/
XPath with namespaces in Java
XPath is the expression language operating on an XML tree, used from XSLT. It can also be used stand-alone, such as from a Java application. There is also a standard API, called JAXP, for Java. So, everything is nice and dandy, and the post should stop here! Well... for real XML documents, using namespaces, it is not this easy. That is why you need this post!
If you do not care about lengthy posts, but just want the bare facts, go to the end of this post.
Let us first look at a simple example where the XML documents do not have namespaces. Say we have this XML document with some orders:
XML:
<?xml version="1.0"?>
<Sales>
  <Customer name="CostCo, Inc.">
    <Order price="2000">
      <Description>A tiny web site</Description>
    </Order>
    <Order price="12000">
      <Description>Database configuration</Description>
    </Order>
  </Customer>
  <Customer name="BigBuyer, Inc.">
    <Order price="30000">
      <Description>A J2EE system for lawyers</Description>
    </Order>
  </Customer>
</Sales>
A quick XPath to get all big orders from all customers is:
XPATH:
//Customer/Order[@price> 10000]
Sun provides us with an API, JAXP, for accessing and manipulating XML documents, including using XPath. So, let us see exactly what we need to do to use XPath.
If you use Java 5.0, then you already have JAXP at your fingertips. Version 1.3 of JAXP, to be exact. Having an API is no fun without a corresponding implementation of it, though. Fortunately, J2SE comes with a reference implementation, based on Xerces and Xalan. This is in contrast to Java 1.4, which used Crimson. There are some crucial differences between these versions, but I will leave that discussion aside - it is just too painful.
Using Java 5.0, a complete application, which you can just compile and run, follows:
JAVA:
package com.davber.test;

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

public class App
{
    public static void main( String[] args )
    {
        try {
            // First, the XML document (no namespaces yet)
            String xmlStr =
                "<?xml version=\"1.0\" ?>\n" +
                "<Sales>\n" +
                "<Customer name=\"CostCo, Inc.\">\n" +
                "<Order price=\"12000\">\n" +
                "<Description>A bunch of stuff" +
                "</Description>\n" +
                "</Order>\n" +
                "</Customer>\n" +
                "</Sales>\n";
            DocumentBuilderFactory xmlFact =
                DocumentBuilderFactory.newInstance();
            xmlFact.setNamespaceAware(true);
            DocumentBuilder builder = xmlFact.
                newDocumentBuilder();
            Document doc = builder.parse(
                new java.io.ByteArrayInputStream(
                    xmlStr.getBytes()));
            // Now the XPath expression
            String xpathStr =
                "//Customer/Order[@price > 6000]";
            XPathFactory xpathFact =
                XPathFactory.newInstance();
            XPath xpath = xpathFact.newXPath();
            // Evaluate as a string: the string value of the first match
            String result = xpath.evaluate(xpathStr, doc);
            System.out.println("XPath result is \"" +
                result + "\"");
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Compile and run it now. You will see
XPath result is "A bunch of stuff"
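i.e., it matched one order. (Strictly speaking, the string value of the Order element also includes the line breaks around the description.)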
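Evaluating to a string only ever shows the first match. To actually list all big orders from the first XML document above, the expression can be evaluated as a node-set instead. The following is a minimal sketch under that assumption; the class name BigOrders and the helper bigOrderDescriptions are made up for this illustration, and the XML is written on one line so that no whitespace text nodes get in the way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original post.
public class BigOrders {
    // The namespace-free sales document, without insignificant whitespace.
    static final String XML =
        "<Sales>"
        + "<Customer name=\"CostCo, Inc.\">"
        + "<Order price=\"2000\"><Description>A tiny web site</Description></Order>"
        + "<Order price=\"12000\"><Description>Database configuration</Description></Order>"
        + "</Customer>"
        + "<Customer name=\"BigBuyer, Inc.\">"
        + "<Order price=\"30000\"><Description>A J2EE system for lawyers</Description></Order>"
        + "</Customer>"
        + "</Sales>";

    // Returns one "price: description" line per order above 10000.
    static List<String> bigOrderDescriptions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET asks for every matching node, not just the first one
        NodeList orders = (NodeList) xpath.evaluate(
            "//Customer/Order[@price > 10000]", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            result.add(order.getAttribute("price") + ": " + order.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : bigOrderDescriptions(XML)) {
            System.out.println(line);
        }
    }
}
```

The nodes come back in document order, so the CostCo order is listed before the BigBuyer one.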
So, let us get down to business and add some namespaces to the input document.
XML:
<?xml version="1.0"?>
<Sales xmlns="http://www.davber.com/sales-format">
  <Customer name="CostCo, Inc.">
    <ord:Order xmlns:ord="http://www.davber.com/order-format" price="2000">
      <ord:Description>A tiny web site</ord:Description>
    </ord:Order>
    <ord:Order xmlns:ord="http://www.davber.com/order-format" price="12000">
      <ord:Description>Database configuration</ord:Description>
    </ord:Order>
  </Customer>
  <Customer name="BigBuyer, Inc.">
    <ord:Order xmlns:ord="http://www.davber.com/order-format" price="30000">
      <ord:Description>A J2EE system for lawyers</ord:Description>
    </ord:Order>
  </Customer>
</Sales>
So we have two namespaces, where one is the default namespace, i.e., it applies to all element names without an explicit prefix (but not to unprefixed attributes). This is important to understand, since it means those elements will not match names without a namespace in XPath. So, even for the default namespace, we need to explicitly add a namespace to the XPath! Let us try without namespaces, i.e., use the exact same code as before but with the namespace-annotated variant of the input XML.
You will get this intriguing output:
XPath result is ""
Yep, nothing! Your non-namespaced XPath did not match the now namespaced input. No real surprise here. What might come as a surprise is that we cannot even use local names in the XPath expression for names in the default namespace. Try with the XPath
XPATH:
/Sales/Customer/@name
which should give us the name of the first customer (evaluating a node-set as a string yields the string value of its first node, not a concatenation of all matches). Replace the XPath of the application and recompile and run. You will again see this output:
XPath result is ""
i.e., no match!
So, as soon as we have namespaces in the XML document, default or not, we need to use namespaces in the XPath. We will later see how we can circumvent this requirement, but let us now take it as gospel.
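To see why the unprefixed XPath finds nothing, it helps to ask the parsed DOM which namespace an unprefixed element ended up in. The sketch below is only an illustration; the class name DefaultNamespaceDemo and its helper methods are invented for it, and the document is a cut-down version of the sales XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original post.
public class DefaultNamespaceDemo {
    static final String XML =
        "<Sales xmlns=\"http://www.davber.com/sales-format\">"
        + "<Customer name=\"CostCo, Inc.\"/>"
        + "</Sales>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        fact.setNamespaceAware(true);
        return fact.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Namespace URI of the root element, although it carries no prefix.
    static String rootNamespace(String xml) throws Exception {
        return parse(xml).getDocumentElement().getNamespaceURI();
    }

    // An unprefixed XPath name means "no namespace", so this finds nothing.
    static String unprefixedCustomerName(String xml) throws Exception {
        return XPathFactory.newInstance().newXPath()
            .evaluate("/Sales/Customer/@name", parse(xml));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Root namespace: " + rootNamespace(XML));
        System.out.println("XPath result is \"" + unprefixedCustomerName(XML) + "\"");
    }
}
```

The root element reports the sales-format URI even though it has no prefix, while the unprefixed XPath name test only matches elements in no namespace at all.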
How do we adorn the XPath with namespaces? Well, just as done in the XML document, there has to be a mapping between prefixes and URIs. The problem in the stand-alone XPath case - as opposed to when it is used within an XSLT document - is that there is no place to do it in the XPath "document" itself.
Can we not just use the same prefixes as the XML document?
No, the matching procedure only cares about the (mapped) URIs and could not care less about the specific prefix we chose. If you do not believe me, try with the prefixed XPath
XPATH:
//ord:Order/ord:Description
Believe me now?
Ok, so how do we create and provide that mapping in JAXP?
There is this interface javax.xml.namespace.NamespaceContext, which declares three methods. We only care about one of them, getNamespaceURI, which maps a namespace prefix to a URI. The other two are not used by the standard XPath executing implementations (AFAIK.) The complete application with an implementation of that interface corresponding to our test XML input follows.
JAVA:
package com.davber.test;

import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

public class App
{
    public static void main( String[] args )
    {
        try {
            // First, the XML document
            String xmlStr =
                "<?xml version=\"1.0\" ?>\n" +
                "<Sales xmlns=\"http://www.davber.com/sales-format\">\n" +
                "<Customer name=\"CostCo, Inc.\">\n" +
                "<ord:Order xmlns:ord=\"http://www.davber.com/order-format\" price=\"12000\">\n" +
                "<ord:Description>A bunch of stuff" +
                "</ord:Description>\n" +
                "</ord:Order>\n" +
                "</Customer>\n" +
                "</Sales>\n";
            DocumentBuilderFactory xmlFact =
                DocumentBuilderFactory.newInstance();
            xmlFact.setNamespaceAware(true);
            DocumentBuilder builder = xmlFact.
                newDocumentBuilder();
            Document doc = builder.parse(
                new java.io.ByteArrayInputStream(
                    xmlStr.getBytes()));
            // We map the prefixes to URIs
            NamespaceContext ctx = new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri;
                    if (prefix.equals("ns1"))
                        uri = "http://www.davber.com/order-format";
                    else if (prefix.equals("ns2"))
                        uri = "http://www.davber.com/sales-format";
                    else
                        uri = null;
                    return uri;
                }
                // Dummy implementation - not used!
                public Iterator getPrefixes(String val) {
                    return null;
                }
                // Dummy implementation - not used!
                public String getPrefix(String uri) {
                    return null;
                }
            };
            // Now the XPath expression
            String xpathStr =
                "//ns1:Order/ns1:Description";
            XPathFactory xpathFact =
                XPathFactory.newInstance();
            XPath xpath = xpathFact.newXPath();
            xpath.setNamespaceContext(ctx);
            String result = xpath.evaluate(xpathStr, doc);
            System.out.println("XPath result is \"" +
                result + "\"");
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Now the XPath matches. Note that I did not use the same prefix as the XML document, since the choice of prefix is arbitrary. In order to match against the default namespace in the XML document, we then need to use the prefix ns2, as in the following XPath expression:
XPATH:
/ns2:Sales/ns2:Customer/@name
Try it and you will get:
XPath result is "CostCo, Inc."
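The if/else chain in the anonymous class gets tedious once there are more than a couple of namespaces. One possible alternative is a small reusable class backed by a map. This is only a sketch: the class name MapNamespaceContext, the bind method, and the prefixes s and o are my own choices for the illustration, and the special handling the interface documents for the reserved xml and xmlns prefixes is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical reusable context, not part of the original post.
public class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new LinkedHashMap<>();

    // Registers a prefix and returns this, so that calls can be chained.
    public MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        Iterator<String> it = getPrefixes(uri);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(uri)) {
                prefixes.add(e.getKey());
            }
        }
        return prefixes.iterator();
    }

    // Usage: look up the customer name in the default-namespaced document.
    static String customerName() throws Exception {
        String xml =
            "<Sales xmlns=\"http://www.davber.com/sales-format\">"
            + "<Customer name=\"CostCo, Inc.\"/>"
            + "</Sales>";
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        fact.setNamespaceAware(true);
        Document doc = fact.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MapNamespaceContext()
            .bind("s", "http://www.davber.com/sales-format")
            .bind("o", "http://www.davber.com/order-format"));
        return xpath.evaluate("/s:Sales/s:Customer/@name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XPath result is \"" + customerName() + "\"");
    }
}
```

Since the prefixes live only in the map, they stay under the control of the code that writes the XPath, no matter which prefixes the XML author picked.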
If you think it is a PITA to explicitly map those prefixes to namespaces you have two choices:
- extract the mapping from the XML document itself
- disable the namespace awareness of the XML document
Let us look at the first alternative. Unfortunately (?), there is no such facility in the JAXP API, so we have to diverge into implementation specificity. Xalan contains an interface PrefixResolver (at least in version 2.7.0) which is very similar to the aforementioned JAXP interface. Unfortunately, "similar" is not the same as "compatible with", so we need to wrap the PrefixResolver as a NamespaceContext to use it with the JAXP API.
Why do we want to use this non-compatible prefix resolver interface in the first place? Because of a convenient implementation called PrefixResolverDefault. Instances suck their namespace mapping out of a living DOM node, as in
JAVA:
PrefixResolver resolver = new PrefixResolverDefault(doc.getDocumentElement());
Making that prefix resolver compatible with JAXP is not that hard:
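JAVA:
NamespaceContext ctx = new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        return resolver.getNamespaceForPrefix(prefix);
    }
    // ... getPrefixes and getPrefix as before ...
};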
The other two methods have their old dummy implementation. Of course you need to make the prefix resolver final before using it from the instance of that anonymous class, but you already know that...
The complete code with this Xalan tool follows.
JAVA:
package com.davber.test;

import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import com.sun.org.apache.xml.internal.utils.PrefixResolver;
import com.sun.org.apache.xml.internal.utils.PrefixResolverDefault;

public class App
{
    public static void main( String[] args )
    {
        try {
            // First, the XML document
            String xmlStr =
                "<?xml version=\"1.0\" ?>\n" +
                "<Sales xmlns=\"http://www.davber.com/sales-format\">\n" +
                "<Customer name=\"CostCo, Inc.\">\n" +
                "<ord:Order xmlns:ord=\"http://www.davber.com/order-format\" " +
                "price=\"12000\">\n" +
                "<ord:Description>A bunch of stuff" +
                "</ord:Description>\n" +
                "</ord:Order>\n" +
                "</Customer>\n" +
                "</Sales>\n";
            DocumentBuilderFactory xmlFact =
                DocumentBuilderFactory.newInstance();
            xmlFact.setNamespaceAware(true);
            DocumentBuilder builder = xmlFact.
                newDocumentBuilder();
            Document doc = builder.parse(
                new java.io.ByteArrayInputStream(
                    xmlStr.getBytes()));
            // We map the prefixes to URIs, using Xalan's
            // document-extracting mapping tool and wrapping
            // it in a nice JAXP shell
            final PrefixResolver resolver =
                new PrefixResolverDefault(doc.getDocumentElement());
            NamespaceContext ctx = new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return resolver.getNamespaceForPrefix(prefix);
                }
                // Dummy implementation - not used!
                public Iterator getPrefixes(String val) {
                    return null;
                }
                // Dummy implementation - not used!
                public String getPrefix(String uri) {
                    return null;
                }
            };
            // Now the XPath expression
            String xpathStr =
                "//ord:Order/ord:Description";
            XPathFactory xpathFact =
                XPathFactory.newInstance();
            XPath xpath = xpathFact.newXPath();
            xpath.setNamespaceContext(ctx);
            String result = xpath.evaluate(xpathStr, doc);
            System.out.println("XPath result is \"" +
                result + "\"");
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Try it out! It produces the output you expected, right? Well, only if you expected Java to throw up a stack trace.
It will say something like:
javax.xml.transform.TransformerException: Prefix must resolve to a namespace: ord
But we did map ord to a namespace URI in the XML document, so why does that Xalan tool not see it? Is there a bug? Well, you might argue it is a design flaw, since it only uses the mapping in scope at the node passed to it, i.e., the top-level namespace declarations in our case. So, let us move the namespace declaration to the top level, to get the following XML input:
XML:
<?xml version="1.0" ?>
<Sales xmlns="http://www.davber.com/sales-format"
       xmlns:ord="http://www.davber.com/order-format">
  <Customer name="CostCo, Inc.">
    <ord:Order price="12000">
      <ord:Description>A bunch of stuff</ord:Description>
    </ord:Order>
  </Customer>
</Sales>
Now you will see the output
XPath result is "A bunch of stuff"
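If moving the declarations to the top level is not an option, another possibility is to walk the whole DOM tree and collect every namespace declaration found along the way. The following is a rough sketch of that idea, not something the Xalan tool does for you; the class name NamespaceCollector and its methods are invented here. It assumes each prefix is bound to only one URI in the document; if a prefix is redeclared deeper down, the first binding in document order wins.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original post.
public class NamespaceCollector {
    // Declaration of ord sits on the Order element, not on the root.
    static final String XML =
        "<Sales xmlns=\"http://www.davber.com/sales-format\">"
        + "<Customer name=\"CostCo, Inc.\">"
        + "<ord:Order xmlns:ord=\"http://www.davber.com/order-format\" price=\"12000\">"
        + "<ord:Description>A bunch of stuff</ord:Description>"
        + "</ord:Order>"
        + "</Customer>"
        + "</Sales>";

    // Recursively records xmlns and xmlns:* attributes; "" is the default namespace.
    static void collect(Node node, Map<String, String> out) {
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                String name = attr.getNodeName();
                if (name.equals("xmlns")) {
                    out.putIfAbsent("", attr.getNodeValue());
                } else if (name.startsWith("xmlns:")) {
                    out.putIfAbsent(name.substring("xmlns:".length()), attr.getNodeValue());
                }
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    static Map<String, String> declaredNamespaces(String xml) throws Exception {
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        fact.setNamespaceAware(true);
        Document doc = fact.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        collect(doc, result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(declaredNamespaces(XML));
    }
}
```

The resulting map can then be fed into whatever NamespaceContext implementation you use, with a prefix of your own choosing for the entry keyed by the empty string.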
So what about the default namespace? Do we need a prefix there, and in that case which one?
We try with
XPATH:
/Sales/Customer/@name
Hmm, no match... So, it is not that simple. Again, we have to use a namespace in the XPath since the elements we try to match against are in a namespace (though it happens to be the default namespace) and the only way to use a namespace is via a prefix. What is that elusive prefix we need to map, via the Xalan prefix resolver, to the default namespace? The answer is: the empty string, i.e., the following XPath will match:
XPATH:
/:Sales/:Customer/@name
Yes, the output is
XPath result is "CostCo, Inc."
So, the "Xalan" path allows us to quite easily deal with both explicitly prefixed elements and defaulted elements in the XML input. Note: this route is pretty fragile, since we here rely on a quite arbitrary choice of prefixes for those namespace URIs. I.e., if the XML author changes the prefix, we are doomed. So, my recommendation is to use the URI via explicit mapping if you want or need to deal with namespaces at all.
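For what it is worth, a similar document-driven lookup can be had without touching Xalan classes, since the DOM Level 3 Node interface offers lookupNamespaceURI and lookupPrefix. The sketch below wraps those in a NamespaceContext; the class name DomLookupContext is made up for this example, and it shares the limitations discussed above: it only sees declarations in scope at the node handed to it, and it depends on the prefixes chosen by the XML author.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical wrapper class, not part of the original post.
public class DomLookupContext implements NamespaceContext {
    private final Node scope;

    public DomLookupContext(Node scope) {
        this.scope = scope;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        // Standard DOM Level 3 lookup; null means the prefix is not in scope
        String uri = scope.lookupNamespaceURI(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String uri) {
        return scope.lookupPrefix(uri);
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        String prefix = getPrefix(uri);
        return prefix == null
            ? Collections.<String>emptyIterator()
            : Collections.singleton(prefix).iterator();
    }

    // Usage with the top-level declarations from the XML above.
    static String firstDescription() throws Exception {
        String xml =
            "<Sales xmlns=\"http://www.davber.com/sales-format\" "
            + "xmlns:ord=\"http://www.davber.com/order-format\">"
            + "<Customer name=\"CostCo, Inc.\">"
            + "<ord:Order price=\"12000\">"
            + "<ord:Description>A bunch of stuff</ord:Description>"
            + "</ord:Order>"
            + "</Customer>"
            + "</Sales>";
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        fact.setNamespaceAware(true);
        Document doc = fact.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new DomLookupContext(doc.getDocumentElement()));
        return xpath.evaluate("//ord:Order/ord:Description", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XPath result is \"" + firstDescription() + "\"");
    }
}
```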
How can we not care about the namespaces in the XML input? The only way is to disable namespace awareness, via the method invocation
JAVA:
xmlFact.setNamespaceAware(false);
You might expect the factory to be namespace aware by default, since SAX 2 makes namespace processing the default. JAXP does it the other way around: the Javadoc of DocumentBuilderFactory.setNamespaceAware states that the default is false, so we could just leave the method invocation out of the picture. Let us set it explicitly anyway, to be on the safe side and to make the intent obvious... Thus, we get the following code:
JAVA:
package com.davber.test;

import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import com.sun.org.apache.xml.internal.utils.PrefixResolver;
import com.sun.org.apache.xml.internal.utils.PrefixResolverDefault;

public class App
{
    public static void main( String[] args )
    {
        try {
            // First, the XML document
            String xmlStr =
                "<?xml version=\"1.0\" ?>\n" +
                "<Sales xmlns=\"http://www.davber.com/sales-format\" " +
                "xmlns:ord=\"http://www.davber.com/order-format\">\n" +
                "<Customer name=\"CostCo, Inc.\">\n" +
                "<ord:Order " +
                "price=\"12000\">\n" +
                "<ord:Description>A bunch of stuff" +
                "</ord:Description>\n" +
                "</ord:Order>\n" +
                "</Customer>\n" +
                "</Sales>\n";
            DocumentBuilderFactory xmlFact =
                DocumentBuilderFactory.newInstance();
            // The one change: switch namespace awareness off
            xmlFact.setNamespaceAware(false);
            DocumentBuilder builder = xmlFact.
                newDocumentBuilder();
            Document doc = builder.parse(
                new java.io.ByteArrayInputStream(
                    xmlStr.getBytes()));
            // We map the prefixes to URIs, using Xalan's
            // document-extracting mapping tool and wrapping
            // it in a nice JAXP shell
            final PrefixResolver resolver =
                new PrefixResolverDefault(doc.getDocumentElement());
            NamespaceContext ctx = new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return resolver.getNamespaceForPrefix(prefix);
                }
                // Dummy implementation - not used!
                public Iterator getPrefixes(String val) {
                    return null;
                }
                // Dummy implementation - not used!
                public String getPrefix(String uri) {
                    return null;
                }
            };
            // Now the XPath expression
            String xpathStr =
                "/:Sales/:Customer/@name";
            XPathFactory xpathFact =
                XPathFactory.newInstance();
            XPath xpath = xpathFact.newXPath();
            xpath.setNamespaceContext(ctx);
            String result = xpath.evaluate(xpathStr, doc);
            System.out.println("XPath result is \"" +
                result + "\"");
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
It is interesting to note that with namespace awareness disabled, namespaced XPath names do not match, i.e., they live in an alternative universe. The XPath evaluation is still namespace sensitive, but the XML document now resides in the non-namespaced corner of that world. The only way to get back to the namespace-agnostic universe is to leave the prefixes out altogether, such as in
XPATH:
/Sales/Customer/@name
Alas, we are back to square one!
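Stripped of the Xalan resolver, which is no longer needed once the prefixes are gone, the namespace-unaware variant boils down to something like the following sketch. The class name NamespaceUnawareDemo and the method customerName are hypothetical, and the behavior shown is what I would expect from the JAXP implementation bundled with the JDK; other implementations may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original post.
public class NamespaceUnawareDemo {
    static final String XML =
        "<Sales xmlns=\"http://www.davber.com/sales-format\" "
        + "xmlns:ord=\"http://www.davber.com/order-format\">"
        + "<Customer name=\"CostCo, Inc.\">"
        + "<ord:Order price=\"12000\">"
        + "<ord:Description>A bunch of stuff</ord:Description>"
        + "</ord:Order>"
        + "</Customer>"
        + "</Sales>";

    static String customerName() throws Exception {
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        // The parser now treats xmlns attributes as ordinary attributes
        fact.setNamespaceAware(false);
        Document doc = fact.newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        // No NamespaceContext, no prefixes: plain names only
        return XPathFactory.newInstance().newXPath()
            .evaluate("/Sales/Customer/@name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XPath result is \"" + customerName() + "\"");
    }
}
```

The price of this convenience is that elements from different namespaces with the same local name can no longer be told apart reliably.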
There are two important issues we have to consider:
- a document that uses a prefix without declaring its namespace will make the namespace-aware parser throw an exception, long before any namespaced XPath gets a chance to run
- other implementations of JAXP (earlier versions or not) might act a bit differently
The second issue is ignored for now, but please make sure you have the classpath set up correctly, so that the right implementation is at play; this classpath issue came to bite me a few days ago, effectively wasting a full day of work! The first issue is best explained by removing the declaration of the prefix ord, along with its URI, from the XML input, while keeping the prefix in the XPath. I.e., we will have the following XML input and XPath expressions.
XML:
<?xml version="1.0" ?>
<Sales xmlns="http://www.davber.com/sales-format">
  <Customer name="CostCo, Inc.">
    <ord:Order price="12000">
      <ord:Description>A bunch of stuff</ord:Description>
    </ord:Order>
  </Customer>
</Sales>
XPATH:
//ord:Order/ord:Description
We here assume the same namespace mapping as before, either via an extraction - using the Xalan tool above - against the original document or with the explicit NamespaceContext we used initially. Using the latter method, we get the following code:
JAVA:
package com.davber.test;

import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

public class App
{
    public static void main( String[] args )
    {
        try {
            // First, the XML document; note that the prefix
            // ord is used but never declared
            String xmlStr =
                "<?xml version=\"1.0\" ?>\n" +
                "<Sales xmlns=\"http://www.davber.com/sales-format\">\n " +
                "<Customer name=\"CostCo, Inc.\">\n" +
                "<ord:Order " +
                "price=\"12000\">\n" +
                "<ord:Description>A bunch of stuff" +
                "</ord:Description>\n" +
                "</ord:Order>\n" +
                "</Customer>\n" +
                "</Sales>\n";
            DocumentBuilderFactory xmlFact =
                DocumentBuilderFactory.newInstance();
            xmlFact.setNamespaceAware(true);
            DocumentBuilder builder = xmlFact.
                newDocumentBuilder();
            Document doc = builder.parse(
                new java.io.ByteArrayInputStream(
                    xmlStr.getBytes()));
            NamespaceContext ctx = new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri;
                    if (prefix.equals("ord"))
                        uri = "http://www.davber.com/order-format";
                    else if (prefix.equals("sal"))
                        uri = "http://www.davber.com/sales-format";
                    else
                        uri = null;
                    return uri;
                }
                // Dummy implementation - not used!
                public Iterator getPrefixes(String val) {
                    return null;
                }
                // Dummy implementation - not used!
                public String getPrefix(String uri) {
                    return null;
                }
            };
            // Now the XPath expression
            String xpathStr =
                "//ord:Order/ord:Description";
            XPathFactory xpathFact =
                XPathFactory.newInstance();
            XPath xpath = xpathFact.newXPath();
            xpath.setNamespaceContext(ctx);
            String result = xpath.evaluate(xpathStr, doc);
            System.out.println("XPath result is \"" +
                result + "\"");
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Try it out. It will generate a stack trace:
[Fatal Error] :4:26: The prefix "ord" for element "ord:Order" is not bound.
org.xml.sax.SAXParseException: The prefix "ord" for element "ord:Order" is not bound.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at com.davber.test.App.main(App.java:34)
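Note that the trace comes from DocumentBuilder.parse, i.e., the failure happens while parsing, before the XPath is ever evaluated. If you would rather handle that case than print a stack trace, you can catch the SAXParseException yourself. The following is a minimal sketch; the class name UnboundPrefixDemo and the method parseError are invented for the illustration. Installing a DefaultHandler as error handler is just one way to keep the parser from echoing the error to the console, since its fatalError simply rethrows.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original post.
public class UnboundPrefixDemo {
    static final String UNBOUND =
        "<Sales xmlns=\"http://www.davber.com/sales-format\">"
        + "<ord:Order price=\"12000\"/>"
        + "</Sales>";

    static final String BOUND =
        "<Sales xmlns=\"http://www.davber.com/sales-format\" "
        + "xmlns:ord=\"http://www.davber.com/order-format\">"
        + "<ord:Order price=\"12000\"/>"
        + "</Sales>";

    // Returns the parser's error message, or null if the document parses.
    static String parseError(String xml) throws Exception {
        DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
        fact.setNamespaceAware(true);
        DocumentBuilder builder = fact.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException ex) {
            return ex.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Unbound prefix: " + parseError(UNBOUND));
        System.out.println("Bound prefix:   " + parseError(BOUND));
    }
}
```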
What was that other issue again? Ah, something about other implementations behaving differently than the reference implementation of J2SE! Let us try this conjecture by putting the J2EE 1.4 reference implementation in the classpath. We will use WSDP 1.6.
It is time to sum up our experiences with namespaces and XPath in Java.
Lessons Learned
For those of you who skipped the lengthy post and jumped right here: welcome back!
The lessons learned are as follows; most of them are applicable to XPath with namespaced XML input in any language:
- whenever an XML element belongs to a namespace, the XPath pattern must use the same namespace for names to match
- just as with XML documents, namespaces are given via prefixes in XPath
- the XPath evaluator does not care about the prefixes themselves at all, only about the namespace URIs they map to
- the above lesson implies we need to map prefixes to URIs, which is done via the JAXP interface NamespaceContext
- even the default namespace is a namespace, and thus matching names have to be prefixed in XPath
- there is a Xalan-specific tool to extract prefix to URI mappings from a DOM node: PrefixResolverDefault
- using that Xalan resolver, you have to embed it in a proper JAXP NamespaceContext
- that Xalan resolver will map the empty prefix to the default namespace URI, so you have to use the syntax :elem in XPath in order to match elem in the default namespace
- using a prefix in the XML document that is not bound to a namespace URI will make the namespace-aware parser throw an exception
- you can disable namespace awareness with the method setNamespaceAware of the document builder factory in JAXP
- the JAXP document builder factory has namespace awareness disabled by default, so enable it explicitly whenever you need namespaces
julienw said,
I’ve a question :
If the parser is “Namespace Unaware” by default , why the XPath expression failed on the namespaced document at first ?
I got stuck by the same problem today; I nearly found the solution, with NamespaceContext, but I didn’t try using empty prefixes for my expression…
Thanks anyway, I hope it’ll work tomorrow