Optimizing XQueries with eXist

Explanations of each of the following tips can be found at the end of the document.

DON'TS

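Here are the things to avoid:

  1. Don't use eval()
  2. Don't evaluate expressions several times over and avoid redundant expressions
  3. Don't use //
  4. Don't query constructed document fragments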

DOS

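Here are some recommendations for optimization:

  1. Minimise the execution of queries based on a given search expression
  2. Make appropriate use of indexes adapted to your search criteria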

Code Quality

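Here are some points about writing good-quality code:

  1. Document in XQuery the argument and return types
  2. Keep data retrieval separate from result construction

An example of a correct XQuery is to follow (TODO).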


EXPLANATIONS

Don't use eval()

It's important to realise that eXist caches an XQuery after compiling it. The snag is that the arguments to the eval() function can't be cached. Beyond that, using eval() leads to a style of programming that is hard to read and hard to debug. Moreover, eval() can always be replaced by a standard expression.
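As a minimal sketch of such a replacement, the function below takes the varying part of the query as an ordinary parameter instead of assembling query text at run time. The function name local:by-status, the item elements and the status attribute are hypothetical, and the commented-out form assumes that eval() refers to eXist's util:eval(). The inline items are only a stand-in for stored data, to keep the sketch self-contained.

```xquery
xquery version "1.0";

(: Avoided form (assuming eval() means eXist's util:eval()):
     util:eval(concat("$items[@status = '", $status, "']"))
   The query text only exists at run time. :)

(: Preferred form: the varying part is a normal, typed parameter. :)
declare function local:by-status($items as element(item)*, $status as xs:string)
as element(item)* {
  $items[@status = $status]
};

(: Hypothetical call; in practice $items would come from stored documents. :)
local:by-status(
  (<item status="open">A</item>, <item status="done">B</item>),
  "open"
)
```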

Don't evaluate expressions several times over and avoid redundant expressions

eXist doesn't perform any analysis or optimisation of queries akin to what a Java compiler does: no factoring out of repeatedly evaluated expressions, no elimination of code that will never be executed, etc. Pay particular attention to repeatedly evaluated expressions: they should be evaluated only once and the result placed in a variable, which also makes for more readable code.
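A small sketch of binding a repeated expression to a variable. The names local:summary, order, status and amount are hypothetical, and the inline orders only stand in for stored data.

```xquery
xquery version "1.0";

declare function local:summary($orders as element(order)*) as element(summary) {
  (: Avoided: writing $orders[@status = "open"] once for the count
     and a second time for the sum. :)
  (: Preferred: evaluate the filter once and reuse the result. :)
  let $open := $orders[@status = "open"]
  return
    <summary count="{ count($open) }" total="{ sum($open/@amount) }"/>
};

local:summary((
  <order status="open" amount="10"/>,
  <order status="closed" amount="5"/>,
  <order status="open" amount="7"/>
))
```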

Don't use //

$a//b causes a complete traversal of all the nodes below $a in search of b elements. In most cases the location of b is fairly precisely known, so it is better to specify the path explicitly.
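For illustration, assuming a hypothetical structure project/tasks/task/title, the explicit path can be written as below. As a side effect, the explicit path also avoids picking up title elements that happen to sit elsewhere in the tree.

```xquery
xquery version "1.0";

(: Hypothetical structure: project/tasks/task/title. :)
declare function local:task-titles($project as element(project)) as element(title)* {
  (: Avoided: $project//title, which visits every descendant of $project
     and would also return the project's own title. :)
  $project/tasks/task/title
};

(: The inline project is only a stand-in for a stored document. :)
local:task-titles(
  <project>
    <title>Project title, not a task title</title>
    <tasks>
      <task><title>First task</title></task>
      <task><title>Second task</title></task>
    </tasks>
  </project>
)
```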

Don't query constructed document fragments

A typical example (to avoid):

let $e := <a><b>content</b></a>     (: $e is a constructed document fragment :)
let $result := $e/b/text()          (: this query runs against the constructed fragment :)
return $result
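When the value is already at hand before the fragment is built, one way to follow this tip is to keep it in a variable of its own rather than reading it back out of the fragment. A minimal sketch; the variable $content is hypothetical:

```xquery
xquery version "1.0";

let $content := "content"
(: The fragment is built for output only and is never queried afterwards. :)
let $e := <a><b>{ $content }</b></a>
return
  (: Further work uses $content directly, not $e/b/text(). :)
  ($e, string-length($content))
```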

Minimise the execution of queries based on a given search expression.

A query like

let $res := collection("/db/projects")/a/b[id = $val]

causes a complete scan of an entire collection. Admittedly, queries like this are at the heart of an XQuery (and account for most of its execution time). But once the result $res has been retrieved, it can be efficiently used as a starting point for navigation to its parent, siblings and children:

let $a := $res/parent::a
let $next-sibling := $a/following-sibling::a[1]

Make appropriate use of indexes adapted to your search criteria.

There are currently three types of user-configurable indexes in eXist. All of them require prior indexing, either of the base collection or of specified node-sets in sub-collections.

  1. The fulltext index, which indexes lexical tokens ("words" in Western scripts). Indexing can be configured to include or exclude nodes specified using a limited subset of XPath.
  2. Typed indexes over nodes specified by a limited subset of XPath (called "range indexes" because they permit queries referring to a range of numerical values).
  3. Indexes by tag name ("QName index"): http://wiki.exist-db.org/space/jmvanel/New+index+by+QName

Index 2 is slower than index 3, but has two advantages

Index 3 lacks these advantages, but is almost as fast as a relational database. Such an index cannot be constrained by an XPath, only by a tag name. Both index 2 and index 3 are typed (integers or strings), and allow matching by criteria of equality or inequality (comparison).
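The kind of query these typed indexes are meant for can be sketched as follows. This assumes that an index of type xs:integer has been configured for the element id under /db/projects (the path used in the earlier example); the index configuration itself is not shown, and the value 42 is hypothetical.

```xquery
xquery version "1.0";

(: Assumption: a typed (xs:integer) index exists for the element id
   in /db/projects. The value below is hypothetical. :)
declare variable $val as xs:integer := 42;

(
  (: Equality match: the search value has the same type as the index. :)
  collection("/db/projects")/a/b[id = $val],

  (: Inequality (comparison) match on the same typed value. :)
  collection("/db/projects")/a/b[id > $val]
)
```

Declaring $val as xs:integer keeps the comparison in line with the assumed index type rather than comparing against a string.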

Document in XQuery the argument and return types

Don't write:

declare function local:add($n, $m) {
<result>{ $n + $m }</result>
};

Declaring the types is more explicit and self-documenting, and for the same price you get run-time checking of the arguments. If you know for sure which types you are manipulating, declare them:

declare function local:add($n as xs:integer, $m as xs:integer)
as element(result) {
<result>{ $n + $m }</result>
};
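A short sketch of what the declared types buy at run time. The two calls are illustrative only.

```xquery
xquery version "1.0";

declare function local:add($n as xs:integer, $m as xs:integer)
as element(result) {
  <result>{ $n + $m }</result>
};

(: Accepted: both arguments are xs:integer; yields <result>3</result>. :)
local:add(1, 2)

(: Rejected with a type error before the function body runs,
   because xs:string is not converted to xs:integer:
     local:add("1", 2) :)
```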

Keep data retrieval separate from result construction
TODO example
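One possible reading of this tip, as a minimal sketch rather than the intended example: one function only selects nodes, another only formats the nodes it is handed. The function names local:find-b and local:render, the results and hit elements, and the search value are hypothetical; the collection path and the a/b/id structure are taken from the earlier search example.

```xquery
xquery version "1.0";

(: Retrieval: selects stored nodes and builds nothing. :)
declare function local:find-b($val as xs:string) as element(b)* {
  collection("/db/projects")/a/b[id = $val]
};

(: Construction: formats the nodes it receives and runs no search. :)
declare function local:render($hits as element(b)*) as element(results) {
  <results count="{ count($hits) }">{
    for $b in $hits
    return <hit id="{ $b/id }"/>
  }</results>
};

(: Hypothetical search value. :)
local:render(local:find-b("42"))
```

Keeping the two steps apart also means the search runs against stored nodes only, and the constructed result is never queried again.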