
I have what I think is an interesting problem executing queries in Jackrabbit when a node in the query path is a UUID that starts with a digit.

For example, this query works fine because the UUID node starts with a letter, 'f':

/*/JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]

This query, however, fails once the leading 'f' is replaced with '2':

/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]

The exception:

Encountered "-" at line 1, column 26.
Was expecting one of:
<IntegerLiteral> ...
<DecimalLiteral> ...
<DoubleLiteral> ...
<StringLiteral> ...
 ... rest omitted for brevity ...
     for statement: for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v

My code, in general:

def queryString = queryFor path
def queryManager = session.workspace.queryManager

def query = queryManager.createQuery queryString, Query.XPATH // fails here
query.execute().nodes

I'm aware my query, with the leading asterisk, may not be the best, but I'm just starting out with querying in general. Maybe using a query language other than XPath would work.

I tried the advice in the post "Jackrabbit Running Queries against UUID", adding a save before creating the query, but had no luck.

Thanks in advance for any input!

Community
  • You already tried the 2 suggestions from here? http://jackrabbit.510166.n4.nabble.com/xpath-queries-with-node-names-consisting-of-numbers-td518798.html – matthias_h Sep 06 '14 at 18:49
  • Ah, I didn't find that link. So it seems it's not the hyphens that cause the problem, it's having a node in the path start with a number. If I use ISO9075.encodePath(path), I get: `/_x002a_/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//_x002a_[@sequence]`, which doesn't work. If I encode just the part of the path with the UUID that starts with the number, it does work. Will play around some more. Thanks! – John Prystash Sep 06 '14 at 19:24
  • Glad that helped you. You should consider posting the solution that worked for you as an answer and accepting it. That closes the question and also helps others with the same issue. – matthias_h Sep 06 '14 at 19:33
  • Planning on it when I'm all set! Thanks again – John Prystash Sep 06 '14 at 21:14

2 Answers

A solution that worked was to properly escape parts of the query path, namely the individual steps used to build up the path into the repository. The exception message was somewhat misleading, at least to me, as it made me think that the hyphens were part of the root cause. The real problem was that the leading digit in the node name produced an illegal XPath query, as suggested in the comments above.

The fix in this case is to encode each step of the path individually and then build the rest of the query around the encoded steps. As a result, only the leading digit is escaped:

/*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
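To make the effect of the escaping concrete, here is a minimal, self-contained sketch. `escapeLeadingDigit` and `buildQuery` are hypothetical helpers written for illustration only; they are not part of Jackrabbit, and they cover only the single case discussed here (a leading ASCII digit). In real code, `ISO9075.encode` is the better choice.

```java
import java.util.Arrays;
import java.util.List;

public class StepEscapeSketch {

    // Hypothetical helper (not Jackrabbit API): escapes only a leading
    // ASCII digit using the _xHHHH_ form seen in the query above.
    public static String escapeLeadingDigit(String step) {
        if (step.isEmpty()) {
            return step;
        }
        char first = step.charAt(0);
        if (first >= '0' && first <= '9') {
            // ASCII digits are 0x30..0x39, so the hex value always has two
            // digits and "00" pads it to four.
            return "_x00" + Integer.toHexString(first) + "_" + step.substring(1);
        }
        return step;
    }

    // Escapes each step on its own; the wildcards and the predicate are
    // appended unescaped so they keep their XPath meaning.
    public static String buildQuery(List<String> steps) {
        StringBuilder query = new StringBuilder("/*");
        for (String step : steps) {
            query.append('/').append(escapeLeadingDigit(step));
        }
        return query.append("//*[@sequence]").toString();
    }

    public static void main(String[] args) {
        List<String> steps = Arrays.asList("JCP", "2eeadeaf-1dae-427f-bf4e-842b07965a93");
        // prints /*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
        System.out.println(buildQuery(steps));
    }
}
```

The point of the design is that the escaping is applied per step, before the steps are joined, so the `/` separators and `*` wildcards never pass through the encoder.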

Code that represents a list of steps or a path into the Jackrabbit repository:

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.ISO9075;

class Path {
    List<String> steps; //...

    public String asQuery() {
        return steps.size() > 0 ? "/*" + asPathString(encodedSteps()) + "//*" : "//*";
    }

    private String asPathString(List<String> steps) {
        return '/' + StringUtils.join(steps, '/');
    }

    private List<String> encodedSteps() {
        List<String> encodedSteps = new ArrayList<>();
        for (String step : steps) {
            encodedSteps.add(ISO9075.encode(step));
        }
        return encodedSteps;
    }
}

Some more notes:

If we escape more of the query string as in:

/_x002a_/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//_x002a_[@sequence]

Or the original path encoded as a whole as in:

_x002f_a_x002f_fffe4dcf0-360c-11e4-ad80-14feb59d0ab5_x002f_2cbae0dc-35e2-11e4-b5d6-14feb59d0ab5_x002f_c

The queries do not produce the wanted results.

Thanks to @matthias_h and @LarsH

  • BTW the `ISO9075.encode(step)` seems suspect to me. The purpose of this method is to convert an XML name to a legal SQL identifier, right? But what you're using it for, IINM, is to convert a UUID to a legal XML name. It might happen to work (in most cases), but simply concatenating `uuid` on the front (as you mentioned elsewhere) would be more understandable code, would produce more readable element names, and would be guaranteed to work. – LarsH Sep 08 '14 at 15:50
  • I'm taking the encoding route (for the moment at least), as I'm only using it for building the query. The node names themselves are not changing in any way with this option and can stay as they are (I don't have to retrofit existing paths into the content repository to fit a new node name pattern). – John Prystash Sep 08 '14 at 22:16
  • As you wish. I'm not sure what you mean by the node names not changing: in this case, the initial `2` is encoded as `_x0032_`. Maybe you mean they don't need to change from element names that have already been created using ISO9075.encode(). – LarsH Sep 09 '14 at 00:55
  • Apologies if I'm not being clear. The node name in the repository can still start with a number; the encoded representation of the node name is only used in the query. So I can get to a node like this: `rootNode.getNode("JCP").getNode("2a")` and search for all children of it with a query like `/*/JCP/_x0032_a/*` (which uses the encode method). In other words, how we store nodes and how we work with Jackrabbit normally does not have to change. The change with respect to encoding is only scoped to querying. – John Prystash Sep 09 '14 at 11:13

An XML element name cannot start with a digit. See the XML spec's rules for STag, Name, and NameStartChar. Therefore, the "XPath expression"

/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]

is illegal, because the name test 2eead... isn't a legal XML name.

As such, you can't just use any old UUID as an XML element name nor as a name test in XPath. However if you put a legal NameStartChar on the front (such as _), you can probably use any UUID.
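As a rough sketch of the prefixing idea, assuming the node name is under your control: `toLegalName` and `isAsciiNameStartChar` are hypothetical helpers, and the check is deliberately simplified to ASCII letters and `_`, whereas the XML spec's NameStartChar also allows `:` and many non-ASCII ranges.

```java
public class NamePrefixSketch {

    // Simplified assumption: only ASCII letters and '_' are treated as legal
    // start characters. The XML spec's NameStartChar production is broader.
    public static boolean isAsciiNameStartChar(char c) {
        return c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Hypothetical helper: prepends '_' when the name is empty or does not
    // already begin with an accepted start character.
    public static String toLegalName(String uuid) {
        if (!uuid.isEmpty() && isAsciiNameStartChar(uuid.charAt(0))) {
            return uuid;
        }
        return "_" + uuid;
    }

    public static void main(String[] args) {
        // prints _2eeadeaf-1dae-427f-bf4e-842b07965a93
        System.out.println(toLegalName("2eeadeaf-1dae-427f-bf4e-842b07965a93"));
        // prints feeadeaf-1dae-427f-bf4e-842b07965a93 (unchanged)
        System.out.println(toLegalName("feeadeaf-1dae-427f-bf4e-842b07965a93"));
    }
}
```

Unlike escaping at query time, this changes the name under which the node is stored, so every place that builds the path has to apply the same prefix.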

I'm not clear on whether you think you already have XML data with an element named <2eead...> (and are trying to query that element's descendants); if so, whatever tool produced it is broken, as it emits illegal XML. On the other hand if the <2eead...> is something that you yourself are creating, then presumably you have the option of modifying the element name to be a legal XML name.

LarsH
  • Thanks @LarsH. We are building the path into the content repository using a UUID from an outside source, where the UUID is used by a system outside of us to represent a particular user. Somewhat like you suggested, my first attempt at a solution involved simply prefixing the UUID node name when building the path into the repository with "uuid_". – John Prystash Sep 07 '14 at 13:35