
I have encountered an issue where XPath queries fail in KissXML when the XML contains processing instructions, yet the same queries work fine against otherwise identical XML that contains no processing instructions.

As an example I have the following XML without processing instructions (demo1.xml):

<?xml version="1.0"?>
<tst:TstForm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tst="http://schemas.somewhere.com/tst/1" xml:lang="en-US">
    <tst:TstDetails>
        <tst:Title>Labelling error on boxes</tst:Title>
        <tst:Description>Box labels</tst:Description>
    </tst:TstDetails>
</tst:TstForm>

I am parsing the XML and executing an XPath query as follows:

// Parse the DEMO XML - no processing instructions,
// so this will work
NSLog(@"**** About to parse %@ ****\n\n", xmlFilename);
NSString *XMLPath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:xmlFilename];
NSData *XMLData   = [NSData dataWithContentsOfFile:XMLPath];

NSError *err=nil;
NSURL *furl = [NSURL fileURLWithPath:XMLPath];
if (!furl) {
    NSLog(@"Can't create an URL from file: %@.", XMLPath);
    return;
}
xmlViewDoc = [[DDXMLDocument alloc] initWithData:XMLData
                                              options:0
                                                error:&err];

// Prove that the XML was parsed correctly...
NSLog(@"Root node'%@' found\n\n", xmlViewDoc.rootElement.name);
NSLog(@"About to iterate through elements 2 levels down from the root node...");
NSArray *nodes = xmlViewDoc.rootElement.children;
for (DDXMLElement *node in nodes) {
    NSArray *childNodes = node.children;
    for (DDXMLElement *childNode in childNodes) {
        NSLog(@" XPath: %@", childNode.XPath);
    }
}
NSLog(@"Root node elements END\n\n");

// Show that the namespaces are parsed correctly
NSLog(@"Root node contains the following namespaces...");
nodes = xmlViewDoc.rootElement.namespaces;
for (DDXMLElement *node in nodes) {
    NSLog(@" Found NS: '%@' = '%@'", node.name, node.stringValue);
}
NSLog(@"Namespaces END\n\n");

// Now execute an XPath query using the namespace
NSString *xPathQuery = @"/tst:TstForm/tst:TstDetails";
NSLog(@"Based on the above namespace and the XML structure, we should be able to execite the following XPath query:\n %@", xPathQuery);
NSLog(@"The XPath query should return two elements (Title & Description)...");
nodes = [xmlViewDoc.rootElement nodesForXPath:xPathQuery error:nil];
for (DDXMLElement *node in nodes) {
    NSLog(@" XPath: %@", node.XPath);
}
NSLog(@"XPathQuery END\n\n");

This code provides the following log output as expected:

2012-09-03 13:04:25.662 NDPad[37359:c07] **** About to parse demo1.xml ****

2012-09-03 13:04:25.690 NDPad[37359:c07] Root node'tst:TstForm' found

2012-09-03 13:04:25.690 NDPad[37359:c07] About to iterate through elements 2 levels down from the root node...
2012-09-03 13:04:25.691 NDPad[37359:c07]  XPath: /TstForm[1]/TstDetails[1]/Title[1]
2012-09-03 13:04:25.691 NDPad[37359:c07]  XPath: /TstForm[1]/TstDetails[1]/Description[1]
2012-09-03 13:04:25.691 NDPad[37359:c07] Root node elements END

2012-09-03 13:04:25.691 NDPad[37359:c07] Root node contains the following namespaces...
2012-09-03 13:04:25.692 NDPad[37359:c07]  Found NS: 'xsi' = 'http://www.w3.org/2001/XMLSchema-instance'
2012-09-03 13:04:25.692 NDPad[37359:c07]  Found NS: 'tst' = 'http://schemas.somewhere.com/tst/1'
2012-09-03 13:04:25.692 NDPad[37359:c07] Namespaces END

2012-09-03 13:04:25.692 NDPad[37359:c07] Based on the above namespace and the XML structure, we should be able to execite the following XPath query:
 /tst:TstForm/tst:TstDetails
2012-09-03 13:04:25.692 NDPad[37359:c07] The XPath query should return two elements (Title & Description)...
2012-09-03 13:04:25.693 NDPad[37359:c07]  XPath: /TstForm[1]/TstDetails[1]
2012-09-03 13:04:25.693 NDPad[37359:c07] XPathQuery END

However, if I then use the following XML which contains a single processing instruction (demo1-5.xml):

<?xml version="1.0"?>
<?tst-example-instruction?>
<tst:TstForm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tst="http://schemas.somewhere.com/tst/1" xml:lang="en-US">
    <tst:TstDetails>
        <tst:Title>Labelling error on boxes</tst:Title>
        <tst:Description>Box labels</tst:Description>
    </tst:TstDetails>
</tst:TstForm>

The same code fails: the document still parses and the namespaces are found, but the XPath query returns no nodes, so the output is only:

2012-09-03 13:04:25.693 NDPad[37359:c07] **** About to parse demo1-5.xml ****

2012-09-03 13:04:25.756 NDPad[37359:c07] Root node'tst:TstForm' found

2012-09-03 13:04:25.756 NDPad[37359:c07] About to iterate through elements 2 levels down from the root node...
2012-09-03 13:04:25.756 NDPad[37359:c07]  XPath: /TstForm[1]/TstDetails[1]/Title[1]
2012-09-03 13:04:25.756 NDPad[37359:c07]  XPath: /TstForm[1]/TstDetails[1]/Description[1]
2012-09-03 13:04:25.756 NDPad[37359:c07] Root node elements END

2012-09-03 13:04:25.757 NDPad[37359:c07] Root node contains the following namespaces...
2012-09-03 13:04:25.757 NDPad[37359:c07]  Found NS: 'xsi' = 'http://www.w3.org/2001/XMLSchema-instance'
2012-09-03 13:04:25.757 NDPad[37359:c07]  Found NS: 'tst' = 'http://schemas.somewhere.com/tst/1'
2012-09-03 13:04:25.757 NDPad[37359:c07] Namespaces END

2012-09-03 13:04:25.757 NDPad[37359:c07] Based on the above namespace and the XML structure, we should be able to execite the following XPath query:
 /tst:TstForm/tst:TstDetails
2012-09-03 13:04:25.758 NDPad[37359:c07] The XPath query should return two elements (Title & Description)...
2012-09-03 13:04:25.758 NDPad[37359:c07] XPathQuery END

I can't see what would be wrong with this XML that would cause the XPath queries to fail, especially when iterating through the DOM shows that it parsed correctly.

Thanks, Paul


1 Answer


I now have a hack where I perform the following:

  • Read the XML as a string.
  • Parse using a regular expression to find all processing instructions.
  • Add all the processing instructions into a string array.
  • Use the regular expression to remove the processing instructions from the string.

This then allows me to execute the XPath queries without issue. However, if I then need to write the XML back, I need to:

  • Create a new mutable string to hold the output XML document.
  • Loop through the processing instructions and append them to the new output XML document string.
  • Get the string from my real XML document and append to the new output XML string.

However, this is far from optimal, and I still don't understand why the processing instructions break the XPath queries in KissXML. Here is an example of my code:

    // Load the XML file from the application bundle
    NSString *xmlFilename = @"demo1-5.xml";
    NSLog(@"**** About to parse %@ ****\n\n", xmlFilename);
    NSString *XMLPath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:xmlFilename];
    NSString *xmlStringData = [NSString stringWithContentsOfFile:XMLPath encoding:NSUTF8StringEncoding error:nil];

    // Now that we have the XML as a string, use regex to:

    // 1) Find all processing instructions (note: this pattern also
    //    matches the <?xml version="1.0"?> declaration)
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<\\?.*?\\?>" options:NSRegularExpressionCaseInsensitive error:nil];
    processingInstructions = [[NSMutableArray alloc] init];
    [regex enumerateMatchesInString:xmlStringData options:0 range:NSMakeRange(0, [xmlStringData length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop){
        // 2) Put all the matches in an array
        NSString *toStore = [xmlStringData substringWithRange:[match range]];
        [processingInstructions addObject:toStore];
    }];
    // 3) Remove all processing instructions from the XML string
    NSString *newString = [regex stringByReplacingMatchesInString:xmlStringData options:0 range:NSMakeRange(0, xmlStringData.length) withTemplate:@""];

    // Having removed the processing instructions, we can now parse the XML
    NSError *err=nil;
    NSURL *furl = [NSURL fileURLWithPath:XMLPath];
    if (!furl) {
        NSLog(@"Can't create an URL from file: %@.", XMLPath);
        return;
    }
    xmlViewDoc = [[DDXMLDocument alloc] initWithXMLString:newString
                                                  options:0
                                                    error:&err];

    // Now execute an XPath query using the namespace
    NSString *xPathQuery = @"/tst:TstForm/tst:TstDetails";
    NSArray *nodes = [xmlViewDoc.rootElement nodesForXPath:xPathQuery error:nil];
    for (DDXMLElement *node in nodes) {
        NSLog(@"Title Value: %@", node.stringValue);
    }


    // Finally we'll generate a new XML string which is a concatenation of
    // the processing instructions and the DDXMLDocument.
    NSMutableString *xmlOutputString = [[NSMutableString alloc] init];
    for (NSString *piString in processingInstructions) {
        [xmlOutputString appendFormat:@"%@\n", piString];
    }
    [xmlOutputString appendString:[xmlViewDoc XMLStringWithOptions:DDXMLNodePrettyPrint]];

    NSLog(@"XML:\n%@", xmlOutputString);
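
One caveat with the pattern above: `<\?.*?\?>` also matches the `<?xml version="1.0"?>` declaration, so the declaration is stripped and stored along with the real processing instructions, and the output may end up with it twice if the serializer writes its own. In addition, `.` does not match line breaks by default, so an instruction that spans several lines would be missed. The following is a minimal sketch, not a tested fix, of a variant that leaves the declaration in place. `StripProcessingInstructions` is a hypothetical helper name (it is not part of KissXML), it uses only Foundation, and it still treats anything that looks like `<?...?>` inside a comment or CDATA section as a processing instruction.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of KissXML: removes processing instructions
// from `xml`, appends them to `collected` in document order, and leaves the
// <?xml ...?> declaration untouched.
static NSString *StripProcessingInstructions(NSString *xml, NSMutableArray *collected) {
    // DotMatchesLineSeparators lets ".*?" span line breaks, so multi-line
    // processing instructions are matched as well.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<\\?.*?\\?>"
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSArray *matches = [regex matchesInString:xml
                                      options:0
                                        range:NSMakeRange(0, xml.length)];
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableString *result = [NSMutableString stringWithString:xml];
    NSUInteger firstNewIndex = collected.count;

    // Walk the matches backwards so that deleting one range does not shift
    // the ranges that still have to be processed.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *text = [xml substringWithRange:match.range];

        // "<?xml" followed by whitespace is the XML declaration, not a
        // processing instruction ("<?xml-stylesheet ...?>" is still a PI).
        BOOL isDeclaration = text.length > 5
            && [text hasPrefix:@"<?xml"]
            && [space characterIsMember:[text characterAtIndex:5]];
        if (isDeclaration) {
            continue;
        }

        // Insert at a fixed index to restore document order.
        [collected insertObject:text atIndex:firstNewIndex];
        [result deleteCharactersInRange:match.range];
    }
    return result;
}
```

The returned string can then be passed to `initWithXMLString:options:error:` in place of `newString`, and the collected instructions written back after the declaration rather than before it.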
Paul