65

I'm trying to query an XML file generated by Adium. xmlwf says that it's well formed. Using xmllint's debug option, I get the following:

$ xmllint --debug doc.xml
DOCUMENT
version=1.0
encoding=UTF-8
URL=doc.xml
standalone=true
  ELEMENT chat
    default namespace href=http://purl.org/net/ulf/ns/0.4-02
    ATTRIBUTE account
      TEXT
        content=foo@bar.com
    ATTRIBUTE service
      TEXT compact
        content=MSN
    TEXT compact
      content= 
    ELEMENT event
      ATTRIBUTE type

Everything seems to parse just fine. However, when I try to query even the simplest things, I don't get anything:

$ xmllint --xpath '/chat' doc.xml 
XPath set is empty

What's happening? Running that exact same query using xpath returns the correct results (however with no newline between results). Am I doing something wrong or is xmllint just not working properly?

Here's a shorter, anonymized version of the xml that shows the same behavior:

<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="foo@bar.com" service="MSN">
<event type="windowOpened" sender="foo@bar.com" time="2011-11-22T00:34:43-03:00"></event>
<message sender="foo@bar.com" time="2011-11-22T00:34:43-03:00" alias="foo"><div><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div></message>
</chat>
Jesper Rønn-Jensen
ailnlv

3 Answers

106

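I don't use xmllint, but I think your XPath isn't working because your doc.xml file uses a default namespace (http://purl.org/net/ulf/ns/0.4-02). In XPath 1.0, an unprefixed name such as chat only matches elements that are in no namespace, so /chat selects nothing here.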
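To see the same effect outside xmllint, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration of the namespace behaviour, not of how xmllint works internally; the prefix x and the shortened document are my own choices.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question.
DOC = """<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="foo@bar.com" service="MSN">
<event type="windowOpened" sender="foo@bar.com" time="2011-11-22T00:34:43-03:00"></event>
</chat>"""

root = ET.fromstring(DOC)

# The parser stores the namespace URI as part of the element name.
expanded_tag = root.tag

# An unqualified name does not match the namespaced <event> element.
unqualified = root.findall("event")

# Binding the URI to a prefix (any prefix works, "x" is arbitrary) does match.
NS = {"x": "http://purl.org/net/ulf/ns/0.4-02"}
qualified = root.findall("x:event", NS)

print(expanded_tag)
print(len(unqualified), len(qualified))
```

The prefix itself carries no meaning; only the URI it is bound to has to agree with the document.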

From what I can see, you have 2 options.

A. Use xmllint in shell mode and declare the namespace with a prefix. You can then use that prefix in your XPath.

    xmllint --shell doc.xml
    / > setns x=http://purl.org/net/ulf/ns/0.4-02
    / > xpath /x:chat

B. Use local-name() to match element names.

    xmllint --xpath "/*[local-name()='chat']" doc.xml

You may also want to use namespace-uri()='http://purl.org/net/ulf/ns/0.4-02' along with local-name() so you are sure to return exactly what you are intending to return.
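For comparison, a rough Python analogue of the two matching styles, assuming Python 3.8 or later. ElementTree does not implement local-name() or namespace-uri(), so this sketch swaps in its own `{*}name` wildcard and `{uri}name` forms instead. The second namespace, urn:example:other, is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://purl.org/net/ulf/ns/0.4-02"
OTHER_URI = "urn:example:other"  # hypothetical second namespace, for contrast only

DOC = (
    f'<chat xmlns="{NS_URI}">'
    "<message>hi</message>"
    f'<message xmlns="{OTHER_URI}">other</message>'
    "</chat>"
)
root = ET.fromstring(DOC)

# Like local-name()='message': matches the name in any namespace.
any_namespace = root.findall("{*}message")

# Like local-name()='message' and namespace-uri()='...': matches one namespace only.
exact_namespace = root.findall(f"{{{NS_URI}}}message")

print([m.text for m in any_namespace])
print([m.text for m in exact_namespace])
```

Matching on the local name alone is shorter, but it also picks up same-named elements from other namespaces, which is why adding the URI check is the safer choice.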

Daniel Haley
  • 6
    Note example A. and B. will fail if you're not accessing a root path, in which case you need a double-slash, eg xmllint --xpath "//*[local-name()='chat']". See http://stackoverflow.com/questions/27311314/how-to-get-the-tag-yweathercondition-from-yahoo-weather-rss-with-xmllint?noredirect=1#comment43085213_27311314 – the_yellow_logo Dec 05 '14 at 10:59
  • 1
    @Avt'W - This question/answer is specifically about namespaces in xmllint; not any other XPath topics. What `/` and `//` match are totally unrelated. – Daniel Haley Dec 05 '14 at 22:22
  • 6
    Hey, it was a comment for readers with a slightly different use case, not a criticism of your answer, which answers the problem accurately. People having problems with namespaces are likely newbies, so I thought it was worth pointing that out. – the_yellow_logo Dec 07 '14 at 08:59
  • 13
    **C.** `cat foo.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath ...` – djeikyb May 01 '15 at 00:11
  • 4
    @Avt'W's observation was a very helpful hint for us newbies. @daniel-haley Thanks for the shell hint. Here is what I think the full line would look like: `xmllint --xpath "//*[local-name()='chat' and namespace-uri()='http://purl.org/net/ulf/ns/0.4-02']" doc.xml` – Greg Elin Jun 08 '15 at 04:27
  • 4
    NB. This can get confusing and lengthy very quickly. [This article](http://blog.powered-up-games.com/wordpress/archives/70) has a good tutorial on the subject; `namespace-uri()` must be added to every portion of the path that needs it, for example. – Dawngerpony Sep 10 '15 at 10:35
  • 2
    I wonder why they made shell option `setrootns` to register all namespaces from root node declaration but not in CLI mode :( – Dima Fomin Aug 09 '17 at 14:56
  • 3
    Not that parsing XML with sed is the best idea in the world, but that regex might be too greedy. To remove namespace declarations without taking out more than you meant, use `sed 's/xmlns="[^"]*"//g'`. – Ken Oct 13 '17 at 16:21
  • How did you know about the setns option in the shell? The man page has some entries for the shell commands, and that is not one of them. Is there any way to do something similar without the shell, other than the comment's approach of adding namespace-uri()... everywhere? – netskink May 03 '22 at 16:47
  • @netskink - I don't remember how I knew about setns or where it's documented. A possible alternative to xmllint would be [xmlstarlet](http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html#idm47077139530992). You can bind the namespace to a prefix on the command line or use "_" to match any namespace. – Daniel Haley May 03 '22 at 16:55
  • Hmm. Funny you mentioned xmlstarlet. I have been trying that as well. I tried something like this `xmlstarlet sel -N i="someuri" -t -m //xyz -v "@moduleName" -n foo.xml` where xyz would be something like some_ns:some_tag or some_ns:some_tag. Now that I write this, I bet it should be: //i:xyz. – netskink May 03 '22 at 17:02
  • Nope that did not work. `xmlstarlet sel -N i="some_uri" -t -m /i:foo/i:goo -v "@some_attribute" -n foo.xml` – netskink May 03 '22 at 17:04
  • ahh, this works for xmlstarlet: xmlstarlet sel -N i="some_uri" -t -m /i:foo/i:goo -v "name()" -n foo.xml Without the -v part its just matching and not printing the matching portion. – netskink May 03 '22 at 18:03
  • 1
    And the thing which worked for me: `xmlstarlet sel -N x="some uri" -t -m "/x:foo/x:goo[@some_attr='some value']" -v '@some_attr' -n foo.xml` It appears the " and ' are critical. Without them or in reverse order ie. outer is " and inner is ' will not work. – netskink May 03 '22 at 19:39
14

I realize this question is very old now, but in case it helps someone...

Had the same problem and it was due to the XML having a namespace (and sometimes it was duplicated in various places in the XML). Found it easiest to just remove the namespace before using xmllint:

sed -e 's/xmlns="[^"]*"//g' file.xml | xmllint --xpath "..." -

In my case the XML was UTF-16 so I had to convert to UTF-8 first (for sed):

iconv -f utf16 -t utf8 file.xml | sed -e 's/encoding="UTF-16"?>/encoding="UTF-8"?>/' | sed -e 's/xmlns="[^"]*"//g' | xmllint --xpath "..." -
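If a regex over the raw text feels too risky, one alternative is to parse first and drop the namespaces afterwards. This is only a sketch in Python with the standard library, not something xmllint offers; strip_namespaces is a helper name I made up, and it leaves namespaced attribute names untouched.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place (hypothetical helper)."""
    for element in root.iter():
        # Tags of namespaced elements look like '{uri}name' after parsing.
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root

# Shortened sample; the text content deliberately contains something that
# looks like a namespace declaration, to show that it is left alone.
DOC = (
    '<chat xmlns="http://purl.org/net/ulf/ns/0.4-02">'
    '<message sender="foo@bar.com">literal text: xmlns="keep me"</message>'
    "</chat>"
)

root = strip_namespaces(ET.fromstring(DOC))
messages = root.findall("message")  # plain names now match

print(root.tag)
print(messages[0].text)
```

Because the change happens on the parsed tree, text that merely looks like a namespace declaration survives, which the sed approach cannot guarantee.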
stefan123t
codesniffer
  • This will clobber data in XML files. The point of tools like `xmllint` is to parse the XML properly. – binki Aug 03 '21 at 14:01
  • one can assign the `http` namespace a local name like `x` directly in the file: `sed -e 's/xmlns=/xmlns:x=/'`. Then you can use your command with xpath expressions like `//item` – user8162 Jan 05 '22 at 11:35
0

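If you're allowed to install PowerShell in your environment (it's also available for Linux), you can do it like this. $Namespace is a hashtable that binds the ns prefix used in the XPath to the document's namespace URI:

$Namespace = @{ ns = 'http://purl.org/net/ulf/ns/0.4-02' }
Select-Xml -XPath '/ns:chat' -Namespace $Namespace .\doc.xml | foreach { $_.Node }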
   xmlns   : http://purl.org/net/ulf/ns/0.4-02
   account : foo@bar.com
   service : MSN
   event   : event
   message : message

Of course, all the usual XPath rules apply here. To get the inner XML of a node:

Select-Xml -XPath '/ns:chat/ns:message' -Namespace $Namespace .\doc.xml |foreach {$_.Node.InnerXML }
<div xmlns="http://purl.org/net/ulf/ns/0.4-02"><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div>

Or the content of the sender attribute:

Select-Xml -XPath '/ns:chat/ns:message/@sender' -Namespace $Namespace .\doc.xml |foreach {$_.Node }

#text
-----
foo@bar.com