xmllint and xpath to parse xml data from https://mail.google.com/mail/feed/atom

Question

I am getting some XML data from my Gmail account that I would like to parse. The XML data looks like:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <title>Gmail - Inbox for @gmail.com</title>
  <tagline>New messages in your Gmail Inbox</tagline>
  <fullcount>54</fullcount>
  <link rel="alternate" href="http://mail.google.com/mail" type="text/html"/>
  <modified>2014-11-25T04:40:04Z</modified>
  <entry>
    <title>test</title>
    <summary/>
    ...
</feed>

and I was hoping to get the titles of all the entries with something like:

xmllint --xpath '//feed/entry/title' myfile.xml

I found that this works if the xmlns attribute is absent. With the xmlns attribute present, I get the message

XPath set is empty

I would like a simple one-liner to parse this file without having to modify it (that is, without removing the xmlns attribute).

--> EDIT: Thanks to @Mathias, the proper one-liner looks like: printf 'setns x=http://purl.org/atom/ns#\nxpath /x:feed/x:entry/x:title/text()\n' | xmllint --shell myfile.xml

possible duplicate of [xmllint failing to properly query with xpath](http://stackoverflow.com/questions/8264134/xmllint-failing-to-properly-query-with-xpath) — Daniel Haley, Nov 25 '14 at 21:12

score 3 · Accepted Answer · edited May 23 '17 at 11:44

You are probably aware that your input XML is in a default namespace. Your original XPath expression:

xmllint --xpath '//feed/entry/title' myfile.xml

will never find elements that are in a namespace, because an unprefixed name in an XPath 1.0 expression only matches elements in no namespace. That's why the XPath result set is empty.
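
As a rough illustration of this point, the sketch below runs the same unprefixed expression against two copies of a small document, one with the default namespace and one without. The sample file is a hypothetical, trimmed version of the feed in the question, and count() is used so that xmllint prints a number instead of failing on an empty node set.

```shell
# Work in a throwaway directory (file names here are made up for the demo).
tmpdir=$(mktemp -d)

# Trimmed-down copy of the feed from the question, default namespace included.
cat > "$tmpdir/feed_ns.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <entry>
    <title>test</title>
  </entry>
</feed>
EOF

# Same document with the default namespace declaration stripped.
sed 's| xmlns="[^"]*"||' "$tmpdir/feed_ns.xml" > "$tmpdir/feed_plain.xml"

# Count the matches of the unprefixed expression in each file.
with_ns=$(xmllint --xpath 'count(//feed/entry/title)' "$tmpdir/feed_ns.xml")
without_ns=$(xmllint --xpath 'count(//feed/entry/title)' "$tmpdir/feed_plain.xml")

echo "matches with xmlns: $with_ns"
echo "matches without xmlns: $without_ns"
```

The namespaced copy should report no matches, while the stripped copy should report one.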

If you're absolutely unwilling to register or declare a namespace, the following expression works:

xmllint --xpath "//*[name() = 'feed']/*[name() = 'entry']/*[name() = 'title']" myfile.xml

If your input XML contained prefixed namespaces, you'd have to use local-name() instead of name(), because name() returns the qualified name including the prefix.
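
A minimal sketch of that difference, assuming a hypothetical document that binds the same namespace to the prefix a (the prefix and file name are invented for this example):

```shell
# Throwaway directory for the demo file.
tmpdir=$(mktemp -d)

# Hypothetical variant of the feed that uses a prefixed namespace.
cat > "$tmpdir/prefixed.xml" <<'EOF'
<a:feed xmlns:a="http://purl.org/atom/ns#">
  <a:entry>
    <a:title>test</a:title>
  </a:entry>
</a:feed>
EOF

# name() yields the qualified name "a:title", so comparing it to 'title' fails.
by_name=$(xmllint --xpath "count(//*[name() = 'title'])" "$tmpdir/prefixed.xml")

# local-name() drops the prefix, so the comparison succeeds.
by_local=$(xmllint --xpath "count(//*[local-name() = 'title'])" "$tmpdir/prefixed.xml")

# string() returns the text of the first matching node.
title_text=$(xmllint --xpath "string(//*[local-name() = 'title'])" "$tmpdir/prefixed.xml")

echo "name(): $by_name, local-name(): $by_local, title: $title_text"
```

Note that local-name() matches a title element in any namespace, which is why registering the namespace is the more precise option.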

An alternative that is not a "simple oneliner" is to use xmllint in shell mode, register a namespace together with a prefix and use it in the XPath expression. See this answer for details. That's the proper way of addressing the problem.
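
The shell-mode approach could look roughly like the sketch below. The prefix x is arbitrary, the sample file is a hypothetical cut-down version of the feed, and the exact layout of the shell output varies between xmllint versions, so treat this as an outline and not a definitive recipe.

```shell
# Throwaway directory for the demo file.
tmpdir=$(mktemp -d)

# Trimmed-down copy of the feed from the question.
cat > "$tmpdir/feed.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <entry>
    <title>test</title>
  </entry>
</feed>
EOF

# Feed shell-mode commands on stdin:
#   setns  binds the (arbitrary) prefix x to the feed's namespace URI
#   xpath  evaluates an expression that uses that prefix
#   bye    leaves the shell
shell_output=$(printf '%s\n' \
  'setns x=http://purl.org/atom/ns#' \
  'xpath /x:feed/x:entry/x:title/text()' \
  'bye' | xmllint --shell "$tmpdir/feed.xml")

echo "$shell_output"
```

The output includes prompts and a node dump in addition to the title text, so it usually needs some post-processing (for example with grep or sed) before it can be used in a script.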

answered Nov 25 '14 at 10:28

Mathias Müller

Great, the one-liner works perfectly. Why would using the shell be "the proper way"? – Jonybegood Nov 25 '14 at 22:09
@Jonybegood It's not using the shell that is the proper way; taking namespaces in an XML document into account, instead of ignoring them, is the proper way. That xmllint is only capable of this in shell mode is just a coincidence. – Mathias Müller Nov 26 '14 at 08:34
Thanks, it makes perfect sense. I edited the original post to include the proper one-liner solution. – Jonybegood Nov 30 '14 at 21:58

score 0 · Answer 2 · answered Nov 25 '14 at 06:18

Try debugging this in xmllint's shell mode:

xmllint --shell filename

xpath '//feed/entry/'

Debug as shown above, traversing the nodes level by level, so that you find out where it breaks.

Akhil Thayyil

Not a very helpful answer. The problem is obvious, the elements are in a namespace. No need for debugging. – Mathias Müller Nov 25 '14 at 10:29
@MathiasMüller He can debug it himself and figure out the solution. In the code sample he showed, the XML is incomplete, so we can't predict the error just by looking at it. – Akhil Thayyil Nov 25 '14 at 11:14
If the error were due to the document not being well-formed XML, the result would not be `XPath set is empty`. So, we know that the XML is "complete". Also, in shell mode you must not put expressions in single quotes, otherwise the path expression is not evaluated. Finally, the debugging only tells you the exact same thing: that the result set contains zero nodes. – Mathias Müller Nov 25 '14 at 11:49
