select an xml element, ignore element name, print newline

Question

I'd like to select the first element, but ignore its name in the output.

This is what I'm getting, after requesting the first url element from each input xml file:

% xmllint \
 --xpath '(//yandexsearch/response/results/grouping/group/doc/url)[1]' \
 *.response.ya.xml
<url>https://example.com/</url><url>https://example.net/</url><url>https://example.org/</url>

But this is what I want instead:

https://example.com/
https://example.net/
https://example.org/

Note that the idea is to select the value of the first <url> element from each input Yandex.XML file (Я Feel Lucky).

How do I do that with xpath?

@PatriceM., it's documented here: http://api.yandex.com/xml/doc/dg/concepts/response.xml — cnst, Jan 10 '14 at 21:20

score 4 · Answer 1 · answered Jan 10 '14 at 21:29

I ended up using awk to strip <url> and </url> and print the text of each element on a separate line, skipping the empty fields left between adjacent tags:

xmllint \
  --xpath '(//yandexsearch/response/results/grouping/group/doc/url)[1]' \
  *.response.ya.xml \
| awk -F'</?url>' '{for(i=2;i<=NF;i++) if ($i != "") print $i}'
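To try the awk step on its own, without any XML files, here is a sketch that wraps the same awk program in a shell function and feeds it the concatenated line shown in the question. The name `strip_url_tags` is hypothetical, not part of any tool. Because `-F'</?url>'` makes both the opening and the closing tag act as field separators, adjacent tags produce empty fields, which the `$i != ""` test skips.

```shell
#!/bin/sh
# strip_url_tags (hypothetical name): read xmllint's serialized <url>...</url>
# output on stdin and print each URL on its own line.
strip_url_tags() {
  # '</?url>' is an ERE matching both <url> and </url>, so every tag acts as
  # a field separator; adjacent tags leave empty fields, which are skipped.
  awk -F'</?url>' '{ for (i = 2; i <= NF; i++) if ($i != "") print $i }'
}

# Sample input: the single line xmllint printed in the question.
printf '%s\n' '<url>https://example.com/</url><url>https://example.net/</url><url>https://example.org/</url>' \
  | strip_url_tags
# Prints:
# https://example.com/
# https://example.net/
# https://example.org/
```

The same function works whether xmllint prints all elements on one line or one element per line, since empty fields are skipped either way.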

score 2 · Answer 2 · edited May 23 '17 at 12:19

Try instead:

//yandexsearch/response/results/grouping/group/doc[1]/url/text()

XPath 1.0 expressions select nodes (or return a single string, number, or boolean), so joining several results with newlines is normally done in the code surrounding the XPath extraction.
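As a sketch of doing that joining in the surrounding code with only XPath 1.0, assuming a POSIX shell and xmllint: call xmllint once per file, wrap the question's own expression in the XPath 1.0 `string()` function so that only the text value is printed, and let the shell add the newline. The function name `first_urls` is hypothetical.

```shell
#!/bin/sh
# first_urls (hypothetical name): print the text of the first <url> in each
# Yandex.XML file given as an argument, one URL per line.
first_urls() {
  for f in "$@"; do
    # string() converts the first node of the selection to its text value,
    # so xmllint prints the bare URL instead of a serialized <url> element.
    url=$(xmllint --xpath 'string((//yandexsearch/response/results/grouping/group/doc/url)[1])' "$f")
    # $(...) drops any trailing newline xmllint may or may not print,
    # so add exactly one here.
    printf '%s\n' "$url"
  done
}
```

Run it as `first_urls *.response.ya.xml`. One difference from the awk pipeline: `string()` returns the text value, so an `&` in a URL comes out as `&`, not as the escaped `&amp;` that appears in the serialized `<url>` element.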

That being said, XPath 2.0 can, if that's available to you:

string-join(//yandexsearch/response/results/grouping/group/doc[1]/url/text(), codepoints-to-string(10))

Also, this answer provides a couple of XSLT-based solutions.

answered Jan 10 '14 at 20:47 by Patrice M.

I seem to be getting all the URLs with this, for all search results, instead of only the first search result. Also, this doesn't seem to insert any newline between the fields. On the other hand, `(//yandexsearch/response/results/grouping/group/doc/url)[1]/text()` does appear to be partially doing what I want (thanks for the `/text()` syntax hint!), but it still doesn't insert the newlines, either. – cnst Jan 10 '14 at 21:23
See my edits. That being said, I'm not clear where you want to place the `[1]`; depending on your desired outcome, you may need to experiment with it. – Patrice M. Jan 11 '14 at 01:38
As for `[1]`, I think you may be confused by my sample output, which is due to several separate input xml files (note the `*` on the command line). Anyhow, I'm getting `xmlXPathCompOpEval: function string-join not found` when trying to use your `string-join` example in my `xmllint` on OS X. I guess this question is less about xpath and maybe more about some custom options that xmllint or other tools may have? – cnst Jan 11 '14 at 02:57
It's an XPath 2.0-only function, and xmllint (libxml2) implements only XPath 1.0, so it won't work there; tool support for XPath 2.0 is not as good as for XPath 1.0 at this time. Sorry... That being said, with XPath 1.0 you typically do that outside of XPath by post-processing the extracted nodes, which is exactly what you did with your awk pipeline, so kudos. – Patrice M. Jan 11 '14 at 03:08
Also, see this answer for a possible workaround: http://stackoverflow.com/a/17373540/366749. You can use the variadic XPath 1.0 `concat()` function, but you have to know how many nodes are returned first... – Patrice M. Jan 11 '14 at 03:11
