I need to list the text inside two elements, /project/groupId and /project/parent/groupId, in the many pom.xml files in a directory tree. The files may contain groupId elements in other places too; I am only looking for the content of these two.

Ideally, I am looking for a tool that outputs in the format <file-name>:<line-no>:<path>:<text>, e.g.

parent/pom.xml:12:/project/groupId:com.acme.project
features/persist/pom.xml:14:/project/parent/groupId:com.acme.project
features/persist/pom.xml:32:/project/groupId:com.acme.project.persist

For the following input files:

**parent/pom.xml**
<project>
 ...
  <groupId>
  com.acme.project <!--LINE 12 --> 
  </groupId>
...
</project>

**features/persist/pom.xml**
<project>
  <parent>
    <groupId>
    com.acme.project <!--LINE 14 --> 
    </groupId>
  </parent>
  ...
  <groupId>
  com.acme.project.persist <!--LINE 32 -->
  </groupId>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>
        a.b.c.d <!-- this is not listed in output -->
        </groupId>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>

Note that other paths such as dependencyManagement/dependencies/dependency/groupId are not included.

Searching here on SO I came across xmllint --xpath, but I don't know enough about XPath to figure this out myself.

Miserable Variable
  • Please post example input which should yield this output. – Jens Erat Apr 18 '13 at 19:31
  • Output was made up :) But I have made up the corresponding input and added it – Miserable Variable Apr 18 '13 at 19:45
  • Just realized you need the line number and path to the element. There is no way to get the line number with standard XPath/XQuery, not even in version 3.0. At least [Saxon has some proprietary support for this](http://saxon.sourceforge.net/saxon6.5.3/extensions.html#linenumber). As for the path: XPath/XQuery 3.0 has `fn:path()` for this, so you will need a more capable and up-to-date processor. Neither xmllint nor xmlstarlet supports more than XPath 1.0, so you will need another tool. – Jens Erat Apr 18 '13 at 21:34
  • I can do without line numbers. – Miserable Variable Apr 18 '13 at 22:07

2 Answers

Try this XPath 2.0 query (without line numbers). It matches the <groupId/> elements at both paths and, for each one, prints the document URI, a constructed (not necessarily unique) path, and the contents of the element, joined by colons.

(//project/parent | //project)/groupId/string-join(
  (
    base-uri(),
    string-join(('', ancestor-or-self::*/name()), '/'),
    normalize-space(.)
  ), ':')
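
If no XPath 2.0 processor is at hand, the same selection can be approximated with Python's standard library. This is only a rough sketch, not an equivalent of the query above: the names `local`, `text_of`, and `group_id_lines` are made up for illustration, the file name is passed in by the caller instead of coming from `base-uri()`, and namespace prefixes are simply stripped.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix, since real POMs usually declare one."""
    return tag.rsplit("}", 1)[-1]


def text_of(elem):
    """Return the element's text with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def group_id_lines(xml_text, name):
    """Return 'name:path:text' lines for /project/groupId and /project/parent/groupId."""
    root = ET.fromstring(xml_text)
    lines = []
    if local(root.tag) != "project":
        return lines
    # Only direct children of <project> (and of its <parent>) are inspected,
    # so groupId elements nested under dependencies are never reached.
    for child in root:
        if local(child.tag) == "groupId":
            lines.append(f"{name}:/project/groupId:{text_of(child)}")
        elif local(child.tag) == "parent":
            for sub in child:
                if local(sub.tag) == "groupId":
                    lines.append(f"{name}:/project/parent/groupId:{text_of(sub)}")
    return lines
```

Calling `group_id_lines(open("pom.xml").read(), "pom.xml")` would return one string per matching element, in document order.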

You could run it against a BaseX database, for example (as I did for testing), that contains all the XML files you want to query.

  1. Run this command to create the collection: CREATE DB xmldocs /path/to/xml-files
  2. Query the database using above XPath

There are different ways to run the query; have a look at the Standalone Mode manual.

The query should also run in other XPath 2.0 compatible engines such as Saxon (which would also support line numbers, see my comment above).

Jens Erat

I ended up using the Cygwin build of xml2, which flattens the file into path=value lines that grep can filter:

xml2 < pom.xml | grep -e "/project/parent/groupId" -e "/project/groupId"
/project/parent/groupId=...
/project/groupId=....
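
The pipeline above handles one file at a time and does not report line numbers. For the `<file-name>:<line-no>:<path>:<text>` format asked for in the question, a streaming parser that tracks the element path is one possible route. The following is a minimal sketch using Python's standard `xml.parsers.expat` module. The names `WANTED`, `scan_pom`, and `scan_tree` are hypothetical, and it assumes the POM elements are written without a namespace prefix (a default `xmlns` declaration is fine).

```python
import os
import xml.parsers.expat

# The two paths named in the question; every other path is ignored.
WANTED = {"/project/groupId", "/project/parent/groupId"}


def scan_pom(data, name):
    """Return 'name:line:path:text' strings for the wanted elements in data."""
    parser = xml.parsers.expat.ParserCreate()
    stack = []   # names of the currently open elements
    hits = []    # formatted result lines
    state = {"parts": None, "line": None}

    def path():
        return "/" + "/".join(stack)

    def start(tag, attrs):
        stack.append(tag)
        if path() in WANTED:
            state["parts"], state["line"] = [], None

    def chars(text):
        if state["parts"] is None:
            return
        if state["line"] is None and text.strip():
            # Line of the first non-blank character: the chunk's starting
            # line plus any newlines in its leading whitespace.
            lead = text[: len(text) - len(text.lstrip())]
            state["line"] = parser.CurrentLineNumber + lead.count("\n")
        state["parts"].append(text)

    def end(tag):
        if state["parts"] is not None and path() in WANTED:
            value = " ".join("".join(state["parts"]).split())
            if value:
                hits.append(f"{name}:{state['line']}:{path()}:{value}")
            state["parts"] = None
        stack.pop()

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return hits


def scan_tree(root):
    """Yield result lines for every pom.xml found below root."""
    for dirpath, _dirs, files in os.walk(root):
        if "pom.xml" in files:
            full = os.path.join(dirpath, "pom.xml")
            with open(full, "rb") as fh:
                yield from scan_pom(fh.read(), os.path.relpath(full, root))
```

Something like `for line in scan_tree("."): print(line)` would then print one line per match. The reported line number is the line on which the element's text starts, not the line of the opening tag.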
Miserable Variable