
I need to update several nodes in an XML file (just substituting one text element for another) in such a way that the resulting XML file keeps all of its formatting (where possible).

For example, the following is the source document:

<project>
    <!-- Some long comment -->
    <!-- On several lines -->
    <name>Name</name> <!-- And here too -->
    <version>1.2.3</version>

</project>

and this is the required result document (note that the version has changed):

<project>
    <!-- Some long comment -->
    <!-- On several lines -->
    <name>Name</name> <!-- And here too -->
    <version>3.2.1</version>

</project>

So the result keeps all the formatting of the source, and only the content of the version tag has changed.

Unfortunately, I couldn't find a way to do this with the standard Clojure (or Java) libraries. Sure, they support basic indentation when serializing XML to a string, but that is not sufficient for me.

Is there a way to do this with some XML manipulation library (preferably in Clojure, but I guess Java is fine too), or do I have to fall back to plain text/regexp substitutions? (Really, I don't want XML tags leaking from my eyes; this should be the last resort...)

Vladimir Matveev
  • You should show the code that you're using to parse and serialize the file. Both the parser and serializer are supposed to retain all whitespace. There are, of course, things that you can do to explicitly change this, such as telling the parser to coalesce text content or telling the serializer to pretty-print. – kdgregory Dec 14 '12 at 21:53
  • When I use `clojure.xml` to parse and emit, all the formatting is lost. – Arthur Ulfeldt Dec 15 '12 at 00:31
  • @kdgregory Well, something like this https://gist.github.com/4291928 ruins all formatting and strips all comments; `file.new.xml` will contain a single string with XML data. Changing `emit` to `indent` (which, by the way, is highly inadvisable because of performance) results in indented XML, but all comments are still stripped. – Vladimir Matveev Dec 15 '12 at 07:34
  • If I had to guess, based on your description and the code (and my limited knowledge of Lisp-like languages), the Clojure reader and/or writer are throwing away content that they shouldn't (but which, in most cases, nobody cares about). I'd recommend calling out to the standard JDK parser and serializer, although I have no idea how Clojure would represent the DOM. – kdgregory Dec 20 '12 at 02:38
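
A minimal Java sketch of the JDK route suggested in the comments above. It assumes the default `DocumentBuilderFactory` settings (which keep comments and whitespace-only text nodes) and an identity `Transformer` with indentation switched off. The class name `DomRoundTrip` and the method `setElementText` are hypothetical, made up for illustration. Details such as the XML declaration, attribute quoting, entity references, and line endings may still differ from the source after a round trip, so this is not guaranteed to be byte-for-byte faithful.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the class and method names are illustrative only.
class DomRoundTrip {

    // Parses the XML, sets the text of every <tag> element to newText,
    // and serializes the DOM again without pretty-printing.
    static String setElementText(String xml, String tag, String newText) {
        try {
            // Factory defaults: comments and whitespace text nodes are kept.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(newText);
            }

            // Identity transform: no stylesheet, no indentation.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions for brevity.
            throw new IllegalStateException("XML round trip failed", e);
        }
    }
}
```

Called as `DomRoundTrip.setElementText(source, "version", "3.2.1")` on the document from the question, this should keep the comments and the text nodes between elements, because nothing is re-indented.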

1 Answer


Perhaps a hybrid approach would work: parse the XML with `clojure.xml` to find the exact text you want to replace and to make sure you are changing the correct spot, then use string replacement to change it. I'm hesitant to advise using regular expressions to parse XML.
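
A rough Java sketch of this hybrid idea, using the JDK DOM parser in place of `clojure.xml` (a substitution made only for illustration). The parser confirms that the element exists exactly once and supplies its current text; the edit itself is a literal splice on the raw string, so everything else in the file stays untouched. The class name `HybridReplace` and its method are hypothetical. The sketch assumes the element has no attributes, contains plain text without entities, and that `newText` needs no escaping; otherwise it fails with an exception rather than guessing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the class and method names are illustrative only.
class HybridReplace {

    static String replaceElementText(String xml, String tag, String newText) {
        // Step 1: use a real parser to find out what the element contains.
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one <" + tag + ">, found " + nodes.getLength());
        }
        String oldText = nodes.item(0).getTextContent();

        // Step 2: splice the raw text, but only if the literal fragment is
        // unambiguous (it must occur exactly once in the source).
        String oldFragment = "<" + tag + ">" + oldText + "</" + tag + ">";
        String newFragment = "<" + tag + ">" + newText + "</" + tag + ">";
        int first = xml.indexOf(oldFragment);
        if (first < 0 || first != xml.lastIndexOf(oldFragment)) {
            throw new IllegalStateException(
                    "Fragment is missing or ambiguous in the raw text: " + oldFragment);
        }
        return xml.substring(0, first) + newFragment
                + xml.substring(first + oldFragment.length());
    }
}
```

Refusing to edit when the fragment is missing or appears more than once (for example, inside a comment) is a deliberate choice: it trades generality for the guarantee that only the intended spot changes.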

Arthur Ulfeldt
  • That's what I'm intending to do, almost exactly. In fact, I want to try adding a new feature to `clojure.data.xml`. It uses a StAX parser internally, which makes it possible to get the exact location (in terms of line and column) of a given part of the XML document. I want to try storing this information as metadata within the parsed tree. If anyone is interested: https://github.com/dpx-infinity/data.xml . Though I'm kind of distracted by another task, an indenting XMLStreamWriter... – Vladimir Matveev Dec 18 '12 at 20:26
  • Accepting this answer because that is exactly what I intended to do. If anyone is interested, I forked the `data.xml` project on GitHub and added some features to it, including support for loading comments from XML and proper indenting of the emitted XML. http://github.com/dpx-infinity/data.xml – Vladimir Matveev Dec 24 '12 at 07:39