Suppose I have the following XML File:

<book>
 <name>sometext</name>
 <name>sometext</name>
 <name>sometext</name>
 <name>Dometext</name>
 <name>sometext</name>
</book> 

If I wanted to change the D to s (in the fourth "name" node) without having to read and write the entire file, would that be possible?

Abhi
  • Are all your edits length-preserving? – H H Oct 13 '11 at 08:23
  • @Henk I don't get what you mean by length-preserving here :( – Abhi Oct 13 '11 at 08:25
  • Your example replaces 1 char with 1 other char. That's doable. But changing `Dometext` into `SomeString` is a totally different problem. – H H Oct 13 '11 at 08:28
  • @Abhishek Gupta That the edits do not increase (or decrease) the length (in bytes) of the XML file. And in particular, that the edits do not affect the length outside a small isolated region -- replacing "Dometext" with "sometext" preserves the length, while replacing "Dometext" with "Hello world!" does not. –  Oct 13 '11 at 08:29
  • No, the replacement text could be of any length; we may even need to delete a whole node. – Abhi Oct 13 '11 at 08:31
  • @AbhishekGupta, then please enhance the example. You don't want to replace chars anyway, but strings. – H H Oct 13 '11 at 08:34

4 Answers

3

A 10 MB file is not a problem. Slurp it up. Modify the DOM. Write it back to the filesystem. 10 GB is more of a problem. In that case:

Assumption: You are not changing the length of the file. Think of the file as an array of characters and not a (linked) list of characters: You cannot add characters in the middle, only change them.

You need to seek the position in the file to change and then write that character to disk.

In the .NET world, with a FileStream object, you want to set the Position property to the byte index of the D character and then write a single s character. Check out this question on random access of text files.
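A minimal sketch of that seek-and-overwrite step, assuming the byte offset of the D is already known and the file uses a single-byte-per-character encoding for the text involved (ASCII, or UTF-8 with ASCII-only content). The class and method names (`InPlaceEdit`, `OverwriteAt`) are made up for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceEdit
{
    // Overwrites bytes starting at byteOffset; the file length never changes.
    // Assumption: replacement contains only ASCII characters.
    public static void OverwriteAt(string path, long byteOffset, string replacement)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(replacement);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Refuse to write past the end, which would grow the file.
            if (byteOffset < 0 || byteOffset + bytes.Length > fs.Length)
                throw new ArgumentOutOfRangeException("byteOffset");

            fs.Position = byteOffset;          // seek to the character to change
            fs.Write(bytes, 0, bytes.Length);  // overwrite in place
        }
    }
}
```

Something like `InPlaceEdit.OverwriteAt("book.xml", offsetOfD, "s")` would then touch one byte on disk. Finding `offsetOfD` still means scanning the file up to that point unless you already have an index.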

Also read this question: How to insert characters to a file using C#. It looks like you can't really use the FileStream object, but instead will have to resort to writing individual bytes.

Good luck. But really, if we are only talking 10 MB, then just slurp it up. The computer should be doing your work.
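For the "slurp it up" route, a rough sketch with LINQ to XML might look like this. It assumes the elements are called `name` as in the question; `DomEdit` and `FixNames` are hypothetical names, and note that `Save` rewrites the whole file and may change indentation:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DomEdit
{
    // Loads the whole document, fixes matching <name> elements, writes it back.
    public static void FixNames(string path, string oldText, string newText)
    {
        XDocument doc = XDocument.Load(path);

        // ToList() snapshots the matches so the tree is not modified
        // while it is still being enumerated.
        foreach (XElement name in doc.Descendants("name")
                                     .Where(e => e.Value == oldText)
                                     .ToList())
        {
            name.Value = newText;
        }

        doc.Save(path); // full rewrite of the file
    }
}
```

Usage would be along the lines of `DomEdit.FixNames("book.xml", "Dometext", "sometext")`.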

Daren Thomas
2

I would just read in the file, process, and spit it back out.

This can be done in a streaming fashion with XmlReader -- it's more manual work than XmlDocument or XDocument, but it does avoid creating an in-memory DOM (XmlDocument/XDocument can be used with this same read/write pattern, but generally require the full reconstruction in-memory):

  1. Open the input file stream (XmlReader)
  2. Open the output file stream (XmlWriter, to a different file)
  3. Read from the XmlReader and write to the XmlWriter, performing any transformations as necessary.
  4. Close the streams
  5. Move the new file over the old file (overwriting it; typically an atomic action when both are on the same volume)
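The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a complete copier: it handles the common node types only (no DOCTYPE or entity references), matches elements named `name` as in the question, and uses a made-up temp-file name (`path + ".tmp"`). `StreamingEdit`, `Copy` and `ReplaceInFile` are hypothetical names:

```csharp
using System.IO;
using System.Xml;

public static class StreamingEdit
{
    // Copies nodes one at a time, rewriting the text of <name> elements
    // whose content equals oldText. No in-memory DOM is built.
    public static void Copy(XmlReader reader, XmlWriter writer, string oldText, string newText)
    {
        string currentElement = null;
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    bool isEmpty = reader.IsEmptyElement; // capture before reading attributes
                    currentElement = isEmpty ? null : reader.LocalName;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (isEmpty) writer.WriteEndElement();
                    break;
                }
                case XmlNodeType.Text:
                {
                    bool match = currentElement == "name" && reader.Value == oldText;
                    writer.WriteString(match ? newText : reader.Value);
                    break;
                }
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.XmlDeclaration:
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    currentElement = null;
                    writer.WriteFullEndElement();
                    break;
            }
        }
    }

    public static void ReplaceInFile(string path, string oldText, string newText)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name
        using (XmlReader reader = XmlReader.Create(path))
        using (XmlWriter writer = XmlWriter.Create(tempPath))
        {
            Copy(reader, writer, oldText, newText);
        }
        File.Replace(tempPath, path, null); // swap the new file into place
    }
}
```

Memory use stays roughly constant regardless of file size, but every byte is still read and written once.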

While this can be set up to process input and output on the same open file with a bunch of really clever work, nothing will be saved, and there are many edge cases, including increasing or decreasing file lengths. In fact, it might be slower to try to shift the contents of a file backwards to fill in gaps, or forwards to make new room. The filesystem cache will likely make any "gains" minimal/moot for anything but the most basic length-preserving operation. In addition, modifying a file in place is not an atomic action and is generally non-recoverable in case of an error: at the expense of a temporary file, the read/write/move approach is atomic with respect to the final file contents.

Or, consider XSLT -- it was designed for this ;-)

Happy coding.

1

The cleanest (and best) way would be to use the XmlDocument object to manipulate, but a quick and dirty solution is to just read the XML to a string and then:

xmlText = xmlText.Replace("Dometext", "sometext");
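Spelled out with the file I/O around it, the quick-and-dirty version might look like the sketch below (`QuickReplace` and `ReplaceInFile` are made-up names). Be aware that it replaces every occurrence of the string, including any inside tag or attribute names, and that `File.WriteAllText` writes UTF-8 by default, so a file in another encoding would need the encoding passed explicitly:

```csharp
using System.IO;

public static class QuickReplace
{
    // Reads the whole file into a string, replaces, and writes it all back.
    public static void ReplaceInFile(string path, string oldValue, string newValue)
    {
        string xmlText = File.ReadAllText(path);
        xmlText = xmlText.Replace(oldValue, newValue); // plain text replace, not XML-aware
        File.WriteAllText(path, xmlText);
    }
}
```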
Paul Michaels
  • How do I use the XmlDocument object to manipulate it? Does that mean we are not required to rewrite the whole file? :) My XML file is more than 10 MB in size and I don't want to rewrite it just to change a single character :) – Abhi Oct 13 '11 at 08:27
  • "without having to read/write the entire file". When you read it all you might as well load it into an XDocument. – H H Oct 13 '11 at 08:30
  • @Abhishek Gupta 1) Just do it "the simple way" 2) Benchmark it 3) If #2 shows it's too slow (which requires having a functional requirement defined ;-), do it "the hard way" -- programmer time counts too ;-) 10MB to a modern computer is (often) nothing. –  Oct 13 '11 at 08:31
  • To use XmlDocument you would really need to rewrite the whole file. If you are dealing with a file on disk, then this might be helpful: http://stackoverflow.com/questions/1368539/how-do-i-read-and-edit-a-txt-file-in-c – Paul Michaels Oct 13 '11 at 08:33
1

An XML file is a text file, and a file does not allow for insertions or deletions in the middle. The only mutations supported are overwrite and append. Not a good match for XML.

So, first make very sure you really need this. It's a complicated operation, only worth it on very large files.

Since there could be a change in length you will at least have to move everything after the first replacement. The possibility of multiple replacements means you may need a big buffer to accommodate the changes.
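To make the cost concrete, here is a rough sketch of a single length-changing replacement done in place. It assumes the byte offset and old length are known, and it buffers the entire tail of the file in memory, which is exactly the "big buffer" problem. `TailShift` and `ReplaceBytes` are hypothetical names, and nothing here is crash-safe:

```csharp
using System;
using System.IO;

public static class TailShift
{
    // Replaces oldLength bytes at offset with replacement, moving everything
    // after the edit point forwards or backwards as needed.
    public static void ReplaceBytes(string path, long offset, int oldLength, byte[] replacement)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            long tailStart = offset + oldLength;
            if (offset < 0 || oldLength < 0 || tailStart > fs.Length)
                throw new ArgumentOutOfRangeException("offset");

            // Read everything after the replaced region into memory.
            byte[] tail = new byte[fs.Length - tailStart];
            fs.Position = tailStart;
            int read = 0;
            while (read < tail.Length)
            {
                int n = fs.Read(tail, read, tail.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }

            // Write the replacement, then the tail at its new position.
            fs.Position = offset;
            fs.Write(replacement, 0, replacement.Length);
            fs.Write(tail, 0, tail.Length);

            // Truncate if the file shrank (no effect if it grew).
            fs.SetLength(offset + replacement.Length + tail.Length);
        }
    }
}
```

With several replacements, each one has to move the tail again (or all of them have to be planned in one pass), which is why copying to a new file tends to be simpler.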

It's easier to copy the whole file. That is expensive in I/O but you save on memory use.

H H