
I need to modify a file. We've already written a reasonably complex component to build sets of indexes describing where interesting things are in this file, but now I need to edit this file using that set of indexes and that's proving difficult.

Specifically, my dream API is something like this:

//if you'll let me use Kotlin for a second, assume we have a simple tuple class
data class IdentifiedCharacterSubsequence(val indexOfFirstChar: Int, val existingContent: String)

//given these two structures
List<IdentifiedCharacterSubsequence> interestingSpotsInFile = scanFileAsPerExistingBusinessLogic(file, businessObjects);
Map<IdentifiedCharacterSubsequence, String> newContentByPreviousContentsLocation = generateNewValues(interestingSpotsInFile, moreBusinessObjects);

//I want something like this:
try(MutableFile mutableFile = new com.maybeGoogle.orApache.MutableFile(file)){

    for(IdentifiedCharacterSubsequence seqToReplace : interestingSpotsInFile){

        String newContent = newContentByPreviousContentsLocation.get(seqToReplace);

        //Kotlin properties are exposed to Java as getters
        mutableFile.replace(seqToReplace.getIndexOfFirstChar(), seqToReplace.getExistingContent().length(), newContent);
        //very similar to the StringBuilder interface
        //'enqueues' data changes in memory, doesn't actually modify the file until the flush call...
    }

    mutableFile.flush();
    // ...at which point a single write-pass is made.
    // assumption: the edits touch many small regions of text (rather than large portions of it)
    // -> buffering makes sense
}
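
To make the "enqueue, then single write-pass" idea concrete, here is a minimal sketch of the flush step using only java.io. It is not an existing library API: StreamingReplacer and Edit are hypothetical names, offsets are assumed to be counted in Java chars (UTF-16 code units) of the decoded text, and edits are assumed not to overlap. Only one small buffer is held in memory at a time, so the file size should not matter.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any library.
final class StreamingReplacer {

    /** Replace `length` chars starting at char offset `start` with `replacement`. */
    static final class Edit {
        final long start;
        final int length;
        final String replacement;

        Edit(long start, int length, String replacement) {
            this.start = start;
            this.length = length;
            this.replacement = replacement;
        }
    }

    /** Copies `in` to `out` in one pass, substituting each edited region on the way. */
    static void apply(Reader in, Writer out, List<Edit> edits) throws IOException {
        // Sort a copy by start offset so the input can be consumed strictly forward.
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong((Edit e) -> e.start));

        char[] buffer = new char[8192];
        long position = 0; // chars of the input consumed so far
        for (Edit edit : sorted) {
            if (edit.start < position) {
                throw new IllegalArgumentException("edits overlap at offset " + edit.start);
            }
            transfer(in, out, edit.start - position, buffer); // unchanged text before the edit
            transfer(in, null, edit.length, buffer);          // drop the old content
            out.write(edit.replacement);                      // emit the new content
            position = edit.start + edit.length;
        }
        // Copy whatever follows the last edit.
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    /** Reads exactly `count` chars from `in`; writes them to `out` unless `out` is null. */
    private static void transfer(Reader in, Writer out, long count, char[] buffer)
            throws IOException {
        long remaining = count;
        while (remaining > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read == -1) {
                throw new EOFException("edit extends past the end of the input");
            }
            if (out != null) {
                out.write(buffer, 0, read);
            }
            remaining -= read;
        }
    }
}
```

In practice the Reader and Writer would presumably be a BufferedReader over the original file and a BufferedWriter over a temporary file, both opened with the same explicit charset, with the temporary file moved over the original afterwards (for example with java.nio.file.Files.move).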

Some notes:

  • I can't use RandomAccessFile because my changes are not in-place (newContent may be longer or shorter than seqToReplace.existingContent)
  • The files are often many megabytes in size, so simply reading the whole thing into memory and modifying it as an array is not appropriate.

Does something like this exist, or am I reduced to writing my own implementation using BufferedWriters and the like? It seems like such an obvious evolution from io.Streams for a language which typically emphasizes index-based behaviour so heavily, but I can't find an existing implementation.

Lastly: I have very little domain experience with files and encoding schemes, so I have made no effort to handle the 'two-index' characters (characters that occupy two code units) described in questions like this one: Java charAt used with characters that have two code units. Any help on this front is much appreciated. Is this perhaps why I'm having trouble finding an implementation like this? Because indexes into UTF-8 encoded files are so pesky and bug-prone?
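
To illustrate why the indexes are ambiguous, here is a small sketch that measures the same text in three different units. IndexUnits is a hypothetical helper name; the String and StandardCharsets calls are standard JDK methods. Whichever unit the scanning component produces its indexes in is the unit the editing code has to count in as well.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing three ways to "index" the same text.
final class IndexUnits {

    /** UTF-16 code units: what String.length() and String.charAt() count. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Unicode code points: one per character, including those outside the BMP. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Bytes on disk when the text is encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For the string "a\uD83D\uDE00\u00E9" (the letter a, the emoji U+1F600, and é) these return 4, 3 and 7 respectively, so a char index, a code point index and a byte offset into the file can all disagree for the same position.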

Groostav