
I am looking for a way to make speedy modifications to large multi-gigabyte files. Does the Win32 API support inserting text into a file at a specific offset without having to rewrite the entire file back to disk, either from the very beginning or from the offset of the change?

Consider an example. Let's say we have the text "test" repeated over and over in a file that is 1 GB in size. If I want to go to the 500 MB offset and insert the text "new", is there a way to insert it without having to rewrite the entire file from the beginning, and/or without having to rewrite the last 500 MB of it?

Can it be done using the Win32 API? If not, are there any strategies to optimize a text insertion operation like this to maximize speed?

Theo

3 Answers


There are methods to rewrite only the portion after the insertion point, but generally, no - to insert something at a particular point in a file, you must re-write everything after that point.

This boils down to the way files are stored on disk: typically in fixed-size chunks (sectors or clusters), so inserting an arbitrary number of bytes in the middle is either not possible or not easy. In the vast majority of cases this doesn't matter, so the API doesn't expose a way of doing it.
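As a rough illustration of rewriting only the portion after the insertion point, here is a minimal sketch in Python using plain file I/O rather than Win32 calls directly. The function name `insert_into_file` and the `chunk_size` parameter are made up for this example. It copies the tail backwards in chunks so that no bytes are overwritten before they have been read.

```python
import os


def insert_into_file(path, offset, data, chunk_size=1 << 20):
    """Insert `data` at `offset`, rewriting only the bytes after `offset`.

    Hypothetical helper for illustration; not part of any library.
    """
    with open(path, "r+b") as f:
        old_size = f.seek(0, os.SEEK_END)
        if not 0 <= offset <= old_size:
            raise ValueError("offset is outside the file")
        shift = len(data)
        if shift == 0:
            return

        # Walk the tail from its end back toward the insertion point.
        # Each chunk is read first and then written `shift` bytes later,
        # so data that has not been copied yet is never clobbered.
        pos = old_size
        while pos > offset:
            start = max(offset, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            f.seek(start + shift)
            f.write(chunk)
            pos = start

        # The gap at `offset` now holds stale bytes; overwrite it.
        f.seek(offset)
        f.write(data)
```

For the 1 GB example in the question this still rewrites the last 500 MB, but it leaves the first 500 MB untouched and needs only one chunk of memory at a time.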

If you have control over the file format, you can engineer ways such that you can write data to the end of the file, but have some tracking data to say "this stuff really belongs here".
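One way to picture the "write to the end, track where it belongs" idea is a piece table. The sketch below is only an assumption about how such a format could look: the class name `PieceTable` is invented here, and the tracking data is kept in memory, whereas a real format would have to persist it (for example in a header or a sidecar file).

```python
import os


class PieceTable:
    """Logical view of a file whose inserted data is physically appended.

    Illustrative only: the piece list lives in memory and is not saved.
    """

    def __init__(self, path):
        self.path = path
        size = os.path.getsize(path)
        # Each piece is (physical_offset, length), in logical order.
        self.pieces = [(0, size)] if size else []

    def insert(self, logical_offset, data):
        total = sum(length for _, length in self.pieces)
        if not 0 <= logical_offset <= total:
            raise ValueError("offset is outside the logical file")
        if not data:
            return

        # Appending is cheap: nothing already on disk is moved.
        with open(self.path, "r+b") as f:
            phys = f.seek(0, os.SEEK_END)
            f.write(data)
        new_piece = (phys, len(data))

        # Split the piece that contains the logical offset.
        pos = 0
        for i, (start, length) in enumerate(self.pieces):
            if logical_offset <= pos + length:
                cut = logical_offset - pos
                parts = [(start, cut), new_piece, (start + cut, length - cut)]
                self.pieces[i:i + 1] = [p for p in parts if p[1] > 0]
                return
            pos += length
        self.pieces.append(new_piece)  # only reached for an empty file

    def read(self):
        """Return the logical contents by stitching the pieces together."""
        out = []
        with open(self.path, "rb") as f:
            for start, length in self.pieces:
                f.seek(start)
                out.append(f.read(length))
        return b"".join(out)
```

An insert then costs one append plus a small update to the tracking data, regardless of file size. The trade-off is that readers must understand the format, since the bytes on disk are no longer in logical order.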

Thanatos
  • MS Word does this out of necessity to deal with large documents. – David Heffernan Jan 01 '11 at 10:25
  • Interesting that Microsoft doesn't support this. To me it seems it could be implemented by modifying the master file table entries to link new hard drive sectors in at the desired location. Kind of like defragmenting, but in reverse. – Theo Jan 01 '11 at 17:35
  • @Theo: it is potentially possible, but most filesystems would probably only allow that to occur at some multiple of 512 bytes: a sector or cluster size, whichever the FS is chunking files into. So, if I expose such functionality, what happens if the user wants to write 500 bytes? Do I rewrite the whole file, or only allow operations that match the (potentially varying across different volumes) cluster size? What happens to file pointers into that file held by other processes/threads? Do they stay put relative to the data, or remain at position X? Either could be equally disturbing to an app. – Thanatos Jan 01 '11 at 21:47
  • ... in the end, it probably boils down to it being "too complicated", and it is more beneficial to keep things simple (at the cost of some efficiency). – Thanatos Jan 01 '11 at 21:47

When you open the file in read-write mode, you can write data in the middle of the file, but this will overwrite the existing data. There's no easy way to insert data into the file.
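A quick way to see the overwrite behaviour, sketched in Python with a throwaway temporary file (the file contents are just the "test" pattern from the question):

```python
import os
import tempfile

# Create a small scratch file containing "test" three times.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"testtesttest")

# Open read-write, seek into the middle, and write.
with open(path, "r+b") as f:
    f.seek(4)
    f.write(b"new")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)

# The file is still 12 bytes long: "new" replaced three existing bytes
# instead of pushing them further along.
print(result)  # b'testnewttest'
```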

However, if you have a 64-bit system, you can make your life easier by using a memory-mapped file (MMF). On a 32-bit system this won't work in your particular scenario, because the address space is too limited for files of this size. With the file API you need to copy the tail yourself, which is tricky. With an MMF you do the following:
1. Create a file mapping (sized to include the bytes being inserted) and map the file into memory.
2. Move the tail further along by moving the memory block with memmove or a similar function that handles overlapping blocks.
3. Put your bytes in the middle.

With this approach the memory manager will do most of the work for you.
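The three steps can be sketched with Python's `mmap` module, which on Windows is built on the same file-mapping facility. The function name `insert_with_mmap` is invented for this example, and the file is grown with `truncate` before mapping so the view already has room for the shifted tail. Treat it as a sketch under those assumptions, not a tuned implementation.

```python
import mmap
import os


def insert_with_mmap(path, offset, data):
    """Insert `data` at `offset` by shifting the tail inside a mapped view.

    Hypothetical helper for illustration; not part of any library.
    """
    shift = len(data)
    if shift == 0:
        return
    with open(path, "r+b") as f:
        old_size = f.seek(0, os.SEEK_END)
        if not 0 <= offset <= old_size:
            raise ValueError("offset is outside the file")

        # Grow the file first so the mapping covers the new length.
        f.truncate(old_size + shift)

        # 1. Map the whole file (length 0 means "entire file").
        with mmap.mmap(f.fileno(), 0) as mm:
            # 2. Shift the tail; move() copes with overlapping ranges.
            mm.move(offset + shift, offset, old_size - offset)
            # 3. Put the new bytes into the gap.
            mm[offset:offset + shift] = data
            mm.flush()
```

The tail is still rewritten on disk, so the amount of I/O is about the same as copying it by hand. The gain is simpler code, with paging handled by the operating system.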

Eugene Mayevski 'Callback

You can't do this. What you can do efficiently is append to a file. You'd need to build some structure into your file format if you wanted to take advantage of this, as Thanatos has described.

As usual, Raymond Chen has something to say on the matter. He's talking about deleting from the beginning of a file, but the issues are essentially the same as for this question.

David Heffernan