0

I need to edit a binary file and have to take a few constraints into consideration.

Context

  • I have a very large binary file (for example, 1GB)
  • It contains some strings, numbers and more.
  • I know the structure of the file and know how to parse it.

I'd like to replace a known string (or byte array, or whatever) in this file with another string (or whatever).

Problem

The main difficulties I'm facing are the following:

  1. The file is very large (potentially several gigabytes), so I don't want to load it entirely in memory. Therefore, deserializing it entirely isn't an option. I also don't want to create a new file next to the old one because there may not be enough space on the hard drive to host 2 very large files.
  2. The size of the replacement data may differ from the original (think of a string with a different length, for example). So I may need to move some bytes within the file when replacing the data (is that even possible? That sounds tricky to me).
  3. The data to replace can be anywhere in the file.

Therefore, my question:

How can I replace a string (for example) in a large binary file without rewriting the entire file and loading the entire file in memory?

Something I'm allowed to do if it can help:

If I can't "replace", I can still "remove" the old data (which is anywhere in the file) and add the new data at the end of the file (or anywhere else I know it won't break the file format). But I also can't find a way to do that without rewriting the entire file into a new one or loading the entire file in memory.

Thanks :)

Veler
  • If this is possible at all, then I imagine it would be through a persisted memory-mapped file. I don't have much experience with it - certainly not recently - and I'm not at all sure it can be used to replace a chunk with a differently sized chunk. My guess is you'll hit a snag, but do look into it. – Bent Tranberg Dec 18 '20 at 20:04

3 Answers

3

You can use FileStream for random access within a file. You can change the .Position to an arbitrary position within the file and read or write bytes.
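As a rough illustration of the random-access idea, here is a minimal sketch in Python rather than .NET (seeking and then writing plays the role of setting `.Position` on a `FileStream`). The function name `overwrite_at` and its parameters are made up for this example, and it only covers the case where the new data has exactly the same length as the old data.

```python
def overwrite_at(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes starting at `offset`.

    Only the touched bytes are written; the rest of the file is
    neither read nor rewritten.
    """
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)        # jump straight to the target position
        f.write(new_bytes)    # replaces the bytes already there
```

For instance, calling `overwrite_at(path, 7, b"HELLO")` on a file containing `HEADER:hello:FOOTER` would leave `HEADER:HELLO:FOOTER` on disk.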

However if "the size of the replacing data may be different than the original one", you'll have to re-write the file from that point forward.
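One way to picture "re-writing from that point forward" is to shift the tail of the file in fixed-size chunks, so that memory use stays bounded. The following is only a sketch under assumptions: it is written in Python rather than .NET, `replace_in_place` and the 1 MiB chunk size are invented for the example, and the caller is assumed to already know the offset and length of the old data. An interruption halfway through would leave the file in an inconsistent state.

```python
import os

def replace_in_place(path, offset, old_len, new_bytes, chunk=1024 * 1024):
    """Replace old_len bytes at `offset` with new_bytes, shifting the tail."""
    delta = len(new_bytes) - old_len
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        tail_start = offset + old_len
        if delta > 0:
            # Growing: copy the tail toward the end, last chunk first,
            # so bytes that have not been read yet are never overwritten.
            pos = size
            while pos > tail_start:
                n = min(chunk, pos - tail_start)
                pos -= n
                f.seek(pos)
                buf = f.read(n)
                f.seek(pos + delta)
                f.write(buf)
        elif delta < 0:
            # Shrinking: copy the tail toward the start, first chunk first,
            # then cut off the leftover bytes at the end.
            pos = tail_start
            while pos < size:
                f.seek(pos)
                buf = f.read(min(chunk, size - pos))
                f.seek(pos + delta)
                f.write(buf)
                pos += len(buf)
            f.truncate(size + delta)
        # The gap at `offset` now has exactly the right size.
        f.seek(offset)
        f.write(new_bytes)
```

The cost is proportional to the amount of data after the edit point, so a replacement near the start of a multi-gigabyte file still rewrites almost all of it.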

David Browne - Microsoft
  • Thanks :) Regarding `you'll have to re-write the file from that point forward.`, I saw that I can `OpenWrite` a file, which makes me think I could do the following: `1.` When reading the file, if I want to keep the data, I just skip it and move the Position. `2.` If I detect the data I want to "replace", I take the `next` chunk and write it at the position where my old data is. `3.` I then keep rewriting chunks, offsetting each one by the size of what has been removed. `4.` Finally, I add the new data at the end of the file. Would that work? – Veler Dec 18 '20 at 20:09
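The four steps proposed in this comment could look roughly like the sketch below. It is written in Python for brevity, the name `remove_and_append` is hypothetical, and it assumes the file format tolerates the new data living at the end of the file. Everything after the removed range is still rewritten, and a failure partway through leaves the file half-shifted.

```python
import os

def remove_and_append(path, offset, length, new_bytes, chunk=1024 * 1024):
    """Cut `length` bytes out at `offset`, then append new_bytes at the end."""
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        read_pos = offset + length   # first byte to keep after the old data
        write_pos = offset           # where that byte has to move to
        # Slide each following chunk back over the removed range.
        while read_pos < size:
            f.seek(read_pos)
            buf = f.read(min(chunk, size - read_pos))
            f.seek(write_pos)
            f.write(buf)
            read_pos += len(buf)
            write_pos += len(buf)
        # Drop the now-duplicated bytes at the end, then append the new data.
        f.truncate(write_pos)
        f.seek(write_pos)
        f.write(new_bytes)
```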
1
  1. Open your file with a read-only stream.
  2. Create a temporary file for writing.
  3. Parse your large file, writing data to the temporary file.
  4. If you successfully parsed the file, swap the source and temporary file references.
  5. Delete the temporary file.

Do not try to overwrite large files in-place; you will inevitably get a file error, parse error, powerfail, or something, and your source data will be corrupted.
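A minimal sketch of this temporary-file approach, assuming Python instead of .NET and a hypothetical helper named `rewrite_via_temp`: the source is streamed in chunks into a temporary file in the same directory, and `os.replace` then renames the temporary file over the original, which covers the swap and delete steps in a single call. Note that it needs enough free disk space for a second copy of the file.

```python
import os
import tempfile

def rewrite_via_temp(path, offset, old_len, new_bytes, chunk=1024 * 1024):
    """Stream `path` into a temp file with one range replaced, then swap."""
    # Same directory, so the final rename stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Copy everything before the old data.
            remaining = offset
            while remaining > 0:
                buf = src.read(min(chunk, remaining))
                if not buf:
                    raise ValueError("offset is past the end of the file")
                dst.write(buf)
                remaining -= len(buf)
            # Write the replacement and skip over the old data.
            dst.write(new_bytes)
            src.seek(old_len, os.SEEK_CUR)
            # Copy the rest of the file.
            while True:
                buf = src.read(chunk)
                if not buf:
                    break
                dst.write(buf)
            dst.flush()
            os.fsync(dst.fileno())   # make sure the copy is on disk first
        os.replace(tmp_path, path)   # swap only after a successful copy
    except BaseException:
        # On any failure the original file is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If anything goes wrong before the final rename, the source file is still intact, which is the point of this answer.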

Dour High Arch
  • Thanks for the suggestion. Fortunately I'm early in development and I'm thinking I can maybe use a ZIP format instead? – Veler Dec 18 '20 at 21:19
1

The other answers may work, but they won't meet your criteria of

without rewriting the entire file and loading the entire file in memory

What you're asking for simply can't be done in the manner you want it to be.

Files are simply a stream of bytes, stored in fixed-length chunks (blocks) on the disk.

While it is possible to replace a given array of bytes with another, they'd have to be the same size. Anything smaller or larger means the stream from that point forward would need to be re-written.

It may seem that Word, Visual Studio, or any other editor you care to name routinely inserts strings into files, but that's not what's actually happening. The file is read into memory, manipulated in memory, and then saved back out to disk in its entirety.

Lastly, 1GB really isn't that big nowadays. You shouldn't have a space issue with the needed temporary copy, either in memory or on disk.

Charles
  • It's sadly what I figured out :-/ Thanks for clarifying :) I'm taking another path now. – Veler Dec 20 '20 at 19:37