6

I have a big text file (a few GB).

For example, it contains the following text:

Hello, World!

I need to insert the word " funny" at position 6 (right after the comma) and shift the rest of the text:

Hello, funny World!

How can I do this without reading the whole file to shift the rest? Or how can I optimise this operation?

Thanks.

Alexander Ruliov

3 Answers

8

You can't. Plain text files cannot be shrunk or expanded at the beginning or in the middle, only at the end. To insert something, everything after the insertion point has to be rewritten.
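
Since the tail has to be rewritten anyway, one common workaround is to stream the file into a new one and swap it in, which at least avoids holding the whole file in memory. Here is a minimal sketch of that idea; the function name `insert_via_copy`, the 1 MiB chunk size and the temporary-file handling are my own assumptions, not something from the question, and `os.replace` needs Python 3.3 or later:

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def insert_via_copy(path, offset, payload, chunk_size=CHUNK_SIZE):
    """Rewrite `path` with `payload` (bytes) inserted at byte `offset`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Copy the first `offset` bytes unchanged, chunk by chunk.
            remaining = offset
            while remaining > 0:
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    raise ValueError("offset is past the end of the file")
                dst.write(chunk)
                remaining -= len(chunk)
            # Write the new bytes, then stream the rest of the original.
            dst.write(payload)
            shutil.copyfileobj(src, dst, chunk_size)
        # Swap the new file in place of the old one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

This still reads and writes every byte once, and it needs free disk space for a second copy. The replacement file also gets the permissions of the temporary file, so copy them over first if that matters to you.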

Ignacio Vazquez-Abrams
  • @Rulexec: The same. This is a limitation of the file-systems that are generally used. – Björn Pollex Jun 19 '11 at 20:37
  • What do you mean by that? If there was some space reserved for future insertions, then it is technically not inserting, but overwriting. Is that what you meant? If not, please explain. – Björn Pollex Jun 19 '11 at 20:43
  • Some binary formats do support insertion via appending and rewriting, such as DBF supporting appending new records at the end and deletion of an old record by changing a character in the record, along with modification of any relevant indexes. While technically this isn't "inserting in the middle of the file", the new record does *appear* to be an insertion as shown in the representation of the format. – Ignacio Vazquez-Abrams Jun 19 '11 at 20:48
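
To illustrate the "append plus index" idea from the last comment: the sketch below is not the DBF format, only a toy model of the same trick. The class `AppendOnlyRecords` and its methods are hypothetical names, and an in-memory `io.BytesIO` stands in for the data file. Records are only ever appended, and a separate index decides the order in which they appear.

```python
import io


class AppendOnlyRecords:
    """Toy store: data is append-only, `order` defines the logical sequence."""

    def __init__(self):
        self.data = io.BytesIO()  # stands in for the data file
        self.order = []           # logical order as (offset, length) pairs

    def _append(self, record):
        # New bytes always go to the physical end of the data.
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(record)
        return (offset, len(record))

    def insert(self, position, record):
        # Only the small index is modified in the middle, never the data.
        self.order.insert(position, self._append(record))

    def read_all(self):
        # Follow the index to present the records in logical order.
        records = []
        for offset, length in self.order:
            self.data.seek(offset)
            records.append(self.data.read(length))
        return records


store = AppendOnlyRecords()
store.insert(0, b"Hello,")
store.insert(1, b" World!")
store.insert(1, b" funny")  # logically in the middle, physically at the end
```

The price is that the file is no longer plain text: a reader has to go through the index to see the records in order.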
1

Well, you can't. For more information, see "How do I modify a text file in Python?"

Community
Adithya Surampudi
0

If your file is a few gigabytes, then my solution will probably work only on 64-bit operating systems, because it memory-maps the whole file:

import mmap
import os

def insert_string(fp, offset, some_bytes):
    # fp must be open for reading and writing in binary mode ("r+b");
    # offset is a byte offset and some_bytes is a bytes object
    fp.seek(0, os.SEEK_END)
    # grow the file by len(some_bytes); the appended bytes are placeholders
    fp.write(some_bytes)  # some_bytes happens to have the right len :)
    fp.flush()
    file_length = fp.tell()

    mm = mmap.mmap(fp.fileno(), file_length)
    # how many bytes do we have to shift?
    bytes_to_shift = file_length - offset - len(some_bytes)
    # shift the tail towards the end of the file to open a gap at offset
    mm.move(offset + len(some_bytes), offset, bytes_to_shift)
    # and replace the contents at offset
    mm[offset:offset + len(some_bytes)] = some_bytes
    mm.close()

if __name__ == "__main__":
    # create the sample file
    with open("test.txt", "wb") as fp:
        fp.write(b"Hello, World!")
    # now operate on it
    with open("test.txt", "r+b") as fp:
        insert_string(fp, 6, b" funny")

NB: this is a Python 2 program on Linux. YMMV.
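
If mapping the whole file is not an option (a 32-bit process, for instance), the same shift can be done with ordinary seeks and reads by moving the tail in fixed-size chunks, starting from the end of the file. This is only a sketch of that variant: the name `insert_in_place` and the 1 MiB default chunk size are my assumptions.

```python
import os


def insert_in_place(fp, offset, payload, chunk_size=1024 * 1024):
    """Insert `payload` (bytes) at byte `offset`; `fp` is open in "r+b" mode."""
    fp.seek(0, os.SEEK_END)
    old_length = fp.tell()
    if not 0 <= offset <= old_length:
        raise ValueError("offset out of range")
    shift = len(payload)

    # Move the tail towards the end, last chunk first, so that no byte
    # is overwritten before it has been copied.
    pos = old_length
    while pos > offset:
        start = max(offset, pos - chunk_size)
        fp.seek(start)
        chunk = fp.read(pos - start)
        fp.seek(start + shift)
        fp.write(chunk)
        pos = start

    # The gap at `offset` is now free to receive the new bytes.
    fp.seek(offset)
    fp.write(payload)
    fp.flush()
```

It is called the same way as `insert_string` above, e.g. `insert_in_place(fp, 6, b" funny")`. It still rewrites everything after the offset, and a crash halfway through leaves the file in an inconsistent state, so keep a backup if the data matters.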

tzot