0

I've read here that editing a file in place is not possible because it's OS-related and entirely independent of the language. Why is that? Doesn't fseek() from the C standard library allow writing to a file in place?

atx
  • You can overwrite, you can append, and in some implementations you can truncate the file, but you cannot delete or insert characters at arbitrary positions. – nhahtdh Jun 25 '12 at 06:03
  • Take a look at this: http://stackoverflow.com/questions/125703/how-do-i-modify-a-text-file-in-python. You might find your answer there. – Vizard Jun 25 '12 at 06:08

5 Answers

3

You can seek in Python as well. The answer you linked is just pointing out that you can't insert in the middle of a file (ABC -> ABXC).

If you seek and then write, whatever data is written overwrites the contents at that position of the file (ABC -> AXC).
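
As a minimal sketch of the overwrite case (ABC -> AXC), assuming Python 3: the helper name `overwrite_at` and the temporary demo file are made up for illustration.

```python
import os
import tempfile

def overwrite_at(path, offset, data):
    """Overwrite len(data) bytes starting at offset; nothing is shifted."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as fp:
        fp.seek(offset)
        fp.write(data)

# Demo on a throwaway temporary file (hypothetical contents).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"ABC")

overwrite_at(demo_path, 1, b"X")

with open(demo_path, "rb") as fp:
    overwrite_result = fp.read()   # b"AXC": B was replaced, nothing was inserted

os.remove(demo_path)
```

The file is still three bytes long afterwards; the write replaced a byte rather than making room for a new one.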

Amber
1

The point is that a file is like an array (just on disk). It occupies a fixed place on the disk.

If you have to insert something in the middle of an array, you first have to move the trailing elements out of the way; otherwise you will overwrite them. The same applies to files: to insert something in the middle, you need to move the trailing part of the file.

The one exception is when you need to overwrite a fixed-size block of the file with a new block of the same size. Then it is perfectly fine to use fseek() and just overwrite.
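
The "move the trailing part" step can be sketched as follows in Python 3. The helper `insert_at` is hypothetical, and for simplicity it reads the whole tail into memory, which would not be a good idea for very large files.

```python
import os
import tempfile

def insert_at(path, offset, data):
    """Insert data at offset by rewriting everything that follows it."""
    with open(path, "r+b") as fp:
        fp.seek(offset)
        tail = fp.read()    # save the trailing part of the file
        fp.seek(offset)
        fp.write(data)      # write the new bytes at the insertion point
        fp.write(tail)      # write the old tail back, now shifted right

# Demo on a throwaway temporary file (hypothetical contents).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"ABC")

insert_at(demo_path, 2, b"X")

with open(demo_path, "rb") as fp:
    insert_result = fp.read()   # b"ABXC"

os.remove(demo_path)
```

Note that every byte after the insertion point is written again, which is why the cost grows with the size of the tail rather than the size of the inserted data.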

Simon Ejsing
1

The problem is OS-related because it depends on how the OS organizes data on the disk through its file system.

Usually, the disk is divided into fixed-size blocks that store the content of a file. When a byte is overwritten, only the (whole) block containing that byte is rewritten. When data is appended to the file, the last block may be rewritten, and new blocks are allocated and "linked" to the file to hold the appended data. Both operations are efficient under this scheme.

However, when some byte is inserted or deleted, usually, the file needs to be rewritten from the first block that contains the insert/delete change.

It is technically possible to reduce the time spent rewriting the file. For an insert, the OS can allocate a new block for the inserted data, "link" it to the file, and rewrite one or two neighboring blocks. For a delete, the OS can reduce the number of valid bytes in the affected blocks, rewrite them, and/or "relink" the blocks in the file. As a disclaimer, the "solutions" I just mentioned depend on how the specific file system is structured. Also, reducing the cost of rewriting the file will usually slow down later read operations or cause internal fragmentation.

Maybe the problem will be solved if someone one day invents a data structure that allows efficient insertion and deletion of data on disk while maintaining read performance.

nhahtdh
0

You can modify a file in place if the data you are writing is the same size as the data you are replacing (if it is smaller, you just pad it). What if it is larger? Well, then everything after the modification would need to be rewritten to disk anyway, right? You also can't assume there is enough room at the end in all cases, so the file may need to be moved to a completely new location.

Obviously this makes the valid use cases rather rare. Normally it's not worth it unless you have a really good reason to do so. As for whether or not you have access to in-place modification APIs in Python... I have no idea.
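
To illustrate the "smaller and you just pad" case, here is a rough Python 3 sketch. The helper `replace_padded`, the space padding byte, and the sample record are all assumptions made for the example.

```python
import os
import tempfile

def replace_padded(path, offset, old_len, new_data, pad=b" "):
    """Overwrite old_len bytes at offset with new_data, padded to old_len."""
    if len(new_data) > old_len:
        # A larger replacement would require rewriting the rest of the file.
        raise ValueError("replacement does not fit in the existing slot")
    with open(path, "r+b") as fp:
        fp.seek(offset)
        fp.write(new_data.ljust(old_len, pad))   # pad must be a single byte

# Demo on a throwaway temporary file (hypothetical contents).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"name=Alexander;")

# Replace the 9-byte value "Alexander" (starting at offset 5) with "Bob".
replace_padded(demo_path, 5, 9, b"Bob")

with open(demo_path, "rb") as fp:
    padded_result = fp.read()   # b"name=Bob      ;"

os.remove(demo_path)
```

Padding only works if whatever reads the file tolerates the filler bytes, so this is a format decision as much as an I/O one.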

Ed S.
  • Python is very much like C in this regard: open the file with mode "r+" and use seek(), tell(), read(n) and write(). – thebjorn Jun 25 '12 at 06:10
0

Of course you can edit a file in place:

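import struct

RECSIZE = struct.calcsize('<i')   # size of one little-endian 32-bit int record

# 'r+b' opens an existing file for reading and writing without truncating it
with open('name.txt', 'r+b') as fp:
    fp.seek(132 * RECSIZE)   # go to record number 132
    fp.write(struct.pack('<i', 42))
    fp.seek(132 * RECSIZE)
    # struct.unpack always returns a tuple, even for a single value
    assert struct.unpack('<i', fp.read(RECSIZE)) == (42,)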

This also shows the most common reason for doing so (updating fixed-size records) and the limitation: the new data has to be the same size as the data it replaces.

thebjorn