
Possible Duplicate:
Fastest Way to Delete a Line from Large File in Python
How to edit a line in middle of txt file without overwriting everything?

I know I can read every line into a list, remove a line, then write the list back.

But the file is large. Is there a way to remove a part from the middle of the file without rewriting the whole file?

user805627
  • For clarity, are you talking about Py2 or Py3? – Ikke Nov 05 '12 at 07:52
  • Added the specific tag to the question. – Ikke Nov 05 '12 at 07:54
  • If you remove a part of a file, subsequent data needs to move or leave empty space. There's no way around it except maybe low-level filesystem access might be able to strip a single fixed-size (typically 4k) block. – John Dvorak Nov 05 '12 at 07:54
  • See http://stackoverflow.com/questions/8868499/how-to-edit-a-line-in-middle-of-txt-file-without-overwriting-everything – Matthew Adams Nov 05 '12 at 07:55
  • And perhaps: http://stackoverflow.com/questions/2329417/fastest-way-to-delete-a-line-from-large-file-in-python – Nicolas Nov 05 '12 at 07:56

1 Answer

I don't know of a way to change the file in place, even using low-level file system commands, but you don't need to load it into a list. You can stream the file line by line into a new file, which keeps the memory footprint small:

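# should_delete is a function you supply; it returns True for lines to drop.
with open('input_file', 'r') as input_file, open('output_file', 'w') as output_file:
    for line in input_file:
        # Copy every line except the ones marked for deletion.
        if not should_delete(line):
            output_file.write(line)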

This assumes that the section you want to delete is a line in a text file, and that should_delete is a function which determines whether the line should be kept or deleted. It is easy to change this slightly to work with a binary file instead, or to use a counter instead of a function.
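As a rough illustration of the counter variant mentioned above, here is a minimal sketch. The function name `copy_without_line` and its parameters are made up for this example, and it assumes you know the 1-based number of the line to drop:

```python
def copy_without_line(src_path, dst_path, line_number):
    """Copy src_path to dst_path, skipping the 1-based line `line_number`."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        # enumerate supplies the counter, so only one line is held in memory.
        for current, line in enumerate(src, start=1):
            if current != line_number:
                dst.write(line)
```

For example, `copy_without_line('input_file', 'output_file', 3)` would copy everything except the third line. The whole file is still read and written once; only the memory use is reduced.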

Edit: If you're dealing with a binary file, you know the exact position you want to remove, and it's not too near the start of the file, you may be able to optimise it slightly with io.IOBase.truncate (see http://docs.python.org/2/library/io.html#io.IOBase): copy the data that follows the removed section back over it, then truncate the file to its new length, so that only the tail is rewritten. However, I would only suggest pursuing this if a profiler indicates that you really need to optimise to this extent.
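One way the truncate approach could look is sketched below. The helper name `remove_bytes_in_place`, its parameters and the chunk size are assumptions for illustration, not part of any library. It shifts everything after the removed range towards the start of the file in chunks, then cuts off the leftover bytes at the end:

```python
import os

def remove_bytes_in_place(path, start, length, chunk_size=64 * 1024):
    """Remove `length` bytes at offset `start`, rewriting only the tail."""
    with open(path, 'r+b') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if start < 0 or length < 0 or start + length > size:
            raise ValueError("byte range lies outside the file")
        read_pos = start + length   # first byte to keep after the gap
        write_pos = start           # where that byte should end up
        while read_pos < size:
            f.seek(read_pos)
            chunk = f.read(min(chunk_size, size - read_pos))
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        # Drop the now-duplicated bytes left over at the end of the file.
        f.truncate(write_pos)
```

Note that this modifies the file directly, so an interruption part-way through leaves it in an inconsistent state. The copy-to-a-new-file version above is the safer default.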

aquavitae