0

I need to remove the first line of a large file (around 70 GB) using seek in Python. I can't write the data to another file, so the line has to be removed from the existing file itself. Is there any solution?

I tried using seek to move the pointer to the end of the line, but I'm not sure how to remove it.

Timus

  • 1
    No. You cannot delete data from a file without rewriting the file. That's just a fact. If you delete the first line, then EVERY byte of the file will be at a new location. – Tim Roberts May 10 '23 at 23:09
  • Does this answer your question? [How to modify a text file?](https://stackoverflow.com/questions/125703/how-to-modify-a-text-file) – Ali Ent May 10 '23 at 23:10
  • Depending on the filesystem, perhaps it may be theoretically possible to move the file pointer. But this is unlikely for modern filesystems due to the block structure. – Mateen Ulhaq May 10 '23 at 23:18
  • 2
    "Also i can't write the data to another file. I need to remove from the existing file only." Files don't work this way. It's not a matter of the programming language. – Karl Knechtel May 10 '23 at 23:19
  • Also - if you happen to be working with text files this large, it is about time to rethink the way you are storing and retrieving data. This seems practical for no-thing. – jsbueno May 11 '23 at 11:45

2 Answers

0

You can memory-map the file so that its contents appear in memory, then move the memory starting from the 2nd line to the beginning of the file. Then truncate the file to the new length.

This likely won't be fast for a 70 GB file, since the changes still have to be flushed back to disk. That's just the way files work, but it doesn't require an additional 70 GB of disk space, unlike the usual process of writing a new file and deleting the old one.

import mmap

# Create test file for demonstration (about 50MB)
#
# The quick brown fox jumped over 1 lazy dogs
# The quick brown fox jumped over 2 lazy dogs
# ...
# The quick brown fox jumped over 1,000,000 lazy dogs

with open('test.txt', 'w') as f:
    for i in range(1, 1_000_001):
        print(f'The quick brown fox jumped over {i:,} lazy dogs', file=f)

# Create memory-mapped file, read first line, shift file memory
# starting from offset of the 2nd line back to the beginning of the file.
# This removes the first line.
with open('test.txt', 'r+b') as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        size = mm.size()
        line = mm.readline()
        linelen = len(line)
        mm.move(0, linelen, size - linelen)
        mm.flush()

    # Truncate the file to the shorter length.
    f.truncate(size - linelen)

# Read the first line of the new file.
with open('test.txt') as f:
    print(f.readline())

Output:

The quick brown fox jumped over 2 lazy dogs
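
Since the question asks about seek specifically, the same shift can also be sketched with plain seek/read/write calls instead of mmap. This is only a minimal sketch, not a tested tool for 70 GB files: the function name `remove_first_line_in_place` and the `chunk_size` default are made up for illustration, and if the process is interrupted partway through, the file is left in a partially shifted state.

```python
def remove_first_line_in_place(path, chunk_size=64 * 1024 * 1024):
    """Shift everything after the first line to the start of the file, then truncate.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    with open(path, 'r+b') as f:
        first_line = f.readline()
        read_pos = len(first_line)  # offset where the 2nd line begins
        write_pos = 0               # offset where shifted data is written

        while True:
            # Read the next chunk from the "old" position...
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # ...and write it back at the "new" position, which is always
            # behind the read position, so unread data is never overwritten.
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)

        # Drop the leftover bytes at the end of the file.
        f.truncate(write_pos)
```

Calling `remove_first_line_in_place('test.txt')` on the test file above should leave the same result as the mmap version. It still rewrites every byte after the first line, so it is no faster; it only avoids needing a second copy of the file.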
Mark Tolonen
-1

Unfortunately, it's impossible to delete it instantly, but you can try this code. It rewrites the content to the same file, skipping the first line:

import fileinput

# With inplace=True, print() output is redirected into the file.
with fileinput.input(files='text.txt', inplace=True) as f:
    for line_number, line in enumerate(f):
        if line_number == 0:
            continue  # skip the first line
        print(line, end='')

The inplace=True argument tells Python to modify the file in place, rather than creating a new file.

  • `inplace=True` makes a backup for reading and sends the new output to the same filename. It's not simply overwriting the file. – Mark Tolonen May 11 '23 at 00:10
  • As I said at the top, it's impossible to do it instantly; it's the same idea as creating a new file. Using the fileinput module with inplace=True allows you to make changes to the file without creating a new file. The inplace flag tells the module to write the output back to the input file, effectively replacing the original content with the modified content. – Bachagha Mousaab May 11 '23 at 00:39
  • OP says "Also i can't write the data to another file" which is what this is doing behind the scenes. Perhaps OP doesn't have another 70GB to spare. – Mark Tolonen May 11 '23 at 02:55
  • Even if this "inplace" works as you are describing, your code has no examples of writing the data back, or of how one can continuously read from the file at the "old" position and write at the new one. – jsbueno May 11 '23 at 11:47
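
The backup behaviour discussed in the comments above can be checked with a small sketch. It assumes fileinput's default backup extension (`.bak`, used when no `backup` argument is given); the function name `drop_first_line_with_fileinput` is made up for illustration.

```python
import fileinput
import os

def drop_first_line_with_fileinput(path):
    """Remove the first line via fileinput and report whether a backup copy
    of the file existed on disk while the rewrite was in progress.

    Hypothetical helper for illustration only.
    """
    backup_path = path + '.bak'  # assumed default backup name used by fileinput
    backup_seen = False
    with fileinput.input(files=path, inplace=True) as f:
        for line_number, line in enumerate(f):
            # While iterating, the original data lives in the backup file
            # and print() output goes to a fresh file at the original path.
            if os.path.exists(backup_path):
                backup_seen = True
            if line_number == 0:
                continue  # skip the first line
            print(line, end='')
    return backup_seen
```

If the function returns True, a second copy of the data was on disk during the rewrite, which is the extra space the question is trying to avoid. The backup is deleted once the input is closed, so it is easy to miss.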