I have an application that reads lines from a file and runs its magic on each line as it is read. Once the line is read and properly processed, I would like to delete the line from the file. A backup of the removed line is already being kept. I would like to do something like

file = open('myfile.txt', 'rw+')
for line in file:
   processLine(line)
   file.truncate(line)

This seems like a simple problem, but I would like to do it right rather than with a whole lot of complicated seek() and tell() calls.

Maybe all I really want to do is remove a particular line from a file.

After spending far too long on this problem, I decided that everyone was probably right and this is just not a good way to do things. It just seemed like such an elegant solution. What I was looking for was something akin to a FIFO that would just let me pop lines out of a file.

Ryan White
  • Another way to look at this is I want to implement a file based FILO queue. – Ryan White Feb 08 '09 at 07:17
  • File-based queue: http://stackoverflow.com/questions/366533/how-can-i-run-the-first-process-from-a-list-of-processes-stored-in-a-file-and-imm – jfs Feb 09 '09 at 06:04

7 Answers

20

Remove all lines once you are done with them:

with open('myfile.txt', 'r+') as file:
    for line in file:
        processLine(line)
    file.truncate(0)

Remove each line independently:

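with open('myfile.txt') as file:
    lines = file.readlines()

try:
    for line in lines[::-1]:  # process lines in reverse order (iterates over a copy)
        processLine(line)
        del lines[-1]  # remove the last line, which was just processed
finally:
    # write back the lines that are still unprocessed, even if processLine failed
    with open('myfile.txt', 'w') as file:
        file.writelines(lines)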

You can leave only those lines that cause exceptions:

import fileinput, sys

for line in fileinput.input(['myfile.txt'], inplace=True):
    try:
        processLine(line)
    except Exception:
        sys.stdout.write(line)  # stdout is redirected to 'myfile.txt'

In general, as other people have already said, what you are trying to do is a bad idea.

Mehvix
jfs
  • That will truncate the whole file. – Ryan White Feb 08 '09 at 07:16
  • @Ryan: Yes, it will. Exactly as you asked it. When you've done with *all* lines *you*'d like to remove them *all*. Clarify your question if it is not so. – jfs Feb 08 '09 at 07:21
  • Nowhere in my post did I say *all*. My question is to truncate A line – Ryan White Feb 08 '09 at 07:22
  • Your edit is an interesting approach. Seems like everyone hates what I'm trying to do, but there is a very valid use case for this I swear. – Ryan White Feb 08 '09 at 07:28
  • The second approach will not work as expected, since after deleting 1 line from the list, the indexes will be changed. The iterator i will remain as the original one, but the next del statement will delete index i from the new modified list – gopi1410 Apr 29 '19 at 05:54
  • @jfs yes, I just noticed that. But for some reason, it's not working in python 3. Here's a simplified snippet using hardcoded lists: https://trinket.io/python/a775825d03 – gopi1410 Apr 29 '19 at 06:07
  • @gopi1410: your original comment is correct. `del lines[i]` won't work as expected if you need to delete more than one line. I've updated the answer – jfs Apr 30 '19 at 14:41
10

You can't. It is just not possible with the way text files are implemented on current filesystems.

Text files are sequential, because the lines in a text file can be of any length. Deleting a particular line would mean rewriting the entire file from that point on.

Suppose you have a file with the following 4 lines:

'line1\nline2reallybig\nline3\nlast line'

To delete the second line you'd have to move the third and fourth lines to new positions on disk. The only way would be to store the third and fourth lines somewhere, truncate the file at the start of the second line, and rewrite the missing lines.

If every line in the text file has the same known size, you can truncate the file at any line boundary using .truncate(line_size * line_number), but even then you'd have to rewrite everything after the line.
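
The rewrite described above could be sketched roughly as follows. This is only an illustration: `remove_line` and its arguments are hypothetical names, and it assumes that copying the whole file through a temporary file in the same directory is acceptable.

```python
import os
import tempfile


def remove_line(filename, line_number):
    """Rewrite `filename` without the line at 0-based index `line_number`."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Write the surviving lines to a temporary file in the same directory,
    # so that the final rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as temp_file, open(filename) as source:
            for index, line in enumerate(source):
                if index != line_number:
                    temp_file.write(line)
        os.replace(temp_path, filename)  # swap the rewritten file into place
    except BaseException:
        os.remove(temp_path)  # do not leave the temporary file behind
        raise
```

For the four-line example, `remove_line('myfile.txt', 1)` would drop `line2reallybig`. Every line after the removed one is still copied, which is the cost this answer describes.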

nosklo
  • interesting, I was not quite looking for being able to delete a random line from a file. That would be very difficult. More like truncating the file (at the beginning or end) as it was read. – Ryan White Oct 03 '09 at 17:39
  • This doesn’t answer the question. Random access deletion of lines was not part of the problem (neither explicitly nor implied). – Guildenstern Mar 26 '19 at 23:44
  • @Guildenstern check the question edit history and the answer timestamp. – nosklo Mar 27 '19 at 01:08
6

You're better off keeping an index into the file so that you can start where you stopped last, without destroying part of the file. Something like this would work:

index = 0
with open('myfile.txt') as file:
    try:
        for index, line in enumerate(file):
            processLine(line)
    except Exception:
        # Failed, start from this line number next time.
        print(index)
        raise
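
One way to remember where processing stopped is a small sidecar file that holds the next line number. The sketch below is only an illustration: `process_from_checkpoint`, the `.pos` file suffix, and the `process_line` callback are assumptions, not part of this answer.

```python
import os
from itertools import islice


def process_from_checkpoint(filename, process_line, checkpoint=None):
    """Process lines of `filename`, resuming after the last completed line."""
    checkpoint = checkpoint or filename + '.pos'  # hypothetical sidecar file
    start = 0
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            start = int(f.read().strip() or 0)

    done = start
    try:
        with open(filename) as source:
            # Skip the lines that earlier runs already handled.
            for line in islice(source, start, None):
                process_line(line)
                done += 1
    finally:
        # Record progress whether or not process_line raised.
        with open(checkpoint, 'w') as f:
            f.write(str(done))
    return done
```

The data file itself is never modified, so a run that misbehaves can be repeated by deleting the checkpoint file.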
sykora
4

Truncating the file as you read it seems a bit extreme. What if your script has a bug that doesn't cause an error? In that case you'll want to restart at the beginning of your file.

How about having your script print the line number it breaks on and having it take a line number as a parameter so you can tell it which line to start processing from?

Imran
  • The file would only be truncated when the operation on the line is complete. I will write this data to a backup file as well ... but you didn't really answer the question. – Ryan White Feb 08 '09 at 06:59
4

First of all, calling the operation truncate is probably not the best pick. If I understand the problem correctly, you want to delete everything up to the current position in the file. (I would expect truncate to cut everything from the current position up to the end of the file. This is how the standard Python truncate method works, at least if I Googled correctly.)

Second, I am not sure it is wise to modify the file while iterating over it with the for loop. Wouldn’t it be better to save the number of lines processed and delete them after the main loop has finished, exception or not? The file iterator supports in-place filtering, which means it should be fairly simple to drop the processed lines afterwards.
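
A rough sketch of that idea, assuming the in-place filtering meant here is the standard `fileinput` module; `drop_first_lines`, `process_then_drop`, and the `process_line` callback are hypothetical names used only for illustration:

```python
import fileinput


def drop_first_lines(filename, count):
    """Rewrite `filename` in place without its first `count` lines."""
    with fileinput.input([filename], inplace=True) as stream:
        for index, line in enumerate(stream):
            if index >= count:
                print(line, end='')  # stdout is redirected into the file


def process_then_drop(filename, process_line):
    """Process lines in order, then drop the ones that succeeded."""
    processed = 0
    try:
        with open(filename) as source:
            for line in source:
                process_line(line)
                processed += 1
    finally:
        # Runs whether or not process_line raised.
        drop_first_lines(filename, processed)
    return processed
```

The file is only rewritten once, after the loop, so it is never modified while it is being iterated.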

P.S. I don’t know Python, take this with a grain of salt.

zoul
3

A related post has what seems like a good strategy for that; see How can I run the first process from a list of processes stored in a file and immediately delete the first line as if the file was a queue and I called "pop"?

I have used it as follows:

  import subprocess

  with open(tasklist_filename) as tasklist_file:
      first_line = tasklist_file.readline()
  # remove the first line from the task file
  subprocess.call(['sed', '-i', '-e', '1d', tasklist_filename])

I'm not sure it works on Windows. I tried it on a Mac and it did the trick.

Community
W7GVR
2

This is what I use for file-based queues. It returns the first line and rewrites the file with the rest. When the file is empty, it returns None:

def pop_a_text_line(filename):
    with open(filename,'r') as f:
        S = f.readlines()
    if len(S) > 0:
        pop = S[0]
        with open(filename,'w') as f:
            f.writelines(S[1:])
    else:
        pop = None
    return pop