0

I am reading a file line by line. Once a line has been processed, I want to remove it from the file. Here is the code:

import os
lines = open('q0.txt').readlines()

for i, line in enumerate(lines[:]):
    print line
    flag = raw_input()
    print lines[i]
    del lines[i]

open('q0.txt', 'w').writelines(lines)

I am going through a large q0.txt. My intention is that, if there is any interruption partway through, I should not reprocess the lines that were already processed.

In the code above, even though I delete lines[i], the line still remains in the file. What is wrong?

puncrazy
  • You are deleting a line from your list, not from the file. Also, I think you can't delete lines from a file dynamically. You need to create a new one containing all the lines of interest. – Marcin Dec 01 '14 at 05:44
  • Is that indentation right? I'm asking because `raw_input` will wait until the user inputs something. You should never see the second `print lines[i]`, much less reach the `writelines` at the end of your code, unless you keep pressing `Enter` on your keyboard. – Savir Dec 01 '14 at 05:51
  • @BorrajaX: there was an indentation issue while pasting the code here. Corrected. – puncrazy Dec 01 '14 at 05:55
  • Nopes, still happening... :-) – Savir Dec 01 '14 at 05:58
  • @BorrajaX: yeah, correct, the 2nd `print lines[i]` does not get printed. But as the first `print line` works, it will not cause any issue for me. – puncrazy Dec 01 '14 at 06:05
  • It does, since it's gonna block your whole program... – Savir Dec 01 '14 at 06:10
  • @BorrajaX: yes, there is an index out of bounds issue too, as Sriram explained. Any solution for this? – puncrazy Dec 01 '14 at 06:15

2 Answers

2

I expect the above code to throw an IndexError somewhere.

Why? Let us say your script reads a 100-line file. The copy lines[:] has 100 items in it, so the loop index i runs from 0 to 99. Meanwhile, del lines[i] keeps shrinking the original lines list.

Eventually the index points past the end of the shrunken list. After even one single del operation, lines[99] no longer exists. Since this loop deletes on every pass, lines[i] already fails with an IndexError at i = 50, when only 50 items are left.

Therefore, the line open('q0.txt', 'w').writelines(lines) never gets executed once something has been deleted, and hence the file remains the same.

This is my understanding.
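
A minimal sketch of that failure, assuming an in-memory list of 100 made-up strings in place of q0.txt and leaving out raw_input (both are simplifications for illustration, not the asker's actual data). It uses print() call syntax so it also runs on Python 3.

```python
# Stand-in for open('q0.txt').readlines(); the contents are made up.
lines = ["line %d\n" % n for n in range(100)]

failed_at = None
try:
    # Same loop shape as the question: iterate over a copy,
    # index into and delete from the original list.
    for i, line in enumerate(lines[:]):
        _ = lines[i]   # stands in for the question's `print lines[i]`
        del lines[i]
except IndexError:
    failed_at = i

print("IndexError at i = %s, %d lines left" % (failed_at, len(lines)))
```

The loop also removes only every other line (original indexes 0, 2, 4, ...), because each deletion shifts the remaining items one position to the left. The exception is raised before the final writelines call, so the file on disk is never touched.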

Sriram
1

Since raw_input is blocking your code, you might wanna split the work across two threads: the main one and one that you create in your code. Since threads run concurrently and in an unpredictable order (kinda), you're not gonna be able to control exactly on which line the interruption reaches your main while loop. Threads are tricky to get right, and it takes a lot of reading, testing and checking why things happen the way they happen...

Also, since you don't mind consuming your lines, you can do what's called a destructive read: load the contents of the file into a lines variable, and keep taking the last one with pop() until you run out of lines to consume (or the flag has been activated). Check what the pop() method of a list does. Be aware that pop() with no argument always returns (and removes) the last item of the list. If you want the items printed in the original order, you have to use pop(0) (the equivalent of shift in other languages) or pop from a previously reversed list.

import threading

interrupt=None

def flag_activator():
    global interrupt
    interrupt = raw_input("(!!) Type yes when you wanna stop\n\n")
    print "Oh gosh! The user input %s" % interrupt

th = threading.Thread(target=flag_activator)
th.start()


fr = open('q0.txt', 'r')
lines = fr.readlines()
fr.close()

while lines and interrupt != 'yes':
    print "I read this line: %s" % lines.pop()

if lines:
    print "Crap! There are still lines"
# Write back whatever is left, even if that is nothing
fw = open('q0.txt', 'w')
fw.writelines(lines)
fw.close()

Now, that code is not gonna exit until you type something (yes, for instance) on the terminal and press Enter, because the input thread keeps waiting.

PS: Don't forget to close your opened files (if you don't want to call close() explicitly, see the with statement)
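
On the ordering point above, here is a small sketch comparing the ways to consume a list. The sample data is made up, and collections.deque is an addition of mine that the answer does not mention.

```python
from collections import deque

lines = ["first\n", "second\n", "third\n"]   # made-up sample data

# pop() with no argument takes from the end, so the order comes out reversed.
stack = list(lines)
from_end = [stack.pop() for _ in range(len(stack))]

# pop(0) takes from the front and keeps the original order,
# but each call shifts the rest of the list (O(n) per call).
queue = list(lines)
from_front = [queue.pop(0) for _ in range(len(queue))]

# deque.popleft() also keeps the original order and is O(1) per call.
dq = deque(lines)
from_deque = [dq.popleft() for _ in range(len(dq))]
```

For a large file, the deque variant is probably the better fit, since pop(0) gets slower as the list grows.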

EDIT (as per OP's comments to my misunderstanding):

If what you want is to ensure that the file will not contain the already processed line if your script suddenly stops, an inefficient (but straightforward) way to accomplish that is:

  1. Open the file for reading and, once the read is done, for writing (you're gonna need a different file descriptor for each operation, and the 'w' one truncates the file as soon as it is opened)
  2. Load all the file's lines into a variable
  3. Process the first line
  4. Remove that line from the list variable
  5. Write the remaining list to the file
  6. Repeat until no more lines are loaded.

All this opening/closing of files is really, really inefficient, but here it goes:

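done = False
while not done:
    # Read first: opening with 'w' truncates the file right away,
    # so the write handle must not be opened before the read is done.
    with open("q0.txt", 'r') as fr:
        lines = fr.readlines()
    if len(lines) > 0:
        print lines[0] # This would be your processing
        del lines[0]
        with open("q0.txt", 'w') as fw:
            fw.writelines(lines)
    else:
        done = True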
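
If rewriting the whole file after every line turns out to be too slow for a large q0.txt, one alternative (a different technique from the loop above, sketched here under assumptions) is to leave the file alone and record how far you got in a small side file. The function name process_remaining, the handle callback and the ".offset" suffix are all hypothetical names chosen for this example.

```python
import os

def process_remaining(path, handle, checkpoint=None):
    """Process the lines of `path` one by one, resuming from a saved byte offset."""
    # Hypothetical side file that stores the position of the next unprocessed line.
    checkpoint = checkpoint or path + ".offset"
    offset = 0
    if os.path.exists(checkpoint):
        with open(checkpoint) as cp:
            offset = int(cp.read().strip() or 0)
    # Binary mode keeps seek()/tell() as plain byte positions.
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            handle(line)
            # Record progress only after the line has been handled.
            with open(checkpoint, "w") as cp:
                cp.write(str(f.tell()))
```

You would call it as process_remaining('q0.txt', handle) with your own handle function, which receives each line as bytes. If the script dies between handling a line and saving the offset, that single line is processed again on the next run, so this only fits when repeating one line is acceptable.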
Savir
  • dude, thanks for your efforts, but it behaves differently from what I want – puncrazy Dec 01 '14 at 06:34
  • I put raw_input there just to check whether my program had deleted the line in between or not – puncrazy Dec 01 '14 at 06:36
  • LoL... I imagined **:-D** But meh... Threads are something worth knowing about. – Savir Dec 01 '14 at 06:36
  • @puncrazy: The `pop` (or `pop(0)`) parts can still be useful, though... You won't have the `IndexError` with them. – Savir Dec 01 '14 at 06:41
  • Ok! This got personal!! **:-D** (I was curious, myself)... I've edited the answer. I think it should fit your question better? – Savir Dec 01 '14 at 07:07