
I have a simple program that processes some lines in a text file (it adds some text to them) and then saves them to another file. I would like to know if it is possible to remove each line from the source file after it has been processed in the loop. Here is an example of how my program works:

datafile = open("data.txt", "r")   # source file, only read line by line
donefile = open("done.txt", "a")   # destination file, only appended to
for i in datafile:
    #My program goes in here
    donefile.write(processeddata)
    #end of loop
datafile.close()
donefile.close()

As you can see, it just processes some lines from a file (one item per line, separated by newlines). Is there a way to remove each line at the end of the loop, so that if the program is stopped it can continue where it left off?

Uber

1 Answer


Just so that I get the question right: you'd like to remove the line from datafile once you've processed it and stored it in donefile?

There is no need to do this, and it's also pretty risky to write to the same file you are reading from.

Instead, why not delete the datafile after you exit the loop (i.e. after you close your files)?

The file iterator is lazy: when you do for i in datafile it loads one line into memory at a time, so you are only ever working with that one line... memory constraints shouldn't be a concern.

Lastly, to access files, please consider using the with statement. It takes care of closing the file handle even when exceptions occur and makes your program more robust.
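
A minimal sketch of the same loop rewritten with with; the process_line helper is a hypothetical stand-in for the "#My program goes in here" step from the question, not something from the original code:

def process_line(line):
    # hypothetical stand-in for the real processing step
    return line.rstrip("\n") + ":item\n"

# with closes both files automatically, even if an exception is raised
with open("data.txt", "r") as datafile, open("done.txt", "a") as donefile:
    for line in datafile:              # lazy: one line in memory at a time
        donefile.write(process_line(line))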

labheshr
  • Hi, well I can only do one at a time because it also requests some data from a URL. But I want to remove that line because I use very big files that take a lot of time. And yes, that's pretty much what I would like to do: remove the line from datafile after it has been saved. – Uber Oct 22 '15 at 18:37
  • When you access a single line via for i in datafile, you are NOT loading the entire file into memory, because the file iterator is lazy in nature... I would not recommend simultaneously deleting lines from a file while you are iterating over them... also, what exactly takes a "lot of time"? I can understand that having 2 big files can take a lot of space, but I don't see a time inefficiency in your code.... – labheshr Oct 22 '15 at 18:40
  • The API that I use to collect some data is limited to 20 requests per minute, so running a 100k-line file takes a while. And loading it entirely into memory isn't needed, right? – Uber Oct 22 '15 at 18:42
  • Correct, the call you do, open(file...), does not load the entire file into memory; it only creates the file handle (think of it as a file pointer). So whether the file is 5 MB or 5000 MB does not really matter. When you do for i in datafile, the iterator is lazy, meaning it loads a single line, you process it, and then on the next iteration it discards the previously loaded line and loads the next one.... – labheshr Oct 22 '15 at 18:46
  • Ahh okay, well I use this program on a remote desktop server and it's really not reliable... it crashed twice this week, so that's why I want to remove each line after it has been saved. So is there a way to remove it from the file? – Uber Oct 22 '15 at 18:48
  • Look at this topic: http://stackoverflow.com/questions/525272/python-truncate-lines-as-they-are-read ... most people there recommend you shouldn't be deleting lines as you read them one by one, as it messes up the file index... – labheshr Oct 22 '15 at 18:53
  • Ohh okay, so it's not possible :( Anyway, is it maybe possible to, like, compare the donefile with the datafile, so that it doesn't process text that's already in the donefile? – Uber Oct 22 '15 at 18:57
  • Sorry, I meant like: my datafile looks like data and the donefile looks like data:item. So can I use something like: if the data is already in the donefile, skip it so it doesn't go through the for loop? – Uber Oct 22 '15 at 19:00
  • Don't think so :) Instead you can do the following: iterate over datafile in a for loop using try/except; in the try, increment a line number as you read the lines, and in the except, print the line number. That way, if you fail on line 25, the except will print that and you know that next time you have to start from line 25 onward, so you are not going over the first 25 lines again (a minimal sketch of this idea follows after the thread)... – labheshr Oct 22 '15 at 19:02
  • That's pretty smart! Thank you. It's kinda weird that it's not possible to remove a line from a file after processing it; I have seen it done in other programs. Anyway, that caches a 32 MB chunk of it. Would I also be able to do it like that? – Uber Oct 22 '15 at 19:08
  • Some people have done what you want using sed on *nix systems... I don't understand your last question: "caches a 32 MB chunk of it... would I also be able to do it like that?" Can you explain more clearly? – labheshr Oct 22 '15 at 19:19
  • Well, I will be using the program on Windows. Anyway, what I meant by that is that you said it was better when it caches the file in memory instead of doing things one by one, so I thought maybe that's why I can't remove the line. – Uber Oct 22 '15 at 19:21
  • I mentioned that it does not load the entire file in memory, which is correct – labheshr Oct 22 '15 at 19:47
  • Well, thank you, I will use the line counter idea of yours. Thanks! – Uber Oct 22 '15 at 19:52
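
A minimal sketch of the line-counter idea from the last few comments, assuming the hypothetical process_line helper from the earlier example plus a small progress.txt file to remember how far the previous run got (both names are illustrative, not from the thread):

import os

def process_line(line):
    # hypothetical stand-in for the processing/API step
    return line.rstrip("\n") + ":item\n"

# read the resume point left by a previous (possibly crashed) run
start = 0
if os.path.exists("progress.txt"):
    with open("progress.txt") as f:
        start = int(f.read().strip() or 0)

with open("data.txt", "r") as datafile, open("done.txt", "a") as donefile:
    for lineno, line in enumerate(datafile):
        if lineno < start:                       # already handled in an earlier run
            continue
        try:
            donefile.write(process_line(line))
            donefile.flush()
        except Exception:
            print("failed on line %d" % lineno)  # restart from this line next time
            raise
        with open("progress.txt", "w") as f:     # record progress after each successful line
            f.write(str(lineno + 1))

This keeps data.txt untouched, so nothing is deleted from the file while it is being read, and a crash simply means the next run skips the lines that have already been written to done.txt.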