I'm trying to delete a line just after reading it in Python.

with open("pages_Romance") as f:
   for line in f:
      print "Page: " + line
      #Do something with the line
      delete_a_line("pages_Romance", line)

My function delete_a_line is implemented like this:

def delete_a_line(path_file, line):
    with open(path_file, "r") as f:
       urls = f.readlines()
       if len(urls) == 1:
          print "File " + path_file + " deleted"
          os.remove(path_file)
       else:
          with open(path_file, "w") as f:
             for url in urls:
                if url != line:
                    f.write(url)
                else:
                    print url

My file pages_Romance contains 200 URLs (one per line), and each time I read a URL I want to delete it. The problem is that every time I launch the script it fails at the same place: URL number 163 in my file is cut off and then the script stops. It works well with fewer than 163 URLs, but with 163 or more I get the following output:

Page: http://www.allocine.fr/films/genre-130

Then the script stops. I should get:

Page: http://www.allocine.fr/films/genre-13024/?page=163

I would appreciate any help figuring out this problem. If you want to try it yourself, this script creates the file with 200 URLs:

def create_url_file():
    with open("pages_Romance", "w") as f:
        for i in range(1,201):
            f.write("http://www.allocine.fr/films/genre-13024/?page=" + str(i) + "\n")
mel
  • Deleting a line from a file seems weird to me. You could always just do `urls = [line.rstrip() for line in file]`, and then simply pull the urls out one by one with `urls.pop(0)`... if you felt the need to do that. – Wayne Werner Mar 02 '16 at 16:32
  • To know why it stops at 163 we would need to know what `#Do something with the line` actually does. But in general you shouldn't be opening the same file 3 times and trying to read and write to it with 3 different file handles at the same time. – Stuart Mar 02 '16 at 16:34
  • It's for crawling a website, and I want to be able to relaunch the crawling process in case of a connection problem. – mel Mar 02 '16 at 16:34
  • @Stuart It's currently just a comment. I do nothing with the line yet, but I still have the problem. – mel Mar 02 '16 at 16:35
  • Similar question here: http://stackoverflow.com/questions/4710067/deleting-a-specific-line-in-a-file-python – Stuart Mar 02 '16 at 16:47
  • @Stuart I know, I used that trick, but in my case it doesn't seem to work well. – mel Mar 02 '16 at 16:49
  • @mel I did something like this all the time. If you wrap critical code in try...except while tracking what line you are at, then as part of handling the exception you can save the last successful line number to a file, debug, and restart where you left off after retrieving the progress data. In my case I used sqlite as the software equivalent of a notepad to track progress. – David Mar 02 '16 at 16:56
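
The progress-tracking idea in the last comment could look roughly like the sketch below. It is not David's actual code: it swaps sqlite for a plain text file holding one number, and `read_progress`, `crawl_with_progress`, the progress file and the `handle` callback are all hypothetical names. It is written for Python 3, and the URL file is only ever read, never rewritten.

```python
def read_progress(progress_path):
    """Return the number of the last line handled successfully (0 if none)."""
    try:
        with open(progress_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def crawl_with_progress(path, progress_path, handle):
    """Call handle(url) for every line not yet recorded as done."""
    done = read_progress(progress_path)
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            if number <= done:
                continue  # already handled in an earlier run
            handle(line.rstrip("\n"))
            # Record progress only after handle() returned without raising
            with open(progress_path, "w") as p:
                p.write(str(number))
```

If `handle` raises (for example on a connection problem), the exception propagates and the progress file still names the last URL that succeeded, so calling `crawl_with_progress` again resumes at the failed line.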

3 Answers

3

Deleting a line from a file stored on disk is not easy. Most solutions, like your attempt, actually involve reading the whole file into memory (either line by line or all at once) and then writing it all back to disk again, except for the line to be removed.

So a more natural way to do this would be to write the lines you want to keep to a new file at the same time as you iterate through and process the lines. You can then delete the old file and replace it with the new file as needed. This avoids reading the whole file into memory.

with open("pages_Romance") as in_file, open("pages_Romance_temp", "w") as out_file:
    for line in in_file:
        print "Page: " + line
        # Do something with the line, then decide whether to drop it
        delete_this_line = True  # placeholder: set this from your own processing
        if not delete_this_line:
            out_file.write(line)
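
A fuller sketch of the same temp-file idea, adapted to the resumable-crawl case described in the question's comments. It assumes Python 3 (for `os.replace`), and `process_file`, the `.tmp` suffix and the `handle` callback are hypothetical names chosen for illustration. Lines that were handled successfully are dropped; the first failing line and everything after it are kept for the next run.

```python
import os


def process_file(path, handle):
    """Call handle(url) per line and keep only the lines not yet handled."""
    temp_path = path + ".tmp"
    kept = 0
    failed = False
    with open(path) as in_file, open(temp_path, "w") as out_file:
        for line in in_file:
            if not failed:
                try:
                    handle(line.rstrip("\n"))
                    continue  # success: do not copy this line
                except Exception:
                    # Sketch only: a real crawler would log the error here
                    failed = True
            out_file.write(line)  # keep the failed line and all later ones
            kept += 1
    # Both files are closed at this point, so the swap is safe
    os.replace(temp_path, path)
    if kept == 0:
        os.remove(path)  # nothing left to crawl
    return kept
```

The original file is never written to while it is being read, which is the key difference from the code in the question.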

However, if your file is short, consider just reading it all into memory and dealing with it as a list of lines, which may simplify your other code.

with open("pages_Romance") as f:
    urls = f.readlines()

# Do stuff with urls
urls.remove(unwanted_line)
# etc.

with open("pages_Romance", "w") as f:
    f.writelines(urls)
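
For the crawling use case, the in-memory variant might be combined with a rewrite after each processed URL, so the file always lists what is still to do. This is only a sketch under that assumption, written for Python 3; `crawl_from_file` and the `handle` callback are hypothetical names.

```python
import os


def crawl_from_file(path, handle):
    """Process URLs from an in-memory list and checkpoint the rest to disk."""
    with open(path) as f:
        urls = f.readlines()
    # The file is closed here; from now on it is only opened for writing
    while urls:
        handle(urls[0].rstrip("\n"))  # may raise, leaving the file untouched
        urls.pop(0)
        if urls:
            with open(path, "w") as f:
                f.writelines(urls)
        else:
            os.remove(path)  # every URL has been handled
```

Rewriting the file once per URL is wasteful for large files, but for a few hundred lines the cost is small compared with the network requests.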
Stuart
2

I suspect you are iterating over a file that you are simultaneously changing. Your outer loop keeps the file open while delete_a_line, called inside it, changes the file's length. Try iterating only from your top-level function.
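
One plausible reason the output is cut exactly at URL 163, assuming the file object reads ahead in 8192-byte chunks (a common buffer size, but an assumption not confirmed in this thread): the first 162 lines of the test file occupy 8154 bytes, so the first chunk ends 38 characters into line 163. By the time the loop needs the next chunk, the file on disk has been rewritten and is much shorter, so the read hits end of file and the loop stops with a partial line. The sketch below only checks that arithmetic; `BUFFER_SIZE`, `make_lines` and `line_crossing_boundary` are hypothetical names, and the code is written for Python 3.

```python
# Assumption: the file object reads ahead in chunks of this many bytes
BUFFER_SIZE = 8192


def make_lines(count=200):
    """Rebuild the lines written by create_url_file() in the question."""
    prefix = "http://www.allocine.fr/films/genre-13024/?page="
    return [prefix + str(i) + "\n" for i in range(1, count + 1)]


def line_crossing_boundary(lines, boundary=BUFFER_SIZE):
    """Return (line_number, text_before_boundary) for the straddling line."""
    offset = 0
    for number, line in enumerate(lines, start=1):
        if offset + len(line) > boundary:
            return number, line[:boundary - offset]
        offset += len(line)
    return None  # the whole file fits inside one chunk


number, fragment = line_crossing_boundary(make_lines())
print(number, fragment)
```

For the 200-URL test file this reports line 163 and the fragment `http://www.allocine.fr/films/genre-130`, which matches the truncated output shown in the question.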

1

While I don't know why it breaks after working for 163 lines, it is probably because you are changing the file in delete_a_line while it is still open in the original with block. I was able to get it to work by opening and closing the file in each outer iteration before calling delete_a_line, so the file is never opened in two places at once:

f = open("pages_Romance")
while f:
    line = f.readline()
    print "Page: " + line
    #Do something with the line
    f.close()
    delete_a_line("pages_Romance", line)
    try:
        f = open("pages_Romance")
    except IOError:
        f = None

Also, delete_a_line fails to delete the file itself once the last line is reached, because the file is still open (you are trying to delete it from inside the with block). A quick fix is to set a flag and then delete the file outside of the with block:

import os

def delete_a_line(path_file, line):
    delete_flag = False
    with open(path_file, "r") as f:
        urls = f.readlines()
        if len(urls) == 1:
            # Last remaining line: remove the file after it has been closed
            delete_flag = True
        else:
            # Rewrite the file without the line that was just processed
            with open(path_file, "w") as out:
                for url in urls:
                    if url != line:
                        out.write(url)
                    else:
                        print url
    if delete_flag:
        print "File " + path_file + " deleted"
        os.remove(path_file)

However, I agree with the others: I'd try a different approach to your problem rather than constantly deleting single lines from the file. The solution I outlined above is very inefficient.

dbc