
I have a Python function that reads a file as shown below:


from itertools import islice

def parse(filename, position):
    index = 0  # stays 0 if there are no new lines to read
    with open(filename, 'r') as file:
        for index, line in enumerate(islice(file, position, None), 1):
            # ... do something with the line
            print(line)
    file.close()
    return index

This function is called every time another application, which is out of my control, writes some data to the file. This way I try to avoid reading and writing simultaneously, because I don't want the file to be corrupted. However, there is no guarantee that reads and writes will not overlap.
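For context, a minimal sketch of how the caller might carry `position` between calls (the polling loop and the `follow` name are only illustrative assumptions; in reality the call is triggered by the other application writing):

import time

def follow(filename, interval=1.0):
    # Hypothetical driver: keep `position` across calls so parse()
    # only processes lines it has not seen before.
    position = 0
    while True:
        position += parse(filename, position)  # parse() returns the number of lines read
        time.sleep(interval)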

What I already know is:

  1. If a simultaneous read/write occurs, it is possible that the reading function won't read the most up-to-date information, like in this question, and that is fine for me because the purpose is to get information from the file without corrupting it.
  2. It is possible to lock the file for reading so the writing application will not write while I read. But I don't need this because, as said, there is no problem with reading old data.

What I want to know is:

  1. Can this reading function, the way it is, corrupt the file?
  2. If not, what guarantees that the file will not be corrupted?
  3. Is there any other possible problem/inconsistency in this read/write scenario?
  • Depends on what you mean by corruption. E.g. consider the case where you're reading line by line: if the lines have varying lengths and the file gets rewritten behind your script's back, the current file offset in your script will suddenly be pointing somewhere in the middle of a line, and you get a mis-read. So yes, lock the file while you're reading so it can't be modified (see the `flock` sketch below these comments). – Marc B Dec 01 '15 at 16:02
  • Since you're using `with open() as file` you don't need to use `file.close()`, which will be called automatically after your operations finish. – albert Dec 01 '15 at 16:02
  • Can you get more information about how the data is written to the file? A well-behaved writer saves all the data to a temp file and then, if everything is OK, replaces the old file with the new file in one atomic step (sketched below these comments). That prevents corruption. – VPfB Dec 01 '15 at 16:14
  • `'r'` is the default mode. – Peter Wood Dec 01 '15 at 16:21
  • Reading a file will not corrupt it. The data you have read while another process was writing can be inconsistent or incomplete, with one exception: you can read a file sequentially while another process is appending to it, and that is how the UNIX "tail -f" command works. – Muposat Dec 01 '15 at 17:48
  • @MarcB By corruption I mean corrupt data inside the file, but I also want to know if there are any problems other than this. – Renato Vieira Dec 01 '15 at 18:39
  • @Muposat Can you give any reference from the Python documentation, the Windows/Linux documentation, or any other documentation? I've already searched, but without success. – Renato Vieira Dec 01 '15 at 18:42
  • @Renato Vieira My opinion is based on common sense plus years of experience. I imagine this falls under OS functionality that is too obvious to document. You can google "man tail" and look at the "-f" option; this utility has seen billions of uses and never corrupted anything. Unlike UNIX, Windows usually locks the file being updated, so you might run into a problem there. – Muposat Dec 02 '15 at 06:11
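
Following up on Marc B's locking suggestion, here is a minimal sketch using advisory locking with `fcntl.flock` on Linux (an assumption on my part: it only helps if the writing application also takes the lock, and `fcntl` is not available on Windows, where something like `msvcrt.locking` would be needed instead):

import fcntl
from itertools import islice

def parse_locked(filename, position):
    index = 0
    with open(filename, 'r') as file:
        # Shared (read) lock: blocks while another flock-aware process
        # holds an exclusive lock on the same file.
        fcntl.flock(file.fileno(), fcntl.LOCK_SH)
        try:
            for index, line in enumerate(islice(file, position, None), 1):
                print(line)
        finally:
            fcntl.flock(file.fileno(), fcntl.LOCK_UN)
    return index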

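To illustrate VPfB's point about a well-behaved writer (note that the writing application in this question is out of the asker's control, so this is only what the other side would ideally do), the usual atomic-replace pattern looks roughly like this on Python 3:

import os
import tempfile

def atomic_write(filename, data):
    # Write to a temp file in the same directory, then atomically swap
    # it into place; readers see either the old or the new contents,
    # never a half-written file.
    dirname = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, filename)  # atomic; Python 3.3+
    except BaseException:
        os.unlink(tmp_path)
        raise
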
1 Answer


If you look at the line

with open(filename, 'r') as file:

you are opening the file in read-only mode, which means you cannot write to it. Try writing to it yourself: it will throw an exception saying the file is only open for reading. Internally, the file object stores the mode it was opened in, and whenever you call an operation it checks that mode; if the operation is not legitimate for that mode (opening for reading and then calling write is not), it throws an exception right away.
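
For example (with a placeholder filename), a write attempt on a file opened with `'r'` fails immediately:

with open('data.txt', 'r') as f:  # 'data.txt' is a placeholder
    f.write('x')                  # io.UnsupportedOperation: not writable
                                  # (IOError on Python 2)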

Another thing: you are explicitly calling

file.close()

which is of no use; the file is already closed by the time you come out of the `with` block. While exiting the `with` block, Python calls the file object's `__exit__()` method, which closes the file.
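
A quick way to confirm this (again with a placeholder filename):

with open('data.txt', 'r') as f:
    pass
print(f.closed)  # True -- __exit__() already closed the file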

yugandhar