
I'm trying to read a file in Python using Win32Api so as to be able to open the file without locking it on a Windows system.

I've been able to open the file and even to read from it but when I try to implement the iterator protocol I get an error message that I can't understand.

Here's an example script that reproduces the problem:

#!/usr/bin/env python

import os


class FileTail(object):
    def __init__(self, file):
        self.open(file)

    def open(self, file):
        """Open the file to tail and initialize our state."""
        fh = None

        import win32file
        import msvcrt

        handle = win32file.CreateFile(file,
                                      win32file.GENERIC_READ,
                                      win32file.FILE_SHARE_DELETE |
                                      win32file.FILE_SHARE_READ |
                                      win32file.FILE_SHARE_WRITE,
                                      None,
                                      win32file.OPEN_EXISTING,
                                      0,
                                      None)
        file_descriptor = msvcrt.open_osfhandle(
            handle, os.O_TEXT | os.O_RDONLY)

        fh = open(file_descriptor, encoding='utf-8',
                  errors='ignore', newline="\n")

        self.reopen_check = "time"

        self.fh = fh
        self.file = file

        # Uncommenting this code demonstrates that there's no problem reading the file here.
        # -------------------------------------------------------------------------------
        # line = None
        # self.wait_count = 0

        # while not line:
        #     line = self.fh.readline()

    def __iter__(self):
        return self

    def __next__(self):
        line = None
        self.wait_count = 0

        while not line:
            line = self.fh.readline()

        return line

# ##############################
# ENTRY POINT
# ##############################
if __name__ == "__main__":
    # Raw string, so the backslashes in the Windows path are not treated as escapes.
    my_file = FileTail(r'C:\LOGS\DANNI.WEB\PROVA.LOG')

    for line in my_file:
        print(line)

Now, if you try to execute this script, you will receive this error message:

Traceback (most recent call last):
  File "C:\Users\me\Desktop\prova.py", line 63, in <module>
    for line in my_file:
  File "C:\Users\me\Desktop\prova.py", line 53, in __next__
    line = self.fh.readline()
OSError: [Errno 9] Bad file descriptor

If I uncomment the commented code in the "open" method I can read the whole file, so I don't think the problem lies in using the Win32 API to open the file. So what am I missing?

Why do I get this error message when using the iterator protocol? Is it a thread-related problem? How can I fix it?

I know there are probably a thousand workarounds, but I want to understand why this code is not working.

Thank you all for any help you can provide, and sorry for my very bad English.

Dave

mastro35
  • Just a guess here: can you try to store `handle` and `file_descriptor` from your `open` method as instance attributes? It seems like the gc is freeing those handles. That's why it works when you read the file inside the same method. – Wombatz Oct 23 '16 at 13:31
  • Wombatz... THANK YOU!!! You are right, simply storing the handle in an instance attribute makes it work like a charm, thanks! P.S. If you post an answer, I will accept it as the correct one. – mastro35 Oct 23 '16 at 13:38
  • Are you implying that a standard `open(path, 'r')` locks the file on Windows? – Jonathon Reinhart Oct 23 '16 at 14:03
  • Well... yes... at least it prevents the file from being deleted by another process. I've experienced this problem and, searching Stack Overflow, I found this solution using the Win32 API. Check it here: http://stackoverflow.com/questions/14388608/python-opening-a-file-without-creating-a-lock – mastro35 Oct 23 '16 at 14:09
  • Python 3 uses binary mode; file descriptors passed to `open` *should not* be opened in text mode (`O_TEXT`). Also, why aren't you using the `opener` parameter? – Eryk Sun Oct 23 '16 at 17:46
  • I just checked, and verified in the source, that setting `O_TEXT` on the file descriptor won't work because `FileIO.__init__` (from `open`) resets the file descriptor to binary mode. So `newline="\n"` isn't working with Windows CRLF line endings. Use the default `newline` translation. – Eryk Sun Oct 23 '16 at 18:08

1 Answer


The problem is that the objects handle and file_descriptor might get garbage collected after the open method returns. By the time __next__ is called, the objects might already have been freed, which raises OSError: [Errno 9] Bad file descriptor. That is also why reading works inside the open method itself: there the objects are still alive.

To solve this, simply store the objects as instance attributes so that there is always at least one reference to them.

def open(self, file):
    ...
    # Keep a reference on the instance so the handle is not freed on return.
    self.handle = win32file.CreateFile(...)
    ...
    self.file_descriptor = msvcrt.open_osfhandle(self.handle, ...)
    ...
    self.fh = open(self.file_descriptor, ...)
    ...

It might be sufficient to store only one of them, but I am not sure which one. Storing both is the safe way.
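The effect described above can be reproduced without pywin32. The following is only a portable sketch, not the question's code: OwnedHandle is a hypothetical stand-in for an object that closes its descriptor when it is freed, and open_and_drop, Reader and demo are illustrative names. It also assumes CPython, where a local is freed as soon as the function returns.

```python
import errno
import os
import tempfile


class OwnedHandle:
    """Hypothetical stand-in: closes its descriptor when the object is freed."""

    def __init__(self, fd):
        self.fd = fd

    def __del__(self):
        os.close(self.fd)


def open_and_drop(path):
    # The wrapper is only a local, so it is freed when this function returns
    # and the descriptor is closed behind the file object's back.
    handle = OwnedHandle(os.open(path, os.O_RDONLY))
    # closefd=False: the file object does not own the descriptor in this demo.
    return open(handle.fd, encoding="utf-8", closefd=False)


class Reader:
    def __init__(self, path):
        # Storing the wrapper on the instance keeps the descriptor open
        # for as long as the Reader itself is alive.
        self.handle = OwnedHandle(os.open(path, os.O_RDONLY))
        self.fh = open(self.handle.fd, encoding="utf-8", closefd=False)


def demo():
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first line\n")
    try:
        try:
            open_and_drop(tmp.name).readline()
            dropped = None
        except OSError as exc:
            dropped = exc.errno  # the descriptor was already closed
        reader = Reader(tmp.name)  # keep the Reader itself referenced too
        kept = reader.fh.readline()
        del reader  # release the descriptor before removing the file
    finally:
        os.remove(tmp.name)
    return dropped, kept
```

Under those assumptions, demo() should report errno.EBADF for the dropped wrapper and return the first line for the Reader that keeps its wrapper as an attribute.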

Wombatz
  • `file_descriptor` is just an integer. Also, the proper way to deal with `handle` is to call its `Detach` method after calling `open_osfhandle`, not to keep a reference to it. Otherwise you have a race condition as to which will close the handle first. Windows reuses handles, so one of them could end up closing the handle for a completely unrelated kernel object (another file, thread, process, event, semaphore, etc). – Eryk Sun Oct 23 '16 at 18:13
  • @eryksun you should create an answer for that. Then I will delete this one and the OP can accept yours. – Wombatz Oct 23 '16 at 21:24
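The suggestions in the comments (Detach after open_osfhandle, the opener parameter, no O_TEXT, default newline handling) could be combined roughly as follows. This is a sketch only, assuming pywin32 is installed and the code runs on Windows; shared_opener and open_shared are hypothetical names, and passing os.O_RDONLY as the descriptor flags is an assumption of this sketch.

```python
import os


def shared_opener(path, flags):
    """Opener for open(): returns a C file descriptor for a fully shared handle."""
    # Windows-only modules, imported lazily as in the question's code.
    import msvcrt
    import win32file

    handle = win32file.CreateFile(
        path,
        win32file.GENERIC_READ,
        win32file.FILE_SHARE_DELETE
        | win32file.FILE_SHARE_READ
        | win32file.FILE_SHARE_WRITE,
        None,
        win32file.OPEN_EXISTING,
        0,
        None,
    )
    # Detach() returns the raw handle and makes the PyHANDLE give up ownership,
    # so only the file descriptor (and the file object on top of it) closes it.
    return msvcrt.open_osfhandle(handle.Detach(), os.O_RDONLY)


def open_shared(path):
    # No O_TEXT and no newline argument: the default newline translation
    # is left to deal with CRLF line endings.
    return open(path, encoding="utf-8", errors="ignore", opener=shared_opener)
```

With this arrangement the file object returned by open_shared is the only owner of the underlying handle, so there is nothing extra to store on the instance and no second object that could close the handle at an unexpected time.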