27

Suppose I want to extend the built-in file abstraction with extra operations at open and close time. In Python 2.7 this works:

class ExtFile(file):
    def __init__(self, *args):
        file.__init__(self, *args)
        # extra stuff here

    def close(self):
        file.close(self)
        # extra stuff here

Now I'm looking at updating the program to Python 3, in which open is a factory function that might return an instance of any of several different classes from the io module depending on how it's called. I could in principle subclass all of them, but that's tedious, and I'd have to reimplement the dispatching that open does. (In Python 3 the distinction between binary and text files matters rather more than it does in 2.x, and I need both.) These objects are going to be passed to library code that might do just about anything with them, so the idiom of making a "file-like" duck-typed class that wraps the return value of open and forwards necessary methods will be most verbose.

Can anyone suggest a 3.x approach that involves as little additional boilerplate as possible beyond the 2.x code shown?

Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
zwol
  • 135,547
  • 38
  • 252
  • 361
  • Perhaps instead of extending the file object you could make a custom object to be used with a `with` statement? – Waleed Khan Apr 18 '13 at 14:13
  • 1
    Why would a class which wraps the return value of `open` be so bad? You can override `__getattr__` to forward methods wholesale. – Benjamin Hodgson May 16 '14 at 06:04
  • @BenjaminHodgson It has been long enough that I don't remember exactly what I was thinking, but it was probably along the lines of "it would be easy to do 90% of the job and a giant pain to get every last corner case nailed down." I haven't done much with object introspection in Python, and when I've tried it's tripped me up in confusing ways. – zwol May 16 '14 at 12:57

3 Answers3

21

You could just use a context manager instead. For example this one:

class SpecialFileOpener:
    def __init__ (self, fileName, someOtherParameter):
        self.f = open(fileName)
        # do more stuff
        print(someOtherParameter)
    def __enter__ (self):
        return self.f
    def __exit__ (self, exc_type, exc_value, traceback):
        self.f.close()
        # do more stuff
        print('Everything is over.')

Then you can use it like this:

>>> with SpecialFileOpener('C:\\test.txt', 'Hello world!') as f:
        print(f.read())

Hello world!
foo bar
Everything is over.

Using a context block with with is preferred for file objects (and other resources) anyway.

poke
  • 369,085
  • 72
  • 557
  • 602
  • In response to the bounty request: You really shouldn’t subtype the file objects; the different file types are never created directly but instead are created by some openers (e.g. `open`, `urlopen`, …). If you want to extend the functionality, you should provide a custom *opener* that wraps the file-like object and provides the additional functionality. – poke May 23 '14 at 05:00
17

tl;dr Use a context manager. See the bottom of this answer for important cautions about them.


Files got more complicated in Python 3. While there are some methods that can be used on normal user classes, those methods don't work with built-in classes. One way is to mix-in a desired class before instanciating it, but this requires knowing what the mix-in class should be first:

class MyFileType(???):
    def __init__(...)
        # stuff here
    def close(self):
        # more stuff here

Because there are so many types, and more could possibly be added in the future (unlikely, but possible), and we don't know for sure which will be returned until after the call to open, this method doesn't work.

Another method is to change both our custom type to have the returned file's ___bases__, and modifying the returned instance's __class__ attribute to our custom type:

class MyFileType:
    def close(self):
        # stuff here

some_file = open(path_to_file, '...') # ... = desired options
MyFileType.__bases__ = (some_file.__class__,) + MyFile.__bases__

but this yields

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __bases__ assignment: '_io.TextIOWrapper' deallocator differs from 'object'

Yet another method that could work with pure user classes is to create the custom file type on the fly, directly from the returned instance's class, and then update the returned instance's class:

some_file = open(path_to_file, '...') # ... = desired options

class MyFile(some_file.__class__):
    def close(self):
        super().close()
        print("that's all, folks!")

some_file.__class__ = MyFile

but again:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ assignment: only for heap types

So, it looks like the best method that will work at all in Python 3, and luckily will also work in Python 2 (useful if you want the same code base to work on both versions) is to have a custom context manager:

class Open(object):
    def __init__(self, *args, **kwds):
        # do custom stuff here
        self.args = args
        self.kwds = kwds
    def __enter__(self):
        # or do custom stuff here :)
        self.file_obj = open(*self.args, **self.kwds)
        # return actual file object so we don't have to worry
        # about proxying
        return self.file_obj
    def __exit__(self, *args):
        # and still more custom stuff here
        self.file_obj.close()
        # or here

and to use it:

with Open('some_file') as data:
    # custom stuff just happened
    for line in data:
        print(line)
# data is now closed, and more custom stuff
# just happened

An important point to keep in mind: any unhandled exception in __init__ or __enter__ will prevent __exit__ from running, so in those two locations you still need to use the try/except and/or try/finally idioms to make sure you don't leak resources.

Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • 1
    Thanks for clarifying this with me. In my particular case I know the files will always be `io.TextIOWrapper`'s. Is it considerable to extend the TextIOWrapper class? And if I can extend that and use that extension, where do I handle the case of receiving not a text file? – ThorSummoner May 22 '14 at 16:47
  • Are you in control of where these files are opened, or are you also wanting to track files opened by other modules? – Ethan Furman May 22 '14 at 17:52
  • I control the entire file open operation. A middle-step I want to achieve is to add a method to do a hashsum of the file. – ThorSummoner May 22 '14 at 20:22
  • The context manager is definitely the clearest way. But playing advocate of the devil: Could one do it the ruby/node.js way and monkypatch `io.__metaclass__`? – Chris Wesseling May 23 '14 at 06:13
  • @ChrisWesseling: Nope. The metaclass of `_io.TextIOWrapper` is `type`, which cannot be modified; also, one cannot modify the instance's class' class: `TypeError: can't set attributes of built-in/extension type '_io.TextIOWrapper'`. – Ethan Furman May 23 '14 at 06:58
  • Why are you subclassing `_io.TextIOWrapper` rather than `io.TextIOWrapper`? – gerrit Oct 15 '15 at 17:44
  • For me calling `MyTextIOFile(...)` fails with `AttributeError: 'str' object has no attribute 'readable'`. – gerrit Oct 15 '15 at 17:50
  • @gerrit: 1) I don't remember. ;) 2) which version of Python, exactly how are you calling MyTextIOFile, and exact traceback? You may want to ask a new question. – Ethan Furman Oct 15 '15 at 17:59
  • @EthanFurman [Here is my new question](http://stackoverflow.com/q/33155741/974555). – gerrit Oct 15 '15 at 18:19
  • Unfortunately, a context manager is not feasible if you want to return the file object to a caller. – Torsten Bronger May 13 '20 at 17:27
8

I had a similar problem, and a requirement of supporting both Python 2.x and 3.x. What I did was similar to the following (current full version):

class _file_obj(object):
    """Check if `f` is a file name and open the file in `mode`.
    A context manager."""
    def __init__(self, f, mode):
        if isinstance(f, str):
            self.file = open(f, mode)
        else:
            self.file = f
        self.close_file = (self.file is not f)
    def __enter__(self):
        return self
    def __exit__(self, *args, **kwargs):
        if (not self.close_file):
            return  # do nothing
        # clean up
        exit = getattr(self.file, '__exit__', None)
        if exit is not None:
            return exit(*args, **kwargs)
        else:
            exit = getattr(self.file, 'close', None)
            if exit is not None:
                exit()
    def __getattr__(self, attr):
        return getattr(self.file, attr)
    def __iter__(self):
        return iter(self.file)

It passes all calls to the underlying file objects and can be initialized from an open file or from a filename. Also works as a context manager. Inspired by this answer.

Community
  • 1
  • 1
Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
  • This does not address the question asked; Providing a wrapper that handles both python2 and python3 is distracting as extending the python2 file object is quite trivial already and this question was asked because doing the same in python3 is apparently no long trivial. – ThorSummoner May 16 '14 at 06:07
  • 6
    @ThorSummoner I think "does more than asked for" is not the same as "does not address the question". Given that there was already an answer, I thought it wouldn't harm to add an extended version. The answer explains what additional functionality the code has. Thanks for the feedback though. – Lev Levitsky May 16 '14 at 15:21
  • I found this suggestion useful. – Rick Dec 09 '15 at 20:15