118

File-like objects are objects in Python that behave like a real file, e.g. have a read() and a write method(), but have a different implementation from file. It is a realization of the Duck Typing concept.

It is considered a good practice to allow a file-like object everywhere where a file is expected so that e.g. a StringIO or a Socket object can be used instead a real file. So it is bad to perform a check like this:

if not isinstance(fp, file):
   raise something

What is the best way to check if an object (e.g. a parameter of a method) is "file-like"?

dmeister
  • 34,704
  • 19
  • 73
  • 95

10 Answers10

126

For 3.1+, one of the following:

isinstance(something, io.TextIOBase)
isinstance(something, io.BufferedIOBase)
isinstance(something, io.RawIOBase)
isinstance(something, io.IOBase)

For 2.x, "file-like object" is too vague a thing to check for, but the documentation for whatever function(s) you're dealing with will hopefully tell you what they actually need; if not, read the code.


As other answers point out, the first thing to ask is what exactly you're checking for. Usually, EAFP is sufficient, and more idiomatic.

The glossary says "file-like object" is a synonym for "file object", which ultimately means it's an instance of one of the three abstract base classes defined in the io module, which are themselves all subclasses of IOBase. So, the way to check is exactly as shown above.

(However, checking IOBase isn't very useful. Can you imagine a case where you need to distinguish an actual file-like read(size) from some one-argument function named read that isn't file-like, without also needing to distinguish between text files and raw binary files? So, really, you almost always want to check, e.g., "is a text file object", not "is a file-like object".)


For 2.x, while the io module has existed since 2.6+, built-in file objects are not instances of io classes, neither are any of the file-like objects in the stdlib, and neither are most third-party file-like objects you're likely to encounter. There was no official definition of what "file-like object" means; it's just "something like a builtin file object", and different functions mean different things by "like". Such functions should document what they mean; if they don't, you have to look at the code.

However, the most common meanings are "has read(size)", "has read()", or "is an iterable of strings", but some old libraries may expect readline instead of one of those, some libraries like to close() files you give them, some will expect that if fileno is present then other functionality is available, etc. And similarly for write(buf) (although there are a lot fewer options in that direction).

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • 3
    Finally, someone is keep it real. – Anthony Rutledge May 09 '17 at 23:53
  • 44
    **The only useful answer.** Why StackOverflowers continue upvoting "Stop doing what you're trying to do, because I know better... and PEP 8, EAFP, and stuff!" posts is beyond my fragile sanity. (*Maybe Cthulhu knows?*) – Cecil Curry May 30 '17 at 07:15
  • 3
    Because we've run into way too much code written by people who didn't think ahead, and it breaks when you pass it something that is almost, but not quite a file because they check explicitly. The whole EAFP, duck typing thing is not some bullshit purity test. It's an actual egineering decision, – drxzcl Jan 21 '18 at 19:29
  • 4
    This may be seen as better engineering and I would prefer it personally, but may not work. It is generally *not* required that file-like objects inherit from `IOBase`. For example pytest fixtures give you `_pytest.capture.EncodedFile` which does not inherit from anything. – Tomáš Gavenčiak Jan 24 '20 at 12:33
  • 2
    For those like me who don't know, `BufferedIOBase` is the "normal" base class for byte-based file objects like `BytesIO` whereas `RawIOBase` is an unbuffered variant not normally used. [See more here](https://docs.python.org/3/library/io.html#binary-i-o). – Matthew D. Scholefield Feb 11 '21 at 23:15
46

As others have said you should generally avoid such checks. One exception is when the object might legitimately be different types and you want different behaviour depending on the type. The EAFP method doesn't always work here as an object could look like more than one type of duck!

For example an initialiser could take a file, string or instance of its own class. You might then have code like:

class A(object):
    def __init__(self, f):
        if isinstance(f, A):
            # Just make a copy.
        elif isinstance(f, file):
            # initialise from the file
        else:
            # treat f as a string

Using EAFP here could cause all sorts of subtle problems as each initialisation path gets partially run before throwing an exception. Essentially this construction mimics function overloading and so isn't very Pythonic, but it can be useful if used with care.

As a side note, you can't do the file check in the same way in Python 3. You'll need something like isinstance(f, io.IOBase) instead.

Scott Griffiths
  • 21,438
  • 8
  • 55
  • 85
41

It is generally not good practice to have checks like this in your code at all unless you have special requirements.

In Python the typing is dynamic, why do you feel need to check whether the object is file like, rather than just using it as if it was a file and handling the resulting error?

Any check you can do is going to happen at runtime anyway so doing something like if not hasattr(fp, 'read') and raising some exception provides little more utility than just calling fp.read() and handling the resulting attribute error if the method does not exist.

Tendayi Mawushe
  • 25,562
  • 6
  • 51
  • 57
  • 1
    `why` what about operators like `__add__`, `__lshift__` or `__or__` in custom classes? (file object and API: http://docs.python.org/glossary.html#term-file-object ) – n611x007 Sep 08 '12 at 12:56
  • @naxa:So what exactly about those operators? – martineau Jan 20 '15 at 15:43
  • 43
    Often just trying it works but I don't buy the Pythonic maxim that if it's hard to do in Python then it's wrong. Imagine you are passed an object and there are 10 different things you might do with that object depending on its type. You're not going to try each possibility and handle the error until you finally get it right. That would be totally inefficient. You don't necessarily need to ask, what type is this, but you do need to be able to ask does this object implement interface X. – jcoffland Sep 17 '15 at 06:19
  • 43
    The fact that the python collections library provides what might be called "interface types" (e.g., sequence) speaks to the fact that this is often useful, even in python. In general, when someone asks "how to foo", "don't foo" is not a highly satisfactory answer. – AdamC Oct 23 '15 at 19:13
  • I believe during debugging is a legitimate use for such checks when a function requiring a file-like object raises an error and I am not sure whether the argument object is the problem or the function implementation. – Bernhard Bodenstorfer Mar 19 '19 at 06:27
  • 2
    AttributeError can be raised for all sorts of reasons that have nothing to do with whether the object supports the interface you need. hasattr is needed for filelikes that don't derive from IOBase – Erik Aronesty Aug 22 '19 at 14:43
  • For example pytest fixtures give you _pytest.capture.EncodedFile which does not inherit from `IOBase`. – Tomáš Gavenčiak Jan 24 '20 at 12:35
  • I have to process vars for NLP differently if they contain a string with text content than if they contain just the name of a file. If it is a file I have to open it a process it. So I do want to treat them differently depending on their type. I don't see the need of throwing an error if there shouldn't be an error. It is clearer if I manage the two type options. If there's a reason why this is not good practice, please argument/explain it... – Martin May 24 '20 at 17:13
  • 2
    `hasattr(fp, 'seek')` is better I found out, spending hours. In some systems (for example `fastai`), `path = pathlib.Path("filename")` has both `read` and `write` attributes but is not a file-like object. It's probably best to use `insintance(fp, io.IOBase)` as some comments below suggest. – mikey Oct 03 '20 at 08:03
  • It's best to use `isinstance(fp, io.BytesIO)` in fact, which will ensre typing checking (`pyright` for example) as well. – mikey Oct 03 '20 at 12:30
27

The dominant paradigm here is EAFP: easier to ask forgiveness than permission. Go ahead and use the file interface, then handle the resulting exception, or let them propagate to the caller.

drxzcl
  • 2,952
  • 1
  • 26
  • 28
  • 9
    +1: If `x` isn't file-like, then `x.read()` will raise it's own exception. Why write an extra if-statement? Just use the object. It will either work or break. – S.Lott Nov 02 '09 at 13:48
  • 3
    Don't even handle the exception. If someone passed in something that doesn't match the API you expect, it's not your problem. – habnabit Nov 02 '09 at 22:12
  • 1
    @Aaron Gallagher: I am not sure. Is your statement true even if it is hard for me to preserve an consistent state? – dmeister Nov 04 '09 at 11:19
  • 1
    To preserve a consistent state you can use "try/finally" (but no except!) or the new "with" statement. – drxzcl Nov 07 '09 at 12:25
  • This is also consistent with a "Fail fast and fail loudly" paradigm. Unless you're meticulous, explicit hasattr(...) checks can occasionally cause a function/method to return normally without performing its intended action. – Ben Burns Aug 22 '11 at 15:09
  • What if you have complex handlers like this ? with other libraries coded as "EAFP" ? This is a HUGE spaghetti-plate to debug this ! Failing with "Config file not readable" is way better than a big stacktrace of messy functions calling each other resulting in a .read fail. – moutonjr Mar 12 '21 at 12:38
  • Failing with anything other than a stack trace is hiding problems. Please don't do that. – drxzcl Mar 13 '21 at 21:46
11

It's often useful to raise an error by checking a condition, when that error normally wouldn't be raised until much later on. This is especially true for the boundary between 'user-land' and 'api' code.

You wouldn't place a metal detector at a police station on the exit door, you would place it at the entrance! If not checking a condition means an error might occur that could have been caught 100 lines earlier, or in a super-class instead of being raised in the subclass then I say there is nothing wrong with checking.

Checking for proper types also makes sense when you are accepting more than one type. It's better to raise an exception that says "I require a subclass of basestring, OR file" than just raising an exception because some variable doesn't have a 'seek' method...

This doesn't mean you go crazy and do this everywhere, for the most part I agree with the concept of exceptions raising themselves, but if you can make your API drastically clear, or avoid unnecessary code execution because a simple condition has not been met do so!

Ben DeMott
  • 3,362
  • 1
  • 25
  • 35
  • 1
    I agree, but along the lines of not going crazy with this everywhere - a lot of these concerns should shake out during testing, and some of the "where to catch this/how to display to user" questions will be answered by usability requirements. – Ben Burns Aug 22 '11 at 15:16
9

You can try and call the method then catch the exception:

try:
    fp.read()
except AttributeError:
    raise something

If you only want a read and a write method you could do this:

if not (hasattr(fp, 'read') and hasattr(fp, 'write')):
   raise something

If I were you I would go with the try/except method.

Nadia Alramli
  • 111,714
  • 37
  • 173
  • 152
  • I'd suggest switching the order of the examples. `try` is always the first choice. The `hasattr` checks are only -- for some really obscure reason -- you can't simply use `try`. – S.Lott Nov 02 '09 at 14:07
  • 1
    I suggest using `fp.read(0)` instead of `fp.read()` in order to avoid put all code in the `try` block if you want to process the data from `fp` subsequently. – Meow Dec 12 '14 at 12:19
  • 3
    Note that `fp.read()` with big files will immediately increase memory usage. – Kyrylo Perevozchikov May 19 '15 at 13:21
  • I get this is pythonic, but then we have to read the file twice among other issues. For example, in `Flask` I did this and realized the underlying `FileStorage` object needed it's pointer reset after being read. – Adam Hughes Feb 28 '20 at 15:19
3

I ended up running into your question when I was writing an open-like function that could accept a file name, file descriptor or pre-opened file-like object.

Rather than testing for a read method, as the other answers suggest, I ended up checking if the object can be opened. If it can, it's a string or descriptor, and I have a valid file-like object in hand from the result. If open raises a TypeError, then the object is already a file.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
2

Under most circumstances, the best way to handle this is not to. If a method takes a file-like object, and it turns out the object it's passed isn't, the exception that gets raised when the method tries to use the object is not any less informative than any exception you might have raised explicitly.

There's at least one case where you might want to do this kind of check, though, and that's when the object's not being immediately used by what you've passed it to, e.g. if it's being set in a class's constructor. In that case, I would think that the principle of EAFP is trumped by the principle of "fail fast." I'd check the object to make sure it implemented the methods that my class needs (and that they're methods), e.g.:

class C():
    def __init__(self, file):
        if type(getattr(file, 'read')) != type(self.__init__):
            raise AttributeError
        self.file = file
Robert Rossney
  • 94,622
  • 24
  • 146
  • 218
  • 1
    Why `getattr(file, 'read')` instead of just `file.read`? This does the exact same thing. – abarnert Jul 17 '14 at 19:39
  • 1
    More importantly, this check is wrong. It will raise when given, say, an actual `file` instance. (Methods of instances of builtin/C-extension types are of type `builtin_function_or_method`, while those of old-style classes are `instancemethod`). The fact that this is an old-style class, and that it uses `==` on types instead of `ininstance` or `issubclass`, are further problems, but if the basic idea doesn't work, that hardly matters. – abarnert Jul 17 '14 at 19:44
0

This is only for Python 3.x, but if the distinction made is between a file object and an argument to pass to the open() function, one approach is to test for the argument to open(), which is much more well-defined:

from pathlib import Path
def fxn(arg):
    if isinstance(arg, (str, bytes, Path)):
        f = open(arg)
    else:
        f = arg

The argument for duck-typing still holds, though. This code fragment might not work in future versions of Python. Some of this choice is going to come down to one's tolerance for ambiguity.

sfaleron
  • 61
  • 1
  • 3
-5

try this:

import os 

if os.path.isfile(path):
   'do something'
Aymen Alsaadi
  • 1,329
  • 1
  • 11
  • 12
  • This one does not on `io.StringIO` or `io.BytesIO`. `os.path.isfile()` works only on strings, bytes, os.PathLike or integers. – DiKorsch Apr 28 '21 at 09:52