
I want a function to quickly check if a module is present before running other lines in the function. It may execute in several other programs in a large code base and I don't want to import sys at the top wherever it is run.

This answer explains the procedure to check if a module is present.

>>> import sys
>>> 'unicodedata' in sys.modules
False
>>> import unicodedata
>>> 'unicodedata' in sys.modules
True

Here a number of people voice opinions on when it is okay to import inside a function.

Is the following specific usage okay?

def some_function(foo):
    import sys
    if 'pandas' in sys.modules:
        if isinstance(foo, pd.DataFrame):
            pass  # function continues
    else:
        print("pandas has not been imported in the code you are testing")

The use case is first checking whether a pandas data frame fulfills various conditions and, if so, doing other operations. The thing is, looking at the code base I can't always be sure that the thing I am testing the function on is a dataframe at all, so I have been doing if isinstance(variable, pd.DataFrame). But what if the function is imported somewhere and run where there is no pandas at all? I'd rather it just realised that than crashed the whole program or imported pandas unnecessarily.

cardamom
  • If you need a specific module to be imported for your code to run, *just import the module*. Why do you need to check for the module? We can't begin to tell you if it is okay if you don't share your use case. – Martijn Pieters Jan 23 '18 at 13:52
  • Otherwise, `sys` is a built-in module, present from the moment the Python interpreter starts (it is rather crucial for Python to operate), so importing `sys` has no effect other than setting the name `sys` to reference the module object. There is practically zero cost to importing it. – Martijn Pieters Jan 23 '18 at 13:53
  • @MartijnPieters have added a use case – cardamom Jan 23 '18 at 14:00
  • The built-in exception `ImportError` might be useful for this check. https://docs.python.org/2/library/exceptions.html#exceptions.ImportError – Ilayaraja Jan 23 '18 at 14:00
  • The code that uses `pd.DataFrame` does the importing, not the code that imports the function. *This is not something the module that imports a function ever needs to worry about*. If you are concerned about the dependency being *installed*, then you can't test for that with the `sys.modules` test, as that only lists modules that are already imported. The `import pandas as pd` statement will throw an `ImportError` if the module is not installed; just handle that exception. – Martijn Pieters Jan 23 '18 at 14:02
  • IMO `some_function` is doing it wrong(tm). It should **just use `foo`** and assume it's a `pd.DataFrame`, and the caller should be responsible for passing the appropriate kind of object to the function. – mkrieger1 Jan 23 '18 at 14:47

1 Answer

You appear to be confused between a library being installed versus a library having been imported.
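A minimal sketch of that distinction, using only the standard library: `importlib.util.find_spec` reports whether a top-level module can be located for import (roughly, whether it is installed), while `sys.modules` only lists what has already been imported in the running process. The helper names `is_installed` and `is_imported`, and the made-up module name, are assumptions for illustration.

```python
import importlib.util
import sys


def is_installed(name):
    """Return True if a top-level module called `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


def is_imported(name):
    """Return True if `name` has already been imported somewhere in this process."""
    return name in sys.modules


# `json` ships with Python, so it can always be found, imported or not.
print(is_installed("json"))
# A made-up name (hypothetical) that should not exist on any system.
print(is_installed("no_such_module_for_this_example"))
# Whether pandas shows up here depends on what has run so far,
# not on whether it is installed.
print(is_imported("pandas"))
```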

Your use-case seems to be concerned with Pandas not being installed. You can't test for this possibility with sys.modules, because that only lists modules that have already been imported. Just have your code import Pandas, and handle the ImportError thrown if it is not available:

try:
    import pandas as pd

    def is_dataframe(obj):
        return isinstance(obj, pd.DataFrame)
except ImportError:
    def is_dataframe(obj):
        return False

The code above defines a dataframe test that keeps working when Pandas is not installed: in that case is_dataframe simply returns False for every object.
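As a usage sketch, a caller can then branch on the result without looking at sys.modules at all. The `describe` function is a hypothetical stand-in for the asker's `some_function`; the `is_dataframe` definition is repeated so the snippet runs on its own.

```python
try:
    import pandas as pd

    def is_dataframe(obj):
        return isinstance(obj, pd.DataFrame)
except ImportError:
    def is_dataframe(obj):
        return False


def describe(foo):
    """Hypothetical caller: branch on the type without touching sys.modules."""
    if is_dataframe(foo):
        return "dataframe with {} rows".format(len(foo))
    return "not a dataframe"


# A plain list is never a dataframe, with or without pandas installed.
print(describe([1, 2, 3]))  # not a dataframe
```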

If your code needs to take into account the possibility that some third-party library returns a dataframe, just use the above code to test for that contingency (but only if you can't make your code work in some other way, such as catching the exception raised when something is not the type you expected to handle). Don't try to second-guess whether Pandas is actually being used somewhere. Either you handle dataframes or you don't; there is no need to make this dynamic. The isinstance(obj, pd.DataFrame) test will not throw an exception if obj is not a dataframe, it simply returns False, so there is no risk here.

Note that if you test whether the module has been imported, detect that it hasn't, and only afterwards another module imports Pandas, you made the wrong call and your code breaks. Python is a dynamic language, and imports can happen at any time while the program runs.
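The timing problem can be sketched with a small standard-library module standing in for Pandas. Using `colorsys` here is an assumption chosen only because it is cheap to import and safe to re-import; the same behaviour applies to any module.

```python
import importlib
import sys

# `colorsys` is a small pure-Python stdlib module used as a stand-in for pandas.
# Drop any cached copy so the example starts from a known state.
sys.modules.pop("colorsys", None)

# First check: nothing has imported the module yet.
seen_before = "colorsys" in sys.modules

# Later, some other part of the program imports it ...
importlib.import_module("colorsys")

# ... and the answer given by the first check is now stale.
seen_after = "colorsys" in sys.modules

print(seen_before, seen_after)  # False True
```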

Otherwise, if Pandas is installed and you are worried that some third-party module might import it to do its work, there is no need for your code to care. Whether a third-party module uses Pandas makes no difference to your code; Pandas is then just an implementation detail of that other module.

Lastly, if a third-party module imports Pandas, your own module won't see it. Each module is a separate namespace, so you need import statements for all of that module's dependencies; what another module has imported doesn't matter. You can't use pd.DataFrame without an import statement (or some other means of binding the name pd to the module object), regardless of which other modules have imported it.
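A small sketch of the per-module namespace rule. The two modules `mod_a` and `mod_b` are hypothetical and built in memory to stand in for separate .py files, with `json` aliased as `js` playing the role of `pandas as pd`.

```python
import types

# Two hypothetical modules built in memory, standing in for separate .py files.
mod_a = types.ModuleType("mod_a")
mod_b = types.ModuleType("mod_b")

# mod_a imports the library under an alias and uses it.
exec("import json as js\n\ndef dump(x):\n    return js.dumps(x)\n", mod_a.__dict__)
# mod_b uses the same alias but never imports it.
exec("def dump(x):\n    return js.dumps(x)\n", mod_b.__dict__)

result_a = mod_a.dump([1])  # works: `js` is bound in mod_a's namespace

try:
    mod_b.dump([1])
    error_message = None
except NameError as exc:
    # `js` was bound in mod_a only, so mod_b cannot see it.
    error_message = str(exc)

print(result_a)
print(error_message)
```

The import in `mod_a` puts `json` into sys.modules for the whole process, yet `mod_b` still fails, which mirrors the `NameError: name 'pd' is not defined` discussed in the comments.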

Martijn Pieters
  • Pandas is definitely installed, sorry I don't always manage to be clearer here :( It's just that some of the roughly thirty .py files import it and others don't. I wanted my function to check the environment where it is run. I am tempted to just use the code which imports sys inside a function until someone possibly complains about it before it is merged. – cardamom Jan 23 '18 at 14:20
  • @cardamom: but **why**? What happens when you don't test for Pandas having been imported? – Martijn Pieters Jan 23 '18 at 14:27
  • `if isinstance(variable, pd.DataFrame)` will not only just not execute but will crash the program! There will not even be the chance to have an else clause. – cardamom Jan 23 '18 at 14:32
  • @cardamom: also, why ask for our advice, and then ignore the advice? If you don't understand my advice or you think I am not understanding your situation, then update your question to make it clearer as to what problem you are trying to solve. Some code that illustrates the problem is probably helpful here. – Martijn Pieters Jan 23 '18 at 14:32
  • @cardamom: why would that crash the program? – Martijn Pieters Jan 23 '18 at 14:33
  • @cardamom: `import pandas as pd` can possibly raise an exception, I show in my answer how to handle that exception. `isinstance(variable, pd.DataFrame)` **can't raise an exception if `pd` was imported successfully**. There is no reason to think otherwise. – Martijn Pieters Jan 23 '18 at 14:34
  • I fixed the code in my question to reflect what is being done. But I think you might have actually answered the question here, I just noticed an edit with `ImportError`, which I've never seen and which looks like it will work. Will just test it.. – cardamom Jan 23 '18 at 14:41
  • Thanks, and sorry for the general unclarity. Have learned something I had never seen before about error handling. Adapted your answer by changing `ImportError` to `NameError` and removed the `import pandas` statement, so it now works with neither importing pandas nor sys in the function. And catches all cases including when there is no pandas. – cardamom Jan 23 '18 at 14:51
  • (error before fixing this was `NameError: name 'pd' is not defined` ) – cardamom Jan 23 '18 at 14:52
  • @cardamom: you *do need to import pandas*. Imports are per-module. You get a name error because you did not import pandas. – Martijn Pieters Jan 23 '18 at 14:52
  • @cardamom: just to be crystal clear: it doesn't matter if other modules have imported pandas. You need to use an `import` statement *in the module that wants to use Pandas*. You will **always** get a name error unless you import pandas yourself, the name error will not go away when another module imports Pandas. – Martijn Pieters Jan 23 '18 at 14:57
  • @cardamom: this also illustrates why you should always show your code and the full error message you get with that code. You misunderstood how importing works, and why the `NameError` exception is thrown. Had you shown your code and the exception we might have deduced earlier on where your misunderstanding lay. You made the wrong assumption that `pd.DataFrame` would not throw a `NameError` if a third-party module had imported Pandas. – Martijn Pieters Jan 23 '18 at 15:02
  • ok i will post the whole thing next time and test this thing is doing what it is supposed to before committing it. – cardamom Jan 23 '18 at 15:03