3

Question

When I encounter an ImportError in Python, should I raise the error directly and ask the user to install the package, or should I use an import chain?

Description

I came across this question when I tried to use the lxml package to parse an XML file in Python.
Its official documentation says:

If your code only uses the ElementTree API and does not rely on any functionality that is specific to lxml.etree, you can also use (any part of) the following import chain as a fall-back to the original ElementTree:

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        ...

It seems to me that importing a substitute is a bad idea: if the substitute may not have all of lxml's methods, then the whole script can only rely on the methods that are available in every package in the chain.

In that case it makes little sense to import the most powerful package (e.g. lxml here); we could import the least functional one directly and save a lot of code. Or, if we want to use the additional methods later, we should raise the ImportError directly.

However, as answered in "Error handling when importing modules", I find that this approach seems to be used frequently in Python programming:

it's useful to define multi-level functionality based on what library has been imported in runtime.

But it seems to me that multi-level functionality can only be achieved by constantly checking whether a library has been imported, which makes the code complicated and ugly.

So I wonder why people sometimes use this structure instead of raising the error directly.

martineau
Cielo

2 Answers

1

To answer your last question first:

When I encounter an ImportError in Python, should I raise the error directly and ask the user to install the package, or should I use an import chain?

How you handle an ImportError depends on why the import can fail:

  • If your module directly depends on a module, let the error happen. Some libraries re-raise the error with a helpful error message if the dependency's installation is non-trivial.
  • If your module is trying to substitute slower libraries for faster ones with identical APIs, there's no reason to print anything to screen.
  • If your module expects a certain library to exist but a significantly slower one is the only one you can find, a warning may be useful to let the developer know that your module will still function but will not be as fast as it should.
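The first and third cases above can be sketched roughly as follows. This is only an illustration: load_required, load_fastest, and the module name fast_json_impl are hypothetical names chosen for the example, not part of any library.

```python
import importlib
import warnings


def load_required(name, install_hint):
    """Import a hard dependency, re-raising with a more helpful message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Keep the original exception chained so the real cause stays visible.
        raise ImportError(f"{name} is required; {install_hint}") from exc


def load_fastest(preferred, fallback):
    """Import the preferred module, warning if only the slower one exists."""
    try:
        return importlib.import_module(preferred)
    except ImportError:
        warnings.warn(
            f"{preferred} not found; falling back to the slower {fallback}",
            RuntimeWarning,
            stacklevel=2,
        )
        return importlib.import_module(fallback)


# "fast_json_impl" is a made-up name, so this falls back to the stdlib json.
json_impl = load_fastest("fast_json_impl", "json")
```

For the second case (identical APIs, nothing worth telling the user), a plain try/except ImportError with no message is enough.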

Now for your other questions:

In that case it makes little sense to import the most powerful package (e.g. lxml here); we could import the least functional one directly and save a lot of code.

In the specific case of lxml.etree, ElementTree, and cElementTree, all three implement the same API. They're substitutes for one another. ElementTree is pure-Python and will always work, but cElementTree is usually present and is faster. lxml.etree is even faster but is an external module.

Think of it like this:

try:
    import super_fast_widget as widget
except ImportError:
    try:
        import fast_widget as widget
    except ImportError:
        import slow_widget as widget

From your code's perspective, widget will always work the same regardless of which library actually ended up getting imported, so it's best to try to import the fastest implementation and fall back on slower ones if performance is something you care about.

You are correct in that you can't fully utilize all of lxml's features if you allow fallback libraries. This is why lxml.etree is being used instead of just lxml. It intentionally mimics the API of the other two libraries.
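As a small sketch of what "sticking to the shared API" looks like in practice, the snippet below uses only fromstring, findall, and .text, which both lxml.etree and the standard library's xml.etree.ElementTree provide. The sample XML is made up for the example.

```python
try:
    from lxml import etree  # faster, but an external dependency
except ImportError:
    import xml.etree.ElementTree as etree  # always available in the stdlib

# Made-up sample document, used only for illustration.
SAMPLE = "<catalog><book id='1'>Dune</book><book id='2'>Emma</book></catalog>"

root = etree.fromstring(SAMPLE)
titles = [book.text for book in root.findall("book")]
print(titles)  # ['Dune', 'Emma']
```

The rest of the code never needs to know which implementation was imported, as long as it stays within that common subset.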

Here's a similar example from Django's codebase:

# Use the C (faster) implementation if possible
try:
    from yaml import CSafeLoader as SafeLoader
    from yaml import CSafeDumper as SafeDumper
except ImportError:
    from yaml import SafeLoader, SafeDumper

Python internally does this for a lot of built-in modules. There's a slower, pure Python version that's used as a fallback for the faster C version.
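The usual shape of that pattern is to define the pure-Python version first and then let an optional compiled module override it. A minimal sketch, where clamp and _clamp_accel are hypothetical names invented for this example:

```python
def clamp(value, low, high):
    """Pure-Python reference implementation; always available."""
    return max(low, min(high, value))


try:
    # Hypothetical compiled accelerator. If it can be imported, its clamp
    # replaces the pure-Python definition above; otherwise nothing changes.
    from _clamp_accel import clamp
except ImportError:
    pass

print(clamp(15, 0, 10))
```

Callers just use clamp and get whichever implementation is available.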

However, as answered in "Error handling when importing modules", I find that this approach seems to be used frequently in Python programming:

Your lxml.etree example substituted slower libraries for faster ones. The linked example code defines a common, cross-platform interface (getpass) to a bunch of libraries that all do the same thing (prompt you for your password). The author handles the ImportErrors because those individual modules may not exist depending on your operating system.

You could replace some of the try blocks with if platform.system() == 'Windows' and similar code, but even within a single OS there may be better modules that perform an identical task, so the try blocks just simplify it. In the end getpass still prompts the user for their password with the exact same API, which is all you really care about.
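A rough sketch of that idea: probe for the platform-specific modules instead of asking which OS you are on. The function pick_backend and the backend labels are assumptions made for this example and are not the linked answer's code.

```python
def pick_backend():
    """Return a label for the first terminal-control module that imports."""
    try:
        import termios  # present on POSIX systems
        return "termios"
    except ImportError:
        pass
    try:
        import msvcrt  # present on Windows
        return "msvcrt"
    except ImportError:
        # Neither module exists, so use a plain fallback.
        return "fallback"


BACKEND = pick_backend()
print(f"password prompt would use: {BACKEND}")
```

Whatever backend is chosen, the function exposed to callers keeps the same signature, so the rest of the program does not care which branch was taken.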

Blender
  • That makes sense, so an import chain is mainly used for cross-platform and efficiency reasons rather than for multi-level functionality. (I understand multi-level functionality as doing different jobs depending on which libraries were imported at runtime, e.g. achieving a better result if a library has been imported, or a more basic goal if not.) And if I want to achieve the multi-level functionality I mean (e.g. I want to use a method that is specific to lxml.etree), is it better to raise the error? – Cielo Jan 28 '18 at 00:08
  • @Cielo: Can you explain what you mean by "multi-level functionality"? – Blender Jan 28 '18 at 01:18
  • By multi-level functionality, I mean "if imported successfully, make it fancier; otherwise satisfy the basic requirements". E.g., if lxml was imported, try to correct the mismatched tags; otherwise raise an error when the same problem occurs. I'm not sure if this term has the same meaning as in Dive into Python. – Cielo Jan 28 '18 at 03:59
  • @Cielo: At that point I think it's up to you to decide how you want your application to behave. I generally don't like optional runtime dependencies like that because I find it more reasonable to be able to fully utilize a module when I install it, but there are always exceptions. – Blender Jan 28 '18 at 05:17
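
For reference, the optional-dependency behaviour discussed in these comments could look roughly like the sketch below. It assumes lxml's XMLParser(recover=True) for the lenient path; parse_xml and HAS_LXML are hypothetical names for this example. The library check happens once at import time rather than being scattered through the code.

```python
import xml.etree.ElementTree as stdlib_etree

try:
    from lxml import etree as lxml_etree
except ImportError:
    lxml_etree = None

# Checked once here, instead of in every function that parses XML.
HAS_LXML = lxml_etree is not None


def parse_xml(text):
    """Parse leniently when lxml is installed, strictly otherwise."""
    if HAS_LXML:
        # recover=True asks lxml to try to repair malformed markup.
        parser = lxml_etree.XMLParser(recover=True)
        return lxml_etree.fromstring(text, parser=parser)
    # The stdlib parser raises ParseError on malformed input.
    return stdlib_etree.fromstring(text)


root = parse_xml("<a><b>x</b></a>")
print(root.tag)  # a
```

The trade-off is the one raised above: the same input can succeed on one machine and fail on another, depending on what is installed.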
0

I usually use import chains because the output is more controlled.

Raising Errors

Traceback (most recent call last):
  File "core.py", line 1, in <module>
ImportError: <error description>

Import Chains

i Importing "lxml.etree"
x Error Importing "lxml.etree"
i Importing "xml.etree.cElementTree" on Python 2.5+
x Error Importing "xml.etree.cElementTree" on Python 2.5+
i Please Install "lxml.etree" or "xml.etree.cElementTree" on Python 2.5+
i Exit with code 1
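
A minimal sketch of how output in that style could be produced. The helper import_first and the exact message format are assumptions for illustration; the second candidate here is xml.etree.ElementTree, which is always available, so the exit branch is only shown for completeness.

```python
import importlib


def import_first(candidates):
    """Try each module name in order; return the first that imports, else None."""
    for name in candidates:
        print(f'i Importing "{name}"')
        try:
            return importlib.import_module(name)
        except ImportError:
            print(f'x Error Importing "{name}"')
    return None


etree = import_first(["lxml.etree", "xml.etree.ElementTree"])
if etree is None:
    print('i Please Install "lxml.etree"')
    raise SystemExit(1)  # exit with code 1 instead of showing a traceback
```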
Penguin