9

I have a codebase where I'm cleaning up some messy decisions by the previous developer. Frequently, he has done something like:

from scipy import *
from numpy import *

...This, of course, pollutes the name space and makes it difficult to tell where an attribute in the module is originally from.

Is there any way to have Python analyze and fix this for me? Has anyone made a utility for this? If not, how might a utility like this be made?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Kelketek
  • 2,406
  • 5
  • 21
  • 28
  • 1
    I feel for you. Hope you find some nice tool. (+1) – NPE Mar 07 '13 at 18:47
  • Even better, I hope you write a nice tool (whether based on my answer or not) and publish it on PyPI so if I ever need such a thing, I don't have to do it myself. :) – abarnert Mar 07 '13 at 19:26
  • Also see this question: [Is there an IDE/utility to refactor Python * imports to use standard module.member syntax?](http://stackoverflow.com/questions/12677061/is-there-an-ide-utility-to-refactor-python-imports-to-use-standard-module-memb) – Lukas Graf Feb 09 '14 at 17:31

4 Answers4

3

Yes. Remove the imports and run a linter on the module.

I recommend using flake8, although it may also create a lot of noise about style errors.

Merely removing the imports and trying to run the code is probably not going to be enough, as many name errors won't be raised until you run just the right line of code with just the right input. A linter will instead analyze the code by parsing and will detect potential NameErrors without having to run the code.

This all presumes that there are no reliable unit tests, or that the tests do not provide enough coverage.

In this case, where there are multiple from module import * lines, it gets a little more painful in that you need to figure out for each and every missing name what module supplied that name. That will require manual work, but you can simply import the module in a python interpreter and test if the missing name is defined on that module:

>>> import scipy, numpy
>>> 'loadtxt' in dir(numpy)
True

You do need to take into account that in this specific case, that there is overlap between the numpy and scipy modules; for any name defined in both modules, the module imported last wins.

Note that leaving any from module import * line in place means the linter will not be able to detect what names might raise NameErrors!

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • If I remove the imports, how will this tell me which module the names are from? Wouldn't it just nameerror on all the attributes that are now unlisted? – Kelketek Mar 07 '13 at 18:50
  • @Kelketek: Yes, and you'll have to figure out for each one what module it is *supposed* to come from. That is not that hard, luckly. – Martijn Pieters Mar 07 '13 at 18:51
  • If you remove them one at a time, this is presumably easier. – Wooble Mar 07 '13 at 18:53
  • Presumably, I should run the linter first to get any general problems it might find, resolve them, remove one import line, run it again to see which names pop up as name errors, and then add them to a 'from' import, and then repeat the process with each wildcard import? – Kelketek Mar 07 '13 at 18:54
  • @MartijnPieters: This can actually be surprisingly painful. To take OP's example of `from scipy import *` and `from numpy import *`, the semantics of a few functions, such as `log10` would depend on which import came last (they exist in both modules but don't behave the same). See http://stackoverflow.com/questions/6200910/relationship-between-scipy-and-numpy – NPE Mar 07 '13 at 18:54
  • @Kelketek: There is a problem with that: leaving any of the `from module import *` lines in place means the linter won't be able to tell what names raise name errors. A static linter cannot see what module provides what name. – Martijn Pieters Mar 07 '13 at 18:56
  • Hmmm... Where's the mandatory +3? Smart ... don't want to run your hack too often. Anyway, +1 from me. – mgilson Mar 07 '13 at 18:58
  • I'm giving you an upvote from this answer, as it's probably the cleanest way to do this. However, I feel that abamert's method (in general) is what I'm going to need-- the previous developer had a very large number of import statements like this in a single module. Sometimes upwards of five. – Kelketek Mar 07 '13 at 19:04
  • @Kelketek: Honestly, even if I polished up my tool to the point where it was reliable (or you did so yourself), you'd probably still need to handle some cases manually. (At the very least, while debugging the tool!) So, this is an important answer either way. – abarnert Mar 07 '13 at 19:06
  • Yes, I do realize that. However, this will make a great deal of the problem easier. I'm dealing with dozens and dozens of modules like this. It is a nightmare. – Kelketek Mar 07 '13 at 19:09
  • @Kelketek: Why alter them all at once? Why not approach this on a module-by-module basis? Are they all broken? If not, don't fix until you have to! – Martijn Pieters Mar 07 '13 at 19:11
  • I'm still deciding on my exact practical approach. But whatever the case, I've got far too many of them to not consider automated assistance in tracking down the names. – Kelketek Mar 07 '13 at 19:14
3

I think PurityLake's and Martijn Pieters's assisted-manual solutions are probably the best way to go. But it's not impossible to do this programmatically.

First, you need to get a list of all names that existing in the module's dictionary that might be used in the code. I'm assuming your code isn't directly calling any dunder functions, etc.

Then, you need to iterate through them, using inspect.getmodule() to find out which module each object was originally defined in. And I'm assuming that you're not using anything that's been doubly from foo import *-ed. Make a list of all of the names that were defined in the numpy and scipy modules.

Now you can take that output and just replace each foo with numpy.foo.

So, putting it together, something like this:

for modname in sys.argv[1:]:
    with open(modname + '.py') as srcfile:
        src = srcfile.read()
    src = src.replace('from numpy import *', 'import numpy')
    src = src.replace('from scipy import *', 'import scipy')
    mod = __import__(modname)
    for name in dir(mod):
        original_mod = inspect.getmodule(getattr(mod, name))
        if original_mod.__name__ == 'numpy':
            src = src.replace(name, 'numpy.'+name)
        elif original_mod.__name__ == 'scipy':
            src = src.replace(name, 'scipy.'+name)
    with open(modname + '.tmp') as dstfile:
        dstfile.write(src)
    os.rename(modname + '.py', modname + '.bak')
    os.rename(modname + '.tmp', modname + '.py')

If either of the assumptions is wrong, it's not hard to change the code. Also, you might want to use tempfile.NamedTemporaryFile and other improvements to make sure you don't accidentally overwrite things with temporary files. (I just didn't want to deal with the headache of writing something cross-platform; if you're not running on Windows, it's easy.) And add in some error handling, obviously, and probably some reporting.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • I don't think that will quite work. What about variables named `my_foo`. All of a sudden you get `my_numpy.foo`. Oops. Of course, given a proper parser (I'm thinking `ast`), you could probably do that. – mgilson Mar 07 '13 at 19:00
  • This is probably more what I'm looking for. The guy occasionally had modules where five different wildcard imports were made. It's insanity. The search and replace here is a bit dangerous, as it makes some assumptions, but this idea puts me in the right direction. – Kelketek Mar 07 '13 at 19:00
  • Ultimately though, this is a reasonable start to a tool which could be useful for the entire community. I also wonder how this sort of thing would interact with `__all__`. To get it actually right, you'd probably want to filter out the stuff that isn't in `module.__all__` if it exists. – mgilson Mar 07 '13 at 19:02
  • @mgilson: You're right, and if you have _both_ `my_foo` and `foo`, there's no way to fix one without breaking the other. You don't even really need a proper parser here, just a lexer, which is a lot simpler. But you're right, we already _have_ a proper parser in `ast`, so maybe that's the way to go. – abarnert Mar 07 '13 at 19:02
  • 1
    @mgilson: My original thought was that I wouldn't do it this way—I'd rely on a linter and unit tests and manual editing, as in Martijn Pieters's answer. But now that you mention it, if someone were to take this and polish it up, it could be a useful general-purpose tool. – abarnert Mar 07 '13 at 19:03
  • @mgilson, even given a proper parser you can't necessarily do that. If you import name `foo`, you could still have a local variable named `foo` in one of the functions, which you can't replace with `bar.foo` – shx2 Mar 07 '13 at 19:05
  • @mgilson: Meanwhile, `__all__` isn't actually an issue here. I'm not replacing all things that `numpy` exports, but rather all things that are imported from `numpy`. The other way around may be easier (`numpy.__all__` if it exists, `dir(numpy)` otherwise), but it won't work in cases where there are multiple `from foo import *` lines with conflicting names. How do we know whether an instance of, say, `log10` is `numpy.log10` or `scipy.log10` without looking at the actual object? – abarnert Mar 07 '13 at 19:05
  • @shx2 -- A proper parser would let you know that `foo` inside the function is *local*. That's one reason why you *need* a proper parser. – mgilson Mar 07 '13 at 19:06
  • @abarnert -- whichever one is called second :) – mgilson Mar 07 '13 at 19:07
  • @abarnert: you can capture a dict of names to id (`{k:id(v) for k,v in globals().items()}`), before the first import, and again after the first `from foo import *`, and check which items in your dict changed. Then again after the second (`from bar import *`), and check again. – shx2 Mar 07 '13 at 19:14
0

I've now made a small utility for doing this which I call 'dedazzler'. It will find lines that are 'from module import *', and then expand the 'dir' of the target modules, replacing the lines.

After running it, you still need to run a linter. Here's the particularly interesting part of the code:

import re

star_match = re.compile('from\s(?P<module>[\.\w]+)\simport\s[*]')
now = str(time.time())
error = lambda x: sys.stderr.write(x + '\n')

def replace_imports(lines):
    """
    Iterates through lines in a Python file, looks for 'from module import *'
    statements, and attempts to fix them.
    """
    for line_num, line in enumerate(lines):
        match = star_match.search(line)
        if match:
            newline = import_generator(match.groupdict()['module'])
            if newline:
                lines[line_num] = newline
    return lines

def import_generator(modulename):
    try:
        prop_depth = modulename.split('.')[1:]
        namespace = __import__(modulename)
        for prop in prop_depth:
            namespace = getattr(namespace, prop)
    except ImportError:
        error("Couldn't import module '%s'!" % modulename)
        return
    directory = [ name for name in dir(namespace) if not name.startswith('_') ]
    return "from %s import %s\n"% (modulename, ', '.join(directory))

I'm maintaining this in a more useful stand-alone utility form here:

https://github.com/USGM/dedazzler/

Kelketek
  • 2,406
  • 5
  • 21
  • 28
-1

ok, this is what i think you could do, break the program. remove the imports and notice the errors that are made. Then import only the modules that you want, this may take a while but this is the only way I know of doing this, I will be happily surprised if someone does know of a tool to help

EDIT: ah yes, a linter, I hadn't thought of that.

PurityLake
  • 1,110
  • 10
  • 18