2

Possible Duplicate:
Efficient bidirectional hash table in Python?

I'm working on an AST parser, that parses python code. First, I turn the python code into an AST with a call to compile.
In doing this, I need to ensure that I don't compile the same imported module twice, even if there are two calls to import it.

Currently, I can detect that these two calls are equivalent:

import modname as mod
import modname as mod

I maintain a dictionary that maps (in this case) mod to modname. I use this not only to detect that modname has already been imported, but also for some other bookkeeping functions down the road.

Now, I can't detect that the following three calls import the same module:

import modname as mod
import modname as foo
import modname

I know that I can easily circumvent this problem by using a set to contain modname and check this set before I compile modname the second time. However, this requires another block of linear space.
I could do a linear pass over the dictionary to check if any key maps to a value of modname, but that defeats the purpose of using a dict.

Hence my question: does there exist a data structure that's a "two-way dict" - one that maps it's keys to values and whose values can be looked up at O(1) time as well?

Community
  • 1
  • 1
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • It wouldn't take less space than two one-way dicts. – Pavel Anossov Jan 18 '13 at 23:33
  • Check out http://stackoverflow.com/questions/3318625/efficient-bidirectional-hash-table-in-python – Thomas Jan 18 '13 at 23:34
  • 1
    Don't you already have a structure that contains per-module variable names? Where do you store `mod`, in a global, or a structure per module? This is how Python does it; `sys.modules` is a mapping holding all the module namespaces (including `__main__` for the main script). – Martijn Pieters Jan 18 '13 at 23:38
  • @MartijnPieters: thanks for mentioning `sys.modules`. That would work for my purposes. I could check `imported_modules['mod'] in sys.modules`. If you post your suggestion as an answer, I would accept it – inspectorG4dget Jan 18 '13 at 23:43
  • @inspectorG4dget: I thought you were only parsing, not also importing. :-) – Martijn Pieters Jan 18 '13 at 23:46
  • @MartijnPieters: well, I have to parse the ASTs of the functionDefs and classDefs in the imported module (`modname`) as well. But if `modname` is being imported a second time in the source code, then parsing it's AST a second time would be inefficient – inspectorG4dget Jan 18 '13 at 23:48

1 Answers1

1

Python already tracks imported modules, in sys.modules.

Each key is a imported module name (and not the alias under which it was imported), each value is the actual module object with the global namespace for that module:

>>> import sys
>>> import os
>>> 'sys' in sys.modules
True
>>> 'os' in sys.modules
True
>>> sys.modules['sys'].__dict__.keys()
['setrecursionlimit', 'dont_write_bytecode', 'getrefcount', 'path_importer_cache', 'stdout', 'getprofile', '__stdin__', 'version_info', 'exc_clear', 'prefix', 'getfilesystemencoding', 'byteorder', '_clear_type_cache', 'excepthook', 'ps1', 'exc_type', '__excepthook__', 'executable', 'float_info', 'copyright', 'setdlopenflags', 'exec_prefix', 'getdlopenflags', 'getrecursionlimit', 'py3kwarning', 'path_hooks', '__package__', '_current_frames', 'platform', 'maxsize', 'version', 'exit', 'call_tracing', 'callstats', 'flags', 'setcheckinterval', '__doc__', 'api_version', '__plen', 'getdefaultencoding', 'getcheckinterval', 'maxunicode', 'settrace', 'setprofile', 'argv', '__stdout__', 'meta_path', '__name__', 'subversion', 'builtin_module_names', 'stdin', '__stderr__', '__egginsert', 'displayhook', 'ps2', 'gettrace', 'modules', 'warnoptions', 'last_type', 'getsizeof', 'last_traceback', 'maxint', '__displayhook__', '_getframe', 'stderr', 'exc_info', 'path', 'last_value', 'hexversion']
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343