I apologize for not being able to phrase my question more clearly. I am writing a large package that makes extensive use of pandas in almost every function. My first instinct, naturally, was to create an `__init__.py` like this:

    import pandas
    # then import my own submodules and other things

Then, every time I use pandas in a function, I would import it in the submodules with `from . import pandas as pd` or `from .. import pandas`, or something like that.

However, if I do this, pandas appears as a "submodule" when I load my package, i.e., there is a `mypackage.pandas`. That doesn't hurt anyone, but I'm guessing it is not correct. One way to avoid this would be to add `del pandas` at the end of `__init__.py`, which also doesn't seem like the correct approach.

So for now I don't import pandas in my `__init__.py` and instead import it separately inside every function, which works fine but is too repetitive and prevents me from setting global pandas settings.

What is the preferred approach here? Is there a method which I am missing?

Thank you.

Aya
TomCho
  • Why do you need to do things like `from . import pandas as pd` in submodules? What's wrong with just `import pandas as pd`? – Aya Jun 30 '16 at 15:19
  • @Aya Well, that's what I'm doing. But by importing pandas in `__init__.py` I can define some pandas options there (like `pandas.options.display.expand_frame_repr`) and they will be valid throughout the package. Furthermore, re-importing the same package from scratch seems to me to take longer, but I'm not sure if that is correct. – TomCho Jun 30 '16 at 15:23

1 Answer

> ...by importing pandas from the `__init__.py` call I can define some pandas' options there (like `pandas.options.display.expand_frame_repr`) and it will be valid throughout the module.

They will be anyway. The module is only loaded the first time `import pandas` is executed anywhere in the process. At that point a reference to the module object is stored in the dictionary `sys.modules`. Any subsequent `import pandas` in any other module re-uses the same reference from `sys.modules`, so any options you changed will apply there too.
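A minimal sketch of this caching behaviour follows. To keep it runnable without pandas installed, it registers a hypothetical stand-in module named `fakepandas` with a made-up `options` dictionary; neither name is part of pandas, they only illustrate that every `import` of the same name returns the one object held in `sys.modules`.

```python
import sys
import types

# Hypothetical stand-in for pandas, so the sketch runs without it installed.
fake = types.ModuleType("fakepandas")
fake.options = {"expand_frame_repr": True}
sys.modules["fakepandas"] = fake  # what the first real import would do

# "In __init__.py": import the module and change an option once.
import fakepandas
fakepandas.options["expand_frame_repr"] = False


def submodule_function():
    # "In a submodule": a plain absolute import, no relative import needed.
    import fakepandas as fp
    return fp.options["expand_frame_repr"]


# Both names refer to the same module object, so the option change is visible.
print(fakepandas is sys.modules["fakepandas"])
print(submodule_function())
```

With real pandas the same pattern should apply: set the options once (for example in `__init__.py`) and use a plain `import pandas as pd` everywhere else.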

> Furthermore, re-importing the same package from scratch seems to me to take longer, but I'm not sure if that is correct.

A plain `import pandas` should actually be marginally faster, since Python doesn't have to resolve a relative path. Once the module has been loaded, subsequent imports work like this:

    import pandas          # pandas = sys.modules['pandas']
    import pandas as pd    # pd = sys.modules['pandas']
Aya
  • Very good. I did not know that. So, in short, what I'm already doing (which is calling pandas independently in each function) is a good way to proceed, right? – TomCho Jun 30 '16 at 15:49
  • @TomCho It's not clear if you're importing it in every *function* or every *submodule*. – Aya Jun 30 '16 at 15:57
  • I'm doing it in every *function*. I do it in every function otherwise pandas also appears as a "subsubmodule". So I'm also trying to avoid `mypackage.submodule.pandas`. – TomCho Jun 30 '16 at 16:45
  • @TomCho It might make more sense to only import it once in each module, and [use `__all__` to only expose the symbols you want it to](http://stackoverflow.com/q/44834/172176). – Aya Jun 30 '16 at 17:10
  • Thank you! I was not aware of this "tip". I'll probably start using this, then. Seems much easier and cleaner. – TomCho Jun 30 '16 at 17:55
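As a rough illustration of the `__all__` suggestion in the comments above, the sketch below builds a hypothetical submodule called `demo_submodule` in memory, with `json` standing in for pandas and an invented `to_text` function. One caveat worth noting: `__all__` only controls what `from module import *` exports; the imported library is still reachable as an attribute of the submodule.

```python
import sys
import types

# Source of a hypothetical submodule; json stands in for pandas here.
SUBMODULE_SOURCE = '''
import json

__all__ = ["to_text"]  # only this name is exported by a star-import


def to_text(obj):
    return json.dumps(obj)
'''

# Build the module in memory and register it so it can be imported by name.
submodule = types.ModuleType("demo_submodule")
exec(SUBMODULE_SOURCE, submodule.__dict__)
sys.modules["demo_submodule"] = submodule

# Simulate a user doing a star-import in a fresh namespace.
namespace = {}
exec("from demo_submodule import *", namespace)
star_names = sorted(n for n in namespace if not n.startswith("__"))

# The stand-in library is hidden from the star-import...
print(star_names)
# ...but it is still an attribute of the submodule itself.
still_attribute = hasattr(submodule, "json")
print(still_attribute)
```

So importing once at the top of each module and listing the public names in `__all__` keeps star-imports and the documented API clean, even though `mypackage.submodule.pandas` would technically still exist.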