4

I have two Python files. One imports all the prerequisite libraries; the other executes some code. Here is the first file, imports.py:

def importAll(process):

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    print('Success')
    if process == 'train':
        import sklearn

The second file, train.py, is as follows:

from imports import  importAll
importAll('train')


def load_data(date):
    #load only data till Sep
    df = pd.read_csv('df.csv')
    return(df[df['date'] < date])


date = '2012-09-01'
df = load_data(date)

When I run train.py, 'Success' is printed (from imports.py). However, I also get an error saying that pd is not defined (at the line df = pd.read_csv('df.csv')). Is there any way to correct this error?

Hemanya Tyagi
  • Imports can't come from another module like that - they won't exist in the scope of your calling module. Explicitly import them at the top of each module. – dspencer Apr 05 '20 at 04:36
  • The `import` statement adds the imported module to the local "scope" as a variable, all the modules you imported are only available inside your `importAll` function – Iain Shelvington Apr 05 '20 at 04:38

3 Answers

6

When you import inside a function, the imported name is bound only in that function's local scope, not in the scope the function is called from.

I'd recommend looking at this question for a good explanation for scope rules in python.
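As a minimal sketch of the scope rule described above, the snippet below imports a module inside a function and then checks what the calling module can see. It uses the standard-library fractions module in place of pandas only so that it runs anywhere; the function name import_inside_function is made up for illustration.

```python
import sys


def import_inside_function():
    # The import statement binds the name `fractions` as a LOCAL variable here.
    import fractions
    return str(fractions.Fraction(1, 3))


result = import_inside_function()

# The module object itself is cached in sys.modules...
module_was_loaded = "fractions" in sys.modules
# ...but the name was never bound in this module's global namespace.
name_is_global = "fractions" in globals()

try:
    fractions.Fraction(1, 2)
    outcome = "no error"
except NameError as exc:
    outcome = f"NameError: {exc}"

print(result, module_was_loaded, name_is_global)
print(outcome)
```

The function runs and the module is loaded, yet using the name outside the function raises NameError, which is the same situation as pd in train.py.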

To fix this, you can use python's star import.

imports.py:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

__all__ = [
    'pd',
    'np',
    'sns',
    'plt'
]

train.py:

from imports import *

...

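The syntax from module import * imports every name listed in that module's __all__. If the module defines no __all__, it imports every public name, that is, every name that does not begin with an underscore.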
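To see how __all__ filters a star import, here is a small self-contained sketch. It builds a throwaway module in memory (the name demo_imports and its contents are hypothetical, and standard-library modules stand in for pandas and friends) and runs the star import in a fresh namespace so the result can be inspected.

```python
import sys
import types

# Source of a throwaway module; `demo_imports` is a hypothetical name.
source = '''
import math as m
import json

helper = 42
_private = "hidden"

__all__ = ["m"]
'''
module = types.ModuleType("demo_imports")
exec(source, module.__dict__)
sys.modules["demo_imports"] = module

# Run the star import in a fresh namespace so we can see what it binds.
namespace = {}
exec("from demo_imports import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # only the names listed in __all__

# Without __all__, every name without a leading underscore is imported.
del module.__all__
namespace_no_all = {}
exec("from demo_imports import *", namespace_no_all)
exported_no_all = sorted(
    name for name in namespace_no_all if not name.startswith("__")
)
print(exported_no_all)

# Remove the throwaway module again.
del sys.modules["demo_imports"]
```

With __all__ present only m is imported; once it is removed, helper, json and m all come through, while _private stays hidden in both cases.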

Edit

I strongly discourage the use of this code, because it has the opposite effect of what you intend. It removes the "clutter" of redundant import statements at the cost of something much worse: confusing code and a hidden bug waiting to surface (explained below).

Alas, the solution:

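import inspect
import os

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Source line that triggers this module's execution in the importing file
CODE_CONTEXT = ['from imports import *\n']

__all__ = [
    'pd',
    'np',
    'sns',
    'plt'
]

def _get_filename():
    # Find the first stack frame whose source line is the star import
    frame, *_ = filter(
        lambda frame: getattr(frame, 'code_context', None) == CODE_CONTEXT,
        inspect.stack()
    )
    # frame.filename may be a full path, so compare on the base name only
    return os.path.basename(frame.filename)

imported_from = _get_filename()

if imported_from == 'train.py':
    import sklearn
    __all__.append('sklearn')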

elif imported_from == 'eda.py':
   ...

To understand how a bug might come from this code, consider this example:

imports.py:

import inspect

CODE_CONTEXT = ['from imports import *\n']

__all__ = []

def _get_filename():
    frame, *_ = filter(
        lambda frame: getattr(frame, 'code_context', None) == CODE_CONTEXT,
        inspect.stack()
    )
    return frame.filename

imported_from = _get_filename()
print(imported_from)

a.py:

from imports import *

b.py:

from imports import *

main.py:

import a
import b

When you run python3 main.py, what will print to the console?

a.py

Why isn't b.py printed? Because a module is executed only once, on its first import. Since a.py imported imports.py first, each subsequent import does not re-execute the module; it reuses the module object that was created during the initial execution.

TL;DR

Any alterations made to __all__ will only apply to the first import of the module, and subsequent imports might be missing the modules that are needed.
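The "executed only once" behaviour can be checked with a short sketch. It writes a hypothetical module, run_once_demo, to a temporary directory, imports it twice, and then forces a re-execution by dropping the cache entry. The module and variable names are assumptions made for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Each execution of the module body creates a brand-new `token` object.
    Path(tmp, "run_once_demo.py").write_text("token = object()\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        first = importlib.import_module("run_once_demo")
        second = importlib.import_module("run_once_demo")
        # The second import is served from the sys.modules cache.
        same_module = first is second

        # Removing the cache entry forces the module body to run again.
        del sys.modules["run_once_demo"]
        third = importlib.import_module("run_once_demo")
        reexecuted = third.token is not first.token
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("run_once_demo", None)

print(same_module, reexecuted)
```

Because the second import returns the cached module object, any logic in the module body that depends on who is importing it only ever sees the first importer.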

Lord Elrond
  • I have tried this and this has worked for me. However, I want to make it a bit modularized. There are n files in my project and I want to import different libraries for all those n files. For example, if the file is train.py, then I will import sklearn otherwise I won't. If the file is eda.py, I want to import matplotlib, otherwise not. I realize the scope mistake, but is there any way I can handle this dynamic import thing. – Hemanya Tyagi Apr 05 '20 at 04:53
  • @HemanyaTyagi will you need to use both `train.py` and `eda.py` in the same process? – Lord Elrond Apr 06 '20 at 01:31
  • They are not connected in any way. I run eda.py to generate and save images. I run train.py to compare ML models and save the best one. I also have another file predict.py which will predict using the saved model. All three of these files require some common imports and some specific imports. I do not want to name these imports in each file. I think it adds clutter. Another approach can be writing specific import files for each file, but if I can do it using one file, it would be great. – Hemanya Tyagi Apr 06 '20 at 08:29
1

Ideally, you should organize your code as a Python package. A package directory conventionally contains an __init__.py file, and in that file you can import the other files/modules of the package.

Suppose you create a package named example. The file structure will be:

  • outer.py
  • example
    • __init__.py
    • file1.py
    • file2.py
    • package2
      • __init__.py

If file1.py has two classes:

class A:
    def __init__(self):
        pass

class B:
    def __init__(self):
        pass

and file2.py has two more classes:

class C:
    def __init__(self):
        pass

class D:
    def __init__(self):
        pass

and you want to use all of these classes in the outer file, import them in the package's __init__.py file, like this:

from .file1 import *
from .file2 import *

Now, in outer.py, you can simply do this:

from example import *
# this imports all four classes A, B, C and D
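The layout above can be tried out without touching your project. The following sketch creates a temporary package (named example_pkg_demo here, a hypothetical name chosen to avoid clashing with anything real), fills in file1.py, file2.py and __init__.py as described, and then lists what a star import from the package binds.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "example_pkg_demo")
    pkg.mkdir()
    (pkg / "file1.py").write_text("class A: pass\nclass B: pass\n")
    (pkg / "file2.py").write_text("class C: pass\nclass D: pass\n")
    # The package's __init__.py re-exports everything from both files.
    (pkg / "__init__.py").write_text(
        "from .file1 import *\nfrom .file2 import *\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Plays the role of outer.py: star-import from the package.
        namespace = {}
        exec("from example_pkg_demo import *", namespace)
    finally:
        sys.path.remove(tmp)
        for name in list(sys.modules):
            if name.split(".")[0] == "example_pkg_demo":
                del sys.modules[name]

imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Besides the four classes, the star import also binds the submodule names file1 and file2, because importing a submodule sets it as an attribute of the package. Defining __all__ in __init__.py would restrict the export to the classes only.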
Rushikesh
1

Easier solution:

|_Package
--|_imports.py
--|_train.py

imports.py

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

train.py

from imports import *


def load_data(date):
    # load only the data before the given date
    df = pd.read_csv('df.csv')
    return df[df['date'] < date]


date = '2012-09-01'
df = load_data(date)
Bas