12

I'm trying to keep a data science project well-organized so I've created a directory inside my src directory called utils that contains a file called helpers.py, which contains some helper functions that will be used in many scripts. What is the best practice for how I should import func_name from src/utils/helpers.py into a file in a totally different directory, such as src/processing/clean_data.py?

I see answers to this question, and I've implemented a solution that works, but this feels ugly:

 sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))))

Am I doing this right? Do I need to add this to every script that wants to import func_name, like train_model.py?

My current project folder structure:

myproject
    /notebooks
        notebook.ipynb
    /src
        /processing
            clean_data.py
        /utils
            helpers.py
        /models
            train_model.py
        __init__.py

Example files:

# clean_data.py

import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))))
from src.utils.helpers import func_name

func_name()


# helpers.py

def func_name():
    print("I'm a helper function.")
NoobEditor
blahblahblah
  • You just need to add `__init__.py` files at each level you import from/to. This is a standard for python's Distribution Packages. – KeatsKelleher Feb 13 '18 at 04:32
  • You should have `__init__.py` in each of those folders. And your imports should all be based off of where you are executing your project from in order to avoid all that path manipulation. – idjaw Feb 13 '18 at 04:32
  • @KeatsKelleher After adding `__init__.py` in the `/utils` and `/processing` and removing the `sys.path.append(...)`, I get a `ModuleNotFoundError: No module named 'src'`. Can you provide a more detailed answer? – blahblahblah Feb 13 '18 at 05:07
  • Stop doing that sys.path stuff. Read the answer provided and read about how Python packages and modules work. How the import system works and what the `__init__.py` means – idjaw Feb 13 '18 at 14:08
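
To make the point in the comments about "where you are executing your project from" concrete, here is a minimal sketch. It rebuilds the layout from the question in a temporary directory (the file contents are stand-ins, not the asker's real code) and runs clean_data.py twice from the project root: once as a plain script and once as a module with `python -m`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_project(root: Path) -> None:
    """Create a throwaway copy of the layout from the question."""
    files = {
        "src/__init__.py": "",
        "src/utils/__init__.py": "",
        "src/utils/helpers.py": "def func_name():\n    print('helper called')\n",
        "src/processing/__init__.py": "",
        "src/processing/clean_data.py": (
            "from src.utils.helpers import func_name\n\nfunc_name()\n"
        ),
    }
    for rel_path, text in files.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


def run(args, cwd):
    """Run a fresh interpreter from `cwd`, ignoring any inherited PYTHONPATH."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    return subprocess.run(
        [sys.executable, *args], cwd=cwd, env=env, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "myproject"
    build_project(project)
    # Script mode: sys.path[0] is src/processing, so `src` cannot be found.
    as_script = run(["src/processing/clean_data.py"], cwd=project)
    # Module mode: sys.path[0] is the working directory (the project root).
    as_module = run(["-m", "src.processing.clean_data"], cwd=project)

print("script exit code:", as_script.returncode)
print("module output:", as_module.stdout.strip())
```

Run as a script, the import fails with the same `ModuleNotFoundError: No module named 'src'` reported above; run with `-m` from the project root, it succeeds without any `sys.path` manipulation.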

7 Answers

9

The correct way to do it is to use __init__.py, setup.py and the setuptools Python package:

myPackage/
    myPackage/
        __init__.py
    setup.py

This link has all the steps.
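
As a rough sketch of what such a setup.py could contain for the layout above (the name and version values are placeholders chosen for illustration, not taken from the linked steps):

```python
# setup.py, placed next to the inner myPackage/ directory
from setuptools import find_packages, setup

setup(
    name="myPackage",          # placeholder distribution name
    version="0.1.0",           # placeholder version
    packages=find_packages(),  # collects every directory that has an __init__.py
)
```

Running `pip install -e .` from the directory that contains setup.py installs the project in development mode, after which `import myPackage` should work from any working directory without touching `sys.path`.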

Vikram Hosakote
8

First of all, let me describe the differences between a Python module and a Python package so that both of us are on the same page. ✌


A module is a single .py file (or several files) that is imported under one import and used. ✔

import aModuleName # Here 'aModuleName' is just a regular .py file.

Whereas, a package is a collection of modules in directories that give a package hierarchy. A package contains a distinct __init__.py file. ✔

from aPackageName import aModuleName
# Here 'aPackageName' is a folder with an `__init__.py` file
# and 'aModuleName' is just a regular .py file.
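
The distinction can be checked directly. The sketch below is only an illustration: it creates a throwaway module and package in a temporary directory (reusing the placeholder names aModuleName and aPackageName from above), imports them, and relies on the fact that a package object has a `__path__` attribute while a plain module does not.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A module: one plain .py file.
    (root / "aModuleName.py").write_text("KIND = 'top-level module'\n")
    # A package: a directory with __init__.py, holding its own module.
    pkg_dir = root / "aPackageName"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "aModuleName.py").write_text("KIND = 'module inside a package'\n")

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    module = importlib.import_module("aModuleName")
    package = importlib.import_module("aPackageName")
    submodule = importlib.import_module("aPackageName.aModuleName")
    sys.path.remove(str(root))

print(module.KIND)     # top-level module
print(submodule.KIND)  # module inside a package
print(hasattr(module, "__path__"), hasattr(package, "__path__"))  # False True
```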


Therefore, when we have a project directory named proj-dir of the following structure ⤵

proj-dir
 --|--__init__.py
   --package1
     --|--__init__.py
     --|--module1.py
   --package2
     --|--__init__.py
     --|--module2.py

Notice that I've also added an empty __init__.py into the proj-dir itself which makes it a package too.

Now, if you want to import any python object from module2 of package2 into module1 of package1, then the import statement in the file module1.py would be

from package2.module2 import object2
# if you were to import the entire module2 then,
from package2 import module2

I hope this simple explanation clarifies your doubts on Python imports' mechanism and solves the problem. If not then do comment here.

Sushant Rajbanshi
  • Thanks for the detailed response. I've added the `__init__.py` files in the directories as instructed, however where I run `projdir $ python package1/module1.py`, it returns `ModuleNotFoundError: No module named projdir`. – blahblahblah Feb 23 '18 at 23:15
  • Could you post your **new** project directory structure here so that I could figure out how exactly you're trying to access the module ? – Sushant Rajbanshi Feb 23 '18 at 23:17
  • I started a new project separate from my existing data science project with the exact directories/files as explained in your answer (I only renamed proj-dir to projdir). Could you create a Github repo with a super basic working example? – blahblahblah Feb 23 '18 at 23:55
  • @blahblahblah, Would you mind sharing your faulty project directory with me via GitHub instead ? I would be able to find out the exact cause for the issue in doing so, instead of me making a project that would work on my system and not on yours. – Sushant Rajbanshi Feb 24 '18 at 01:11
  • @blahblahblah Also, check the new project directory structure I've edited and compare it with your's and don't forget to remove all the changes related to **`sys.path`** from all of your modules. – Sushant Rajbanshi Feb 24 '18 at 01:17
  • I added you as a collaborator to a Github repo. – blahblahblah Feb 24 '18 at 23:02
  • 1
    @blahblahblah Can you guys share the solution to this? – Homero Esmeraldo Mar 20 '19 at 14:07
2

First of all, let me point out that if you are only going to use part of a module, you can use from to import just the specific function from a library/package. This keeps the calling code short and makes it clear what you actually use. Note that it does not save memory or time: Python still loads and executes the whole module either way.

To know more, refer to these:

  1. 'import module' or 'from module import'
  2. difference between import and from
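
A quick way to see what the from form does, using the standard-library colorsys module purely as a convenient example: only the requested name is bound in your namespace, while the module itself is still loaded once and cached in `sys.modules`.

```python
import sys

from colorsys import rgb_to_hsv  # binds only the name `rgb_to_hsv` here

# The whole colorsys module was still loaded and cached by the import system...
module_loaded = "colorsys" in sys.modules
# ...but the name `colorsys` itself was not bound in this namespace.
name_bound = "colorsys" in globals()

print(module_loaded, name_bound)
```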

Next, let us look into the solution.

Before starting off with the solution, let me clarify the use of the __init__.py file. It tells the Python interpreter that the directory is a package, so the *.py files in it are importable modules that are (or may be) part of that package.

So, if you have N subdirectories, you have to put an __init__.py file in each of them so that they can also be imported. Inside the __init__.py file you can also add some additional information, like which paths should be included, default functions, variables, scope, etc. To learn about these, just google the __init__.py file, or take some Python library and go through its __init__.py file. (Here lies the solution.)

More Info:

  1. modules
  2. Be pythonic

So as stated by @Sushant Chaudhary your project structure should be like

proj-dir
 --|--__init__.py
   --package1
     --|--__init__.py
     --|--module1.py
   --package2
     --|--__init__.py
     --|--module2.py

So now, if I put an __init__.py file in each directory as above, will everything be importable and work fine?

Yes and no.

Yes:

If you are importing modules within that project/package directory. For example, in your case you are importing package1.module1 in package2.module2 as from package1 import module1.

Here you have to make the base dir visible to the submodules. Why? When you run a file as a script, Python puts that script's own directory (not proj-dir) at the front of sys.path. So module2.py will throw ModuleNotFoundError for package1 whether you run it inside package2 as python module2.py or under proj-dir as python package2/module2.py, because only package2 is searched. This is what is happening in your case: you are running the module from proj-dir.

So How to fix this?

1- You have to add the basedir path to the system path in module2.py as

import sys
dir_path = "/absolute/path/to/proj-dir"
sys.path.insert(0, dir_path)

So that module2 will be able to find package1 (and module1 inside it).

2- You have to add all the sub module paths in __init__.py file under proj-dir.

For example:

#__init__.py under lxml
# this is a package

def get_include():
    """
    Returns a list of header include paths (for lxml itself, libxml2
    and libxslt) needed to compile C code against lxml if it was built
    with statically linked libraries.
    """
    import os
    lxml_path = __path__[0]
    include_path = os.path.join(lxml_path, 'includes')
    includes = [include_path, lxml_path]

    for name in os.listdir(include_path):
        path = os.path.join(include_path, name)
        if os.path.isdir(path):
            includes.append(path)

    return includes

This is the __init__.py file of lxml (a Python library for parsing HTML and XML data). You can refer to the __init__.py file of any Python library that has submodules (for example, json or email in the standard library). Here I've mentioned lxml because I thought it would be easy for you to understand. You can check the __init__.py files of other libraries/packages too. Each has its own way of defining the path for submodules.

No

If you are trying to import modules from outside the directory. Then you have to export the module path through an environment variable so that other modules can find them. This can be done by appending the absolute path of the base dir to PYTHONPATH. (PATH is only used by the operating system to locate executables; Python's import system does not read it.)
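
A minimal sketch of the PYTHONPATH approach, assuming the proj-dir layout shown earlier with stand-in file contents. It runs package2/module2.py in a fresh interpreter twice, first without PYTHONPATH and then with PYTHONPATH pointing at proj-dir.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate a throwaway proj-dir with two packages.
    proj_dir = Path(tmp) / "proj-dir"
    for name in ("package1", "package2"):
        (proj_dir / name).mkdir(parents=True)
        (proj_dir / name / "__init__.py").write_text("")
    (proj_dir / "package1" / "module1.py").write_text("VALUE = 42\n")
    (proj_dir / "package2" / "module2.py").write_text(
        "from package1 import module1\nprint(module1.VALUE)\n"
    )
    script = str(proj_dir / "package2" / "module2.py")

    # Start from the current environment, minus any existing PYTHONPATH.
    base_env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    without = subprocess.run(
        [sys.executable, script], env=base_env, capture_output=True, text=True
    )
    with_path = subprocess.run(
        [sys.executable, script],
        env={**base_env, "PYTHONPATH": str(proj_dir)},
        capture_output=True,
        text=True,
    )

print(without.returncode != 0, with_path.stdout.strip())  # True 42
```

In a shell, the equivalent would be exporting PYTHONPATH before running the script.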

To know more:

  1. PATH variables in OS
  2. PYTHONPATH variable

So to solve your problem, include the paths to all the submodules in the __init__.py file under proj-dir and add /absolute/path/to/proj-dir to PYTHONPATH.

I hope the answer explains the usage of __init__.py and solves your problem.

Homero Esmeraldo
Mani
  • Isn't this missing the context of how `get_include()` is called? – Homero Esmeraldo Mar 13 '19 at 21:08
  • "You have to append basedir path to system path in module1.py", shouldn't this be module2.py, so that it will be able to find module1? And if it is, this is exactly what the OP was trying to avoid, no? – Homero Esmeraldo Mar 13 '19 at 22:52
  • @HomeroBarrocasSEsmeraldo I've mentioned that this is `__init__.py` file of `lxml`. So whenever `lxml` is imported automatically `__init__.py` is called. You can assume it like a constructor for the package `lxml`. Please refer the `lxml` docs for more details. – Mani Mar 14 '19 at 04:30
  • I am confused of where that function is called, as that code is only its declaration. I have checked the code at https://github.com/lxml/lxml/blob/master/src/lxml/__init__.py it is exactly what you wrote. But do you know where the function is called? – Homero Esmeraldo Mar 14 '19 at 15:14
  • The function is called nowhere but it's added there such that if needed the submodules can directly call that to include the project path. For more understanding [refer this.](https://stackoverflow.com/a/50307979/6663095) – Mani Mar 14 '19 at 15:29
  • I see. But isn't that exactly what the OP was trying to avoid? If I understood correctly, it would be necessary to do `sys.path.append` with the paths listed in the return of the function `get_include` – Homero Esmeraldo Mar 14 '19 at 18:54
  • I read your link, btw. But I think that is what the OP was trying to avoid. – Homero Esmeraldo Mar 18 '19 at 22:00
1

On Linux, you can just add the path to the parent folder of your src directory to ~/.local/lib/python3.6/site-packages/my_modules.pth. See Using .pth files. You can then import modules in src from anywhere on your system.

NB1: Replace python3.6 by any version of Python you want to use.

NB2: If you use Python2.7 (don't know for other versions), you will need to create __init__.py (empty) files in src/ and src/utils.

NB3: Any name.pth file is ok for my_modules.pth.
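
To see the .pth mechanism work without touching your real site-packages, here is a small sketch. Everything in it is a stand-in: a temporary directory plays the role of site-packages, and `site.addsitedir` is called by hand to process it the way Python processes the real one at startup.

```python
import importlib
import site
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Stand-in project: <tmp>/myproject/src/utils/helpers.py
    utils_dir = tmp_path / "myproject" / "src" / "utils"
    utils_dir.mkdir(parents=True)
    (tmp_path / "myproject" / "src" / "__init__.py").write_text("")
    (utils_dir / "__init__.py").write_text("")
    (utils_dir / "helpers.py").write_text(
        "def func_name():\n    return 'helper called'\n"
    )

    # Stand-in for site-packages, holding a .pth file with one directory per line.
    fake_site = tmp_path / "fake-site-packages"
    fake_site.mkdir()
    (fake_site / "my_modules.pth").write_text(str(tmp_path / "myproject") + "\n")

    # Process the *.pth files in that directory, appending their entries to sys.path.
    site.addsitedir(str(fake_site))
    importlib.invalidate_caches()
    helpers = importlib.import_module("src.utils.helpers")
    result = helpers.func_name()

print(result)  # helper called
```

The line written into the .pth file is the parent folder of src, which is why `src.utils.helpers` becomes importable.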

Rémi.B
0

Yes, you can only import code from installed packages or from files in your working directory or subdirectories.

1cedsoda
0

The way I see it, your problem would be solved if you had your module or package installed, like any other package one installs and then imports (numpy, xml, json etc.).

I also have a package I constantly use in all my projects, utilities, and I know it's a pain with the importing.

Here is a description of how to package a Python application to make it pip-installable: https://marthall.github.io/blog/how-to-package-a-python-app/

Petronella
0
  • Navigate to your python installation folder

  • Navigate to lib

  • Navigate to site-packages

  • Make a new file called any_thing_you_want.pth

  • Type the full path of the directory that contains your module, .../src/utils, inside that file with your favorite text editor (a .pth file lists directories, one per line, not .py files)

Note: the ellipsis before src/utils will look something like: C:/Users/blahblahblah/python_folders/src... <- YOU DO NEED THIS!

This is a cheap way out, but it keeps code clean and is the least complicated. The downside is that the .pth file needs a line for every folder your modules are in. Upside: works with Windows all the way up to Windows 10.