I have a project directory set up as follows:

project/
    src/
        __init__.py
        data/
            __init__.py
            pull_data.py
            format_data.py
            update_data.py
        model/
            __init__.py
            train.py
            predict.py

I will mainly be running train.py and predict.py, which both update my data and then train or predict, respectively. I would also like to be able to run the other Python files on their own if needed.

My questions are: is this an appropriate way to set up my project directory, and how am I supposed to import the files?

The python files look something like this:

# update_data.py
from pull_data import pull_data # how? this is in same directory
from format_data import format_data # how? this is in same directory

def update_data():
    pull_data() # this is in the same directory
    format_data() # this is in the same directory
    # other stuff

if __name__ == '__main__':
    update_data()


# train.py
from update_data import update_data # how? this is in ../data

def train():
    update_data()
    # other stuff

if __name__ == '__main__':
    train()


# predict.py
from update_data import update_data # how? this is in ../data

def predict():
    update_data()
    # other stuff

if __name__ == '__main__':
    predict()

I was also under the assumption that I should run the code from the src directory with something like python model/train.py. I'm open to improvements to that as well.

PL3
  • Does your current configuration work? – wwii Aug 03 '17 at 17:06
  • Possible duplicate of [Absolute imports in python not working, relative imports work](https://stackoverflow.com/questions/45448182/absolute-imports-in-python-not-working-relative-imports-work) – anthony sottile Aug 03 '17 at 17:06
  • With current config running `python model/train.py` results in `ModuleNotFoundError: No module named 'update_data'`. – PL3 Aug 03 '17 at 17:22

2 Answers


I was able to get this to work by changing `update_data.py` to

from data.pull_data import pull_data
from data.format_data import format_data

and train.py, predict.py to

from data.update_data import update_data

then running either of the following from the src/ directory

python -m model.train
python -m model.predict

I don't know if this is the most appropriate way to run things, but it works.
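As far as I understand it, this works because `python -m model.train` puts the current directory (src/) at the front of `sys.path`, while `python model/train.py` puts the script's own directory (src/model/) there instead, so `data` cannot be found. Here is a minimal sketch that reproduces the difference in a throwaway directory. The stub file contents and the helper names `build_layout` and `run` are my own, made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> Path:
    """Create a throwaway src/ tree with stub modules (contents are illustrative)."""
    src = root / "src"
    files = {
        "data/__init__.py": "",
        "data/update_data.py": "def update_data():\n    print('updated')\n",
        "model/__init__.py": "",
        "model/train.py": (
            "from data.update_data import update_data\n"
            "if __name__ == '__main__':\n"
            "    update_data()\n"
        ),
    }
    for rel, body in files.items():
        path = src / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)
    return src


def run(args, cwd):
    """Run the current interpreter with args inside cwd; return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True
    )
    return proc.returncode, proc.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    src = build_layout(Path(tmp))
    # Module mode: sys.path[0] is the working directory (src/), so `data` resolves.
    as_module = run(["-m", "model.train"], src)
    # Script mode: sys.path[0] is src/model/, so `import data` fails.
    as_script = run(["model/train.py"], src)

print("python -m model.train ->", as_module)
print("python model/train.py  -> exit code", as_script[0])
```

The same reasoning applies to `python -m data.update_data`, which should let the data scripts run on their own from src/ as well.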

PL3

Your update_data.py file looks fine.

You can update train.py and predict.py as follows:

# train.py
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from data.update_data import update_data 

def train():
    update_data()
    # other stuff

if __name__ == '__main__':
    train()

Your predict.py should look something like this:

# predict.py
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from data.update_data import update_data 

def predict():
    update_data()
    # other stuff

if __name__ == '__main__':
    predict()

Instead of appending to sys.path in both files inside the model folder (and in every other file that imports from a sibling folder), you can use the project structure below, adding a main.py to your src folder:

project/
    src/
        __init__.py
        data/
            __init__.py
            pull_data.py
            format_data.py
            update_data.py
        model/
            __init__.py
            train.py
            predict.py
        main.py

Here main.py is the entry point for your script or command.

Your files will then look like this:

# train.py
from data.update_data import update_data 

def train():
    update_data()
    # other stuff

if __name__ == '__main__':
    train()

predict.py

# predict.py
from data.update_data import update_data 

def predict():
    update_data()
    # other stuff

if __name__ == '__main__':
    predict()

and src/main.py

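# src/main.py
import sys
import os
# This adds project/ (the parent of src/) to sys.path. src/ itself is already
# on sys.path when main.py is run as a script, because Python puts the
# directory of the script being run at the front of the search path.
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from model.train import train
from model.predict import predict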
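
The main.py above only imports the two functions. One possible way to turn it into an actual entry point is to pick a function from a command-line argument. This is only a sketch under my own assumptions: the `train`/`predict` subcommand names, the `COMMANDS` table, and the stand-in functions are hypothetical and not part of the original files.

```python
import argparse

# Stand-ins so the sketch runs on its own. In src/main.py these would be
# `from model.train import train` and `from model.predict import predict`.
def train():
    return "trained"


def predict():
    return "predicted"


# Hypothetical mapping from command name to the function it runs.
COMMANDS = {"train": train, "predict": predict}


def main(argv=None):
    """Parse the command name and call the matching function."""
    parser = argparse.ArgumentParser(description="Entry point for the project")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()


# Explicit argument list for demonstration; a real main.py would call main()
# with no arguments so that argparse reads sys.argv.
print(main(["train"]))
```

With something like this in place, `python main.py train` and `python main.py predict` run from the src/ directory would replace running the files in model/ directly.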
viveksyngh
  • With your suggestion, running `python model/train.py` gives `ModuleNotFoundError: No module named 'data'`. – PL3 Aug 03 '17 at 17:23
  • You can also do it by adding these lines at the top of `train.py` and `predict.py`: `import sys`, `import os`, `sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))` ... This tells Python where to look for the module named 'data' if it cannot find it on sys.path – viveksyngh Aug 03 '17 at 18:30
  • monkeying with `sys.path` is almost always the wrong approach – anthony sottile Feb 19 '18 at 18:00