
I made a script to validate and clean a complex DataFrame, with calls to a database. I divided the script into modules, and the files import each other frequently to use functions and variables that are spread across different files and directories.

My files are distributed across directories like this:

[image: directory structure]

An example would look like this:

# Module to clean addresses: address_cleaner.py
def address_cleaner(dataframe):
    ...
    return dataframe

# Pipeline that runs all cleaning functions in order: pipeline.py
from file1 import function1
...

def pipeline():
    df = function1()
    function2(df)
    function3(df)
    ...
    return None

# Executable file where I request environment variables and run the pipeline: exe.py
import os
from pipeline import pipeline
...
pipeline()
...

When I run this on Unix:

% cd myproject
% python executable.py

This is one of the failing imports; I import these names to avoid hardcoding the environment variable strings:

File "path/to/executable.py", line 1, in <module>
    from environment_constants import SSH_USERNAME, SSH_PASSWORD, PROD_USERNAME, PROD_PASSWORD
ModuleNotFoundError: No module named 'environment_constants'

I get a ModuleNotFoundError when I run executable.py on Unix, which calls the pipeline shown above. It seems that the imports between files (to use a function, variable, or constant from another file, especially one in a different directory) don't reach each other. These directories all belong to the same parent directory, "CD-cleaner".

Is there a way to make these files import from each other even if they are in different folders of the project?

Thanks in advance.

– ansanser

1 Answer

Either create proper Python packages that you can install and manage with pip, or (easier) always use your project root as the working directory. Set the import paths accordingly and then run files from the root folder, e.g. python generate/generate_dataframe.py while you are in the joor-cd-cleaner directory.
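For the first option, a minimal sketch of what an installable package could look like, assuming a pyproject.toml at the project root (the package name and metadata below are illustrative, not taken from the question):

```toml
# pyproject.toml at the project root (name and version are illustrative)
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "cd-cleaner"
version = "0.1.0"
```

After running pip install -e . from the root, the project's packages resolve with absolute imports from anywhere, without depending on the current working directory.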

– Semmel
  • Okay, I see what you mean. My intention was to create a pipeline function that called each individual function, which in turn each call some file in the constants directory, and then run the pipeline function from an executable file. I use the paths mainly to locate the files I want to process and where I want to output them. Thanks for your quick reply. I will try to implement what you mention above. – ansanser Jan 09 '21 at 22:07
  • You can also use PyInstaller https://datatofish.com/executable-pyinstaller/ to handle all of this for you. – Semmel Jan 09 '21 at 22:21
  • You can also either set the working directory or import the files from paths that you create using the path of the pipeline file right now. To get this path, see https://stackoverflow.com/questions/5137497/find-current-directory-and-files-directory – Semmel Jan 09 '21 at 22:23
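A minimal sketch of the approach in the last comment, assuming the executable sits one level below the project root (the layout is an illustrative assumption, not taken from the question):

```python
import os
import sys

# Absolute path of the directory containing this file; this is
# independent of the working directory the script was launched from.
HERE = os.path.dirname(os.path.abspath(__file__))

# Illustrative assumption: the project root is one level up from this file.
PROJECT_ROOT = os.path.dirname(HERE)

# Putting the root on sys.path lets absolute imports such as
# `from environment_constants import SSH_USERNAME` resolve from any cwd.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
```

Placing this at the top of the entry-point script (before the project imports) is usually enough; the rest of the modules can then use plain absolute imports.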