
Until now, I have had a structure like this at the top of all of my files (I process and analyze raw data with pandas, so I work with a lot of raw files):

raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'

There are a bunch of different paths and some .py files use the same path name to refer to a different path (for example, raw_location is the source of the data which is different for different files). It has become a mess.

Under the locations, I have a list of file names (import_filename, modified_filename, dashboard_filename). All told, I am wasting like 10+ lines of code on each file just to specify variable names. I know there must be a better way to do this.

So far, I have moved my .py and .ipynb files into folders within the main directory, which means I can use relative paths like '../raw'. That has helped. Can I create a file that holds all of the path and file name variables, and then read that instead of listing the paths at the top of my code? What is the best practice here?

trench
  • You should write a configuration file, where you put all the information – Thomas Junk Apr 05 '16 at 12:39
  • Store the paths in a dictionary that's saved as a json file? It's hard to know what the intention is here; maybe a complete re-write of the scripts would solve these issues in the process. – jDo Apr 05 '16 at 12:42
  • Well, I could rewrite. However, I am asking about some best practices to follow. I notice that I do not see a bunch of explicit paths when I look at other people's code but I do not know the best way to avoid this. I went from Excel to pandas/python so I am only slowly becoming more efficient and organized. – trench Apr 05 '16 at 12:44

1 Answer


Edit: After reviewing the comments below and looking into this issue more deeply, I've added two additional options:

1) Use python "configparser" - https://docs.python.org/2/library/configparser.html

Examples: https://stackoverflow.com/a/29479549/5088142
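As a rough illustration of option 1, here is a minimal sketch that assumes Python 3 (where the module is named `configparser`). The `[paths]` section name and the `paths.ini` file name are hypothetical; the INI text is parsed from a string here only so the snippet runs on its own.

```python
import configparser

# Hypothetical contents of a "paths.ini" file; the [paths] section name
# is an illustrative choice, not something from the original post.
INI_TEXT = """
[paths]
raw_location = C:/Users/OneDrive/raw/
output_location = C:/Users/OneDrive/output/
mtd_location = C:/Users/OneDrive/modified/
py_location = C:/Users/OneDrive/py_files/
"""

config = configparser.ConfigParser()
# With a real file on disk you would call config.read("paths.ini") instead.
config.read_string(INI_TEXT)

# Values come back as plain strings.
raw_location = config["paths"]["raw_location"]
output_location = config.get("paths", "output_location")
print(raw_location)
```

Each script would then read the same INI file instead of repeating the path assignments.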

2) As BlackJack mentioned, you can drop the class from the imported file. Write a config file as a plain module, e.g. named LDconfig.py:

raw_location = 'C:/Users/OneDrive/raw/'
output_location = 'C:/Users/OneDrive/output/'
mtd_location = 'C:/Users/OneDrive/modified/'
py_location = 'C:/Users/OneDrive/py_files/'

In your files, import this module using:

import LDconfig

In your files you can access the data using: importedmodule.variable, e.g.

LDconfig.raw_location
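If all of the folders share one parent directory, the config module could also derive them from a single base path. This is only a sketch: `BASE_DIR` and the `data.csv` file name are hypothetical, and it assumes the four folders from the question all live under `C:/Users/OneDrive`.

```python
from pathlib import Path

# Hypothetical base directory; assumes every folder below lives under it.
BASE_DIR = Path('C:/Users/OneDrive')

# Same variable names as in the question, built from the shared base.
raw_location = BASE_DIR / 'raw'
output_location = BASE_DIR / 'output'
mtd_location = BASE_DIR / 'modified'
py_location = BASE_DIR / 'py_files'

# File names can be defined once here as well ('data.csv' is a placeholder).
import_filename = raw_location / 'data.csv'
```

A script that imports the module could then pass `LDconfig.import_filename` straight to `pd.read_csv`, which accepts `Path` objects.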

3) Alternatively, write a config file, e.g. named LDconfig.py, that wraps the values in a class:

class LDconfig:
    raw_location = 'C:/Users/OneDrive/raw/'
    output_location = 'C:/Users/OneDrive/output/'
    mtd_location = 'C:/Users/OneDrive/modified/'
    py_location = 'C:/Users/OneDrive/py_files/'

In your files, import this class from LDconfig.py using:

from LDconfig import LDconfig

In your files you can access the data using: classname.variable, e.g.

LDconfig.raw_location
Yaron
  • cool, that worked. Is this a better choice compared to creating a text file or something? – trench Apr 05 '16 at 14:59
  • I don't see the point of the class. The module itself is already a namespace, so if you leave out the class you can just do `import LDconfig` and access the data with `LDconfig.raw_location`. Which is exactly what programs like Django or Sphinx do. – BlackJack Apr 05 '16 at 16:40
  • @LanceDacey Whether it's better to write a module instead of an INI or JSON file depends. A module has the advantage of having the whole Python language available to define and manipulate values. A "dumb" configuration file has the advantage of being "safe" (it can't contain malicious code), and it can be manipulated with other programs, possibly in languages other than Python. – BlackJack Apr 05 '16 at 16:43
  • I removed the class and I am going with the Python module, since no end user has access to this (it is just for data manipulation in pandas; there are no usernames, passwords, etc.). Since I import pandas as pd, numpy as np, datetime as dt, pytz, etc. in each of my files (I have over a dozen), should the imports be in the config file as well? – trench Apr 05 '16 at 16:59
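
For comparison, the JSON option raised in the comments might look roughly like this. It is a sketch only: the `paths.json` file name is hypothetical, and the JSON text is inlined so the snippet runs without any file on disk.

```python
import json

# Hypothetical contents of a "paths.json" file.
JSON_TEXT = """
{
    "raw_location": "C:/Users/OneDrive/raw/",
    "output_location": "C:/Users/OneDrive/output/",
    "mtd_location": "C:/Users/OneDrive/modified/",
    "py_location": "C:/Users/OneDrive/py_files/"
}
"""

# With a real file you would use: with open("paths.json") as f: paths = json.load(f)
paths = json.loads(JSON_TEXT)

raw_location = paths["raw_location"]
print(raw_location)
```

Unlike the Python module, this file holds only data, so it can be read or edited by tools written in other languages.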