I have a .yml configuration file that controls all file I/O for my program depending on the client(s) I am running it for. The client name should be somewhere in the paths given in the YAML file, for example:

client: CLIENT_1
data:
  raw-file-path: D://Users//product//data//raw//CLIENT_1//CLIENT_1_data.csv
  processed-data-file-path: D://Users//product//data//processed//CLIENT_1//CLIENT_1_processed_data.csv

There are multiple clients, and their data always lives in named subdirectories. My code ingests the raw data for each client and writes the processed data to the appropriate directory, as in the example above. In most cases I want to run the scripts for a single client, so I can edit the config.yml file shown above by hand, but I would like to be able to do this programmatically. I added a --clients argument to the ArgumentParser so that a list of clients can be given as input:

parser.add_argument('--clients', nargs='+', default=[], help='list of clients')

For example:

python run.py --config config.yml --clients CLIENT_1 CLIENT_2
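For reference, here is a minimal sketch of the argument parsing described above. The --clients definition is the one quoted in the question. The --config definition and the build_parser helper are assumptions made for illustration, since only the command-line usage of --config is shown.

```python
import argparse


def build_parser():
    """Build the command-line parser (hypothetical helper name)."""
    parser = argparse.ArgumentParser(description="Run the pipeline for one or more clients")
    # Assumed definition: the question only shows --config being used on the command line.
    parser.add_argument('--config', default='config.yml', help='path to the YAML configuration file')
    # Definition quoted in the question: zero or more client names after --clients.
    parser.add_argument('--clients', nargs='+', default=[], help='list of clients')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ['--config', 'config.yml', '--clients', 'CLIENT_1', 'CLIENT_2']
)
print(args.config)
print(args.clients)
```

With nargs='+', every value following --clients is collected into a single list, so args.clients can be iterated to process each client in turn.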

I would like to find a way to manipulate all these paths so that they point to the appropriate directory, and even to name the files, using something like f-strings, but I don't know how to do that. The closest question I found was "Leverage Python f-strings with Yaml files?", but it relies on Jinja2 templates, which I am not familiar with. Is there a simpler way to do this?

nvergos

1 Answer

There is an actively developed Python package for constructing objects from YAML, JSON, or dicts. (Full disclosure: I am a co-author of this package, see here.)

It also has options for passing in arguments; see this.

Then you can define this in your YAML:

some_field: _|arg_name|_

And load it like this:

test_conf_yaml = PickleRick('./tests/placebos/test_config.yaml', arg_name='hallo world')
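For comparison, the same placeholder idea can be sketched with only the standard library, which may be enough for the paths in the question. This is not how PickleRick works internally. The {client} placeholder syntax, the fill_placeholders helper, and the template dict (a stand-in for what a YAML loader would return) are all assumptions made for illustration.

```python
def fill_placeholders(node, **values):
    """Recursively apply str.format to every string in a nested dict or list.

    Hypothetical helper: non-string leaves (numbers, None, ...) are returned unchanged.
    """
    if isinstance(node, dict):
        return {key: fill_placeholders(value, **values) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, **values) for item in node]
    if isinstance(node, str):
        return node.format(**values)
    return node


# Stand-in for the dict a YAML loader would return for a templated config.yml.
# The "{client}" placeholder is an assumed convention, not part of YAML itself.
template = {
    "client": "{client}",
    "data": {
        "raw-file-path": "D://Users//product//data//raw//{client}//{client}_data.csv",
        "processed-data-file-path": "D://Users//product//data//processed//{client}//{client}_processed_data.csv",
    },
}

for client in ["CLIENT_1", "CLIENT_2"]:
    config = fill_placeholders(template, client=client)
    print(config["data"]["raw-file-path"])
```

One limitation of this sketch is that str.format treats every brace as a placeholder, so a value containing literal braces would need them doubled ({{ and }}).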

Install:

pip install pickle-rick

Use:

Define a YAML or JSON string (or file).

BASIC:
 text: test
 dictionary:
   one: 1
   two: 2
 number: 2
 list:
   - one
   - two
   - four
   - name: John
     age: 20
 USERNAME:
   type: env
   load: USERNAME
 callable_lambda:
   type: lambda
   load: "lambda: print('hell world!')"
 datenow:
   type: lambda
   import:
     - "from datetime import datetime as dd"
   load: "lambda: print(dd.utcnow().strftime('%Y-%m-%d'))"
 test_function:
   type: function
   name: test_function
   args:
     x: 7
     y: null
     s: hello world
     any:
       - 1
       - hello
   import:
     - "math"
   load: >
     def test(x, y, s, any):
       print(math.e)
       iii = 111
       print(iii)
       print(x,s)
       if y:
         print(type(y))
       else:
         print(y)
       for i in any:
         print(i)

Then use it as an object.

>> from pickle_rick import PickleRick

>> config = PickleRick('./config.yaml', deep=True, load_lambda=True)

>> config.BASIC.dictionary
{'one' : 1, 'two' : 2}

>> config.BASIC.callable_lambda()
hell world!

You can define Python functions, load additional data from other files, REST APIs, or environment variables, and then write everything out to YAML or JSON again.

This works especially well when building systems that require structured configuration files, or in notebooks as interactive structures.

A security note: only load files you trust. Any code in the file can be executed, so do not load a file without knowing its complete contents.

The package is called PickleRick and is available here:

amateurjustin