I am working on a Python project that uses Thrift files to define the structure of on-the-wire network messages.
The .thrift files are of course checked into version control (git in my case).
The Thrift compiler generates code in a choice of language bindings (Python in my case) to encode and decode the on-the-wire messages to and from native Python data structures.
The command to run the compiler is:
thrift --gen py encoding.thrift
The compiler generates a new directory (gen-py) which contains the generated Python files:
$ find gen-py
gen-py
gen-py/__init__.py
gen-py/encoding
gen-py/encoding/constants.py
gen-py/encoding/__init__.py
gen-py/encoding/ttypes.py
While there are pros and cons to checking generated files into version control (see for example here and here), I prefer not to check generated files in.
I am relatively new to Python development. I come from a background of mostly compiled languages (e.g. C++) that use some sort of build tool (e.g. makefiles), where it is relatively straightforward to add rules to the build script to run the Thrift compiler and generate the language bindings.
My questions are:
What is the best practice in Python for automatically running the Thrift compiler to generate the Python files?
If possible, I would like the suggested build tool to be dependency-aware, so that the Thrift compiler only runs when necessary (i.e. the generated files are absent, or the .thrift files have been modified since the last build).
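To make the dependency requirement concrete, here is a minimal sketch of the behavior I am after, written as plain Python. It assumes the file names from the example above (encoding.thrift, gen-py/encoding/ttypes.py) and compares modification times the way a makefile rule would; the function names are mine, not from any particular tool:

```python
import os
import subprocess


def needs_regen(source, target):
    """Return True if `target` is missing or older than `source`,
    mimicking a makefile's timestamp-based dependency check."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def generate_bindings(thrift_file="encoding.thrift",
                      generated="gen-py/encoding/ttypes.py"):
    # Only invoke the Thrift compiler when the generated code is stale.
    if needs_regen(thrift_file, generated):
        subprocess.check_call(["thrift", "--gen", "py", thrift_file])


if __name__ == "__main__":
    generate_bindings()
```

Ideally the build tool would provide this staleness check itself rather than my hand-rolling it; the sketch just pins down what "only run if necessary" means.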