
I'm trying to make a C++ research code written by colleagues easily usable by new graduate students. What I'd like to build is a package that hides the underlying structure of the project behind a clean programming interface. It should be simple enough that my colleagues can help maintain it without being world-class Python experts.

I need your help designing and organizing the package.

Existing code

The existing code does physics simulations. It takes a plain-text input file containing the values used to initialize a model, does some number-crunching in C++, and writes the simulation results to another text file.

The problem with this approach is that it lacks flexibility, especially when looping over different parameter values, and it's not very user-friendly to set up and use. Relying on bash scripts encourages bad practices and makes results hard to reproduce.
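To illustrate the flexibility point: in Python, a parameter sweep that would otherwise be a nest of bash loops takes only a few lines. This is a minimal sketch; `parameter_grid` is a hypothetical helper and the parameter names are placeholders.

```python
from itertools import product


def parameter_grid(base, **axes):
    """Yield one parameter dict per combination of the values in `axes`.

    `base` holds the parameters that stay fixed; each keyword argument maps a
    parameter name to the list of values to sweep over.
    """
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        params = dict(base)                  # copy so each run gets its own dict
        params.update(zip(names, values))    # overwrite the swept parameters
        yield params


# Hypothetical sweep: two temperatures x two values of foo = four runs.
runs = list(parameter_grid({"bar": "hello"}, temperature=[100, 200], foo=[1, 2]))
for params in runs:
    print(params)
```

Each dict could then be passed to whatever object runs the model, and the list itself can be saved next to the results, which helps with reproducibility.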

Goal

What we would like is a package that:

  • ships with the C++ model and can run it from Python.
  • can fill a template to create the input file and parse the output file. [I don't need help with this part]
  • (optional) builds the C++ code so the model can be extended. Otherwise, it just includes pre-compiled binaries.

In the end, the user would write something like this:

import mymodel

# The variables I'll use
myparams = {"temperature": 100, "foo": 1, "bar": "hello"}  # ... and more

# create a Python object, for example
mysim = mymodel.simulation(myparams)

# run the C++ model
result = mysim.run()
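One way to get this interface without a direct Python/C++ binding is to treat the compiled model as a black box and call it as a subprocess. Below is a minimal sketch under several assumptions: the class name `Simulation`, its extra `command` argument, the idea that the binary takes the input and output file paths as its two command-line arguments, and the `key = value` file format are all hypothetical. Your own template-filling and parsing code would replace the two placeholder methods.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class Simulation:
    """Hypothetical wrapper: write the input file, run the C++ binary, parse the output.

    `command` is the base command line for the executable, e.g. ["/path/to/model"];
    the input and output file paths are appended to it (an assumed CLI).
    """

    def __init__(self, params, command):
        self.params = params
        self.command = list(command)

    def _write_input(self, path):
        # Placeholder for your template filling: one "key = value" per line.
        lines = [f"{key} = {value}" for key, value in self.params.items()]
        path.write_text("\n".join(lines) + "\n")

    def _parse_output(self, path):
        # Placeholder for your parser: assumes "key = value" lines as well.
        result = {}
        for line in path.read_text().splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                result[key.strip()] = value.strip()
        return result

    def run(self):
        # A fresh temporary directory keeps concurrent runs from clobbering each other.
        with tempfile.TemporaryDirectory() as workdir:
            input_path = Path(workdir) / "input.txt"
            output_path = Path(workdir) / "output.txt"
            self._write_input(input_path)
            # check=True raises CalledProcessError if the model exits with an error.
            subprocess.run(
                [*self.command, str(input_path), str(output_path)],
                check=True,
                capture_output=True,
                text=True,
            )
            return self._parse_output(output_path)


# Stand-in "model" for trying the wrapper without the real binary: it copies input to output.
fake_model = [sys.executable, "-c", "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])"]
result = Simulation({"temperature": 100, "foo": 1}, fake_model).run()
print(result)  # {'temperature': '100', 'foo': '1'}
```

Running each simulation in its own temporary directory keeps parallel runs from overwriting each other's files. `check=True` turns a crash in the C++ code into a Python exception, so a failed run is not mistaken for a missing result.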

Questions

What I still can't figure out:

  1. Is this a reasonable strategy in my case, or should I consider another one, e.g. a direct Python/C++ interface? That seems difficult, and I only need to call the main function of the C++ code to run the model.
  2. How do I build the C++ code when the package is installed, with support for Linux/macOS/Windows (the C++ project has no third-party dependencies), and how do I then run the binary? Alternatively, how do I distribute pre-compiled versions for the major OS families?
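For the pre-compiled route, one simple convention is to ship one binary per platform inside the package and choose the right one at run time. The sketch below assumes a hypothetical naming scheme `foo-<system>-<machine>`, with `.exe` on Windows. `binary_name` and `binary_path` are illustrative helpers, not part of any library.

```python
import platform
from pathlib import Path


def binary_name(base="foo", system=None, machine=None):
    """Return a hypothetical per-platform file name such as 'foo-linux-x86_64'.

    `system` and `machine` default to the values for the running interpreter.
    """
    system = (system or platform.system()).lower()      # 'linux', 'darwin', 'windows'
    machine = (machine or platform.machine()).lower()   # e.g. 'x86_64', 'amd64', 'arm64'
    suffix = ".exe" if system == "windows" else ""
    return f"{base}-{system}-{machine}{suffix}"


def binary_path(bin_dir, base="foo"):
    """Return the path of the binary for this platform, or fail with a clear message."""
    path = Path(bin_dir) / binary_name(base)
    if not path.exists():
        raise FileNotFoundError(f"no pre-compiled binary for this platform: {path}")
    return path


print(binary_name("foo", "Windows", "AMD64"))  # foo-windows-amd64.exe
```

Shipping every platform's binary in one package keeps the build simple, at the cost of a larger download. The explicit error makes an unsupported platform obvious instead of failing with a confusing "file not found" from the subprocess call.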

What I really don't understand

I read a number of documentation pages about distutils and setup.py files, and searched The Hitchhiker's Guide to Packaging, but I couldn't find a comprehensive guide to what I'm trying to do. In particular, I don't understand what my setup.py should contain, how my package should be organized, and how I should handle the different file paths when calling the binary.

arna

2 Answers


I think your approach sounds reasonable. I would tackle it as:

  1. Create a wrapper around the C++ model in Python.
  2. Compile the model and import the wrapper to make sure things work.
  3. Create a Python library from the wrapper, including the compiled C++ code.
  4. Test the wrapper for standalone use and then distribute.

There are many ways to do the above; they will all require some work, but eventually you will get a workflow.

Personally, I found Cython to be a very effective one-stop shop for all of the above. Although Cython was developed for high-performance Python, its workflow for integrating C and C++ code alone makes it worth using, IMHO.

Please review the Cython Tutorial for a quick way to get started. The full documentation has a lot more language specific detail but is not needed for your purposes.

clocker

In the end, I decided to go for what I found to be the simplest approach, as a first step. I may change it later on.

I added pre-compiled binaries to a "bin" folder and used the package_data={'mypkg': ['bin/*']} option in my setup.py file to solve the path issue, as explained here. The executable foo has to be located at mypkg/bin/foo. This way, I can run foo from within my package with something like:

import os, subprocess, pkg_resources
output = subprocess.getoutput(os.path.join(pkg_resources.resource_filename('mypkg', 'bin'), 'foo'))
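A possible refinement of this call, assuming Python 3.9+: `importlib.resources` in the standard library can locate package files without `pkg_resources`, which setuptools now marks as deprecated. Also, `subprocess.run` with an argument list does not go through a shell and can raise an error when the binary fails. `run_packaged_binary` is a hypothetical helper, and the `bin/` layout is the one described in this answer.

```python
import subprocess
from importlib.resources import as_file, files


def run_packaged_binary(package, name, *args):
    """Run the executable bin/<name> shipped inside `package` and return its stdout.

    Requires Python 3.9+ for importlib.resources.files/as_file.
    """
    resource = files(package) / "bin" / name
    # as_file yields a real filesystem path, extracting to a temporary file if necessary.
    with as_file(resource) as exe:
        completed = subprocess.run(
            [str(exe), *args],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return completed.stdout


# Hypothetical usage inside the package described above:
# output = run_packaged_binary("mypkg", "foo")
```

Passing arguments as a list also avoids quoting problems when parameter values or paths contain spaces.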

I am aware that this is somewhat hacky and limited, but building/wrapping C++ code with Python is still a headache for me. If you can think of a better solution, please don't hesitate to post it.

arna