
I know Python, but I don't know C++. I'm trying to maximize a function that takes a long time to evaluate. I believe a good workflow would be to write the expensive objective function in C++ and use it with scipy.optimize.minimize to find the optimum. As an example, suppose I am maximizing a likelihood.

import pandas as pd
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# simulating data
means = np.array([10, 20, 30])
cov = np.diag([1, 4, 10])

N = 1000

df = pd.DataFrame(np.random.multivariate_normal(mean=means, cov=cov, size=N),
    columns=['a', 'b', 'c'])
df[np.random.choice([True, False], size=(N, 3), p=[0.3, 0.7])] = np.nan

# a function to print parameters used in likelihood function
def print_params(params):
    print('Means: {}'.format(params[:3]))
    print('Variances: {}'.format(np.exp(params[3:])**2))

# defining likelihood
def llf(params):
    logll = 0
    for i in df.index:
        for j,col in enumerate(['a', 'b', 'c']):
            if not np.isnan(df.loc[i, col]):
                m = params[j]
                sd = np.exp(params[j+3])
                logll += np.log(norm.pdf(df.loc[i, col], loc=m, scale=sd))

    print_params(params)
    return -logll


opt = minimize(llf, x0=np.array([0, 0, 0, 1, 1, 1]), options={'maxiter':30})
print_params(opt.x)

There may be more efficient ways to write the llf function in pure Python (for example, the vectorized sketch below), and there are definitely ways to speed up the optimization routine (e.g., by choosing an optimizer suited to the problem, or by supplying derivatives), but that is not the focus of this question. I chose this particular example because evaluating the likelihood requires a loop (I'm using all of the data, including rows where some columns have missing values), which takes a lot of time in pure Python, especially as my sample size grows.
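For example, a vectorized version of the likelihood above might look roughly like this (norm.logpdf broadcasts over the (N, 3) array of values and np.nansum skips the missing observations, matching the explicit loop in llf):

def llf_vectorized(params):
    X = df[['a', 'b', 'c']].values
    mu = params[:3]
    sd = np.exp(params[3:])
    # logpdf propagates NaN for missing cells; nansum ignores them
    return -np.nansum(norm.logpdf(X, loc=mu, scale=sd))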

How can I write the likelihood function in C++ and combine it with the Python minimize routine? Keep in mind that I have no experience with C++ but am willing to learn. However, a lot of the available resources seem to presume C++ knowledge; see Extending Python, for example. I'm looking for resources for someone who knows Python but is completely ignorant of C++ and of methods for combining Python with C++. EDIT: An example of how to do this with the code above, or some information about the likely gains from combining Python and C++, would be especially useful.
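For concreteness, my understanding is that the Python side of such a bridge could look roughly like this using ctypes (the shared-library name libllf.so and the C signature below are hypothetical placeholders for whatever the compiled C++ code would export, and I gather the function would need C linkage, i.e. extern "C", for ctypes to find it):

import ctypes
import numpy as np
from scipy.optimize import minimize

# hypothetical compiled library exporting:
#     double llf(const double* data, int n, int k, const double* params);
lib = ctypes.CDLL('./libllf.so')
lib.llf.restype = ctypes.c_double
lib.llf.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
    ctypes.c_int, ctypes.c_int,
    np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
]

X = np.ascontiguousarray(df[['a', 'b', 'c']].values)

def llf_c(params):
    # hand scipy's parameter vector straight through to the compiled code
    return lib.llf(X, X.shape[0], X.shape[1], np.ascontiguousarray(params))

opt = minimize(llf_c, x0=np.array([0., 0, 0, 1, 1, 1]), options={'maxiter': 30})

But I don't know whether this is the right approach, or how to write and compile the C++ side in the first place.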

jtorca
  • If you're looking for resources, your question's off-topic because "questions asking us to _recommend or find_ a book, tool, software library, _tutorial or other off-site resource_ are off-topic for Stack Overflow as they tend to attract opinionated answers and spam." – ForceBru Apr 25 '18 at 16:33
  • C++ is a very complex language and I have my doubts that writing the loop in C++ would really speed it up that much. Also you might find Cython to be more approachable if you don't know C++ – UnholySheep Apr 25 '18 at 16:33
  • If the question is about the challenge of bridging C++ and Python, then that makes sense. If the goal is for faster performance, I'd do a lot of profiling before pulling the trigger -- I suspect that for many things Python is already very performant, and there are Python packages that can do some powerful (and performant) number crunching. My bet would be that the C++ / Python solution will be far less performant -- not because C++ is slow, but the bridging is overhead, and performant C++ takes a lot of strong C++ knowledge. – Eljay Apr 25 '18 at 16:42
  • It's surprising for me to hear that writing a function like `llf` in C++ wouldn't be much faster, given answers like [these](https://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or?lq=1). I thought the paradigm was to write most of your code in python but write performance critical pieces in a lower level language. Could you explain a little bit more why using Python with C++ would not produce large gains in speed? – jtorca Apr 25 '18 at 17:53
  • Like `@Eljay` said, the performance bottleneck could possibly be in the bridging overhead (going from Python to C and back). Imagine having a fast car that is stuck in traffic... Whether or not you can get a speed gain would depend on how many C calls you make, and on the duration of each call. Also, it would be wise to first profile your Python code to see which part is really limiting performance. As an example, I once noticed that looping over the elements of a pandas dataframe is much slower than first converting the dataframe into numpy, and then performing the loop. – MPA Apr 25 '18 at 18:02
  • I agree with the above. Convert the data first into a suitable numpy data structure with only the needed fields, then use numpy. – Ben Apr 25 '18 at 18:04
  • Would the number of C calls be equal to the number of times the function `llf` is evaluated? I feel more concerned about the time it takes to evaluate the function than the overhead of the bridge. I'll admit I'm pretty ignorant on the latter, but I imagine as `N` grows (or the likelihood takes more time to evaluate in another way) the overhead wouldn't matter in comparison. – jtorca Apr 25 '18 at 18:20
  • Have you profiled your existing python solution yet? Do you know for certainty that the bottleneck is in llf? Until you've proven that, coding it in C or C++ could be a waste of time. Have a look at using Cython in your program. Even without using C++, you might get some improved performance. Profile, first, however. – sizzzzlerz Apr 25 '18 at 18:47

1 Answer


As suggested in the comments, I tried a Cython solution. Since I had never used Cython before, I'll spell out the steps I took to implement it.

First, I installed Cython (pip install cython). Then I wrote a file called fastllf.pyx containing the following Cython code:

#cython: boundscheck=False, wraparound=False, nonecheck=False

from libc.math cimport exp, sqrt, pi, log, isnan

cdef double SQ_PI = sqrt(2*pi)  # normalization constant sqrt(2*pi), precomputed once


# the normal pdf written by hand, so the inner loop never calls back into Python/scipy
cdef double norm_pdf(double x, double loc, double scale):
    return (1/(SQ_PI*scale))*exp(-(0.5)*((x - loc)**2)/(scale**2))

# typed inner loop over the data matrix; NaN entries mark missing observations
cdef double llf_c(double[:, :] X, double[:] params):

    cdef double logll = 0
    cdef int N = X.shape[0]
    cdef int K = X.shape[1]
    cdef int i, j
    cdef double m, sd

    for i in range(N):
        for j in range(K):
            if not isnan(X[i, j]):
                m = params[j]
                sd = exp(params[j+K])

                logll += log(norm_pdf(X[i, j], m, sd))
    return -logll

# thin Python-visible wrapper around the typed implementation
def llf(double[:, :] X, double[:] params):
    return llf_c(X, params)

Then I created a setup.py file which included the following:

from distutils.core import setup
from Cython.Build import cythonize

setup(name="fastllf", ext_modules=cythonize('fastllf.pyx'))

Next I compiled the Cython code using the following command in the terminal.

$ python3 setup.py build_ext --inplace

Finally, I compared the results between my old, pure Python implementation (slightly modified to use arrays instead of dataframes) and the Cython implementation.

import numpy as np
from scipy.stats import norm
import time
from fastllf import llf as cython_llf

# simulating data
means = np.array([10, 20, 30])
cov = np.diag([1, 4, 10])

N = 100000
np.random.seed(10)

X = np.random.multivariate_normal(mean=means, cov=cov, size=N)
X[np.random.choice([True, False], size=(N, 3), p=[0.3, 0.7])] = np.nan

def norm_pdf(x, loc, scale):
    return (1/(np.sqrt(2*np.pi)*scale))*np.exp(-(0.5)*((x-loc)**2)/(scale**2))

def llf(X, params):

    logll = 0
    N = X.shape[0]
    K = X.shape[1]

    for i in range(N):
        for j in range(K):
            if not np.isnan(X[i, j]):
                m = params[j]
                sd = np.exp(params[j+K])

                logll += np.log(norm_pdf(X[i, j], loc=m, scale=sd))    
    return -logll    

def timeit(name, fun, *args):
    start = time.time()
    rslt = fun(*args)
    end = time.time()
    print('{} Value: {}'.format(name, rslt))
    print('{} Time:  {} seconds'.format(name, end - start))

params = np.array([1., 1, 1, 1, 1, 1])
timeit('Python', llf, X, params)
timeit('Cython', cython_llf, X, params)

And I obtained the following results:

Python Value: 6570173.7597125955
Python Time:  1.9558300971984863 seconds
Cython Value: 6570173.7597125955
Cython Time:  0.016242027282714844 seconds

This makes optimization by maximum likelihood much more feasible, especially as my problem gets more complicated. The only remaining issue is that I either need to find math and statistical functions I can call from Cython for whatever likelihood I want to write, or write my own, as I did for the normal pdf above.
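To actually use this with scipy.optimize.minimize, it seems to be enough to wrap the compiled function so that scipy only sees the parameter vector (the X above is already a C-contiguous float64 array, which is what the typed memoryviews expect), roughly:

from scipy.optimize import minimize

opt = minimize(lambda params: cython_llf(X, params),
               x0=np.array([0., 0., 0., 1., 1., 1.]),
               options={'maxiter': 30})
print(opt.x)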

Any comments on my implementation would be appreciated.

jtorca
  • I'm not sure if the compiler does precomputation of constants. For instance, in your Cython code, you have constant terms like `sqrt(2*pi)` and `1/2.`. You could predefine (at the top of your code) and replace these manually by `SQ_PI = sqrt(2*pi)` and `0.5`. Also, if you compute `sd` as `inv_sd = exp(-params[j+k])`, you can get rid of all the floating point divisions in `norm_pdf`. Divisions and square roots are computationally very expensive, so replace them with multiplications whenever you can. – MPA Apr 26 '18 at 07:18
  • In addition to the above, I'm a little surprised that Cython is over a 1000 times faster than your native Python implementation. Those kind of speed-ups are rather extreme. Could it be that `scipy.stats.norm.pdf()` function is rate limiting? Could you implement your own `norm_pdf` function into native Python and see if it makes any difference? – MPA Apr 26 '18 at 08:21
  • I wasn't sure how to implement `sd` as `inv_sd` but I implemented your other suggestion. I also wrote my own norm_pdf in Python and was very surprised to see a difference. Do you know why this is the case? – jtorca Apr 26 '18 at 14:43
  • You'd have to check the source code of `scipy.stats.norm.pdf()` to see what exactly it is doing, but apparently there are various background checks or computations being performed that slow it down by a factor 4 compared to your own implementation. The Cython implementation seemed to have gotten slower, though. In your `norm_pdf` function, you divide by `scale` twice, but you can calculate `1/scale` as `inv_sd = exp(-params[j+k])` (`1/exp(x)` = `exp(-x)`), so that in `norm_pdf` you can replace the divisions by multiplications. – MPA Apr 26 '18 at 14:53
  • Also, this example assumes independence across random variables. What if I assumed a joint normal distribution, so that I'd have to evaluate a multivariate pdf? This seems much more difficult to do in Cython, given the need for matrix operations. See my other question [here](https://stackoverflow.com/questions/50034761/efficient-matrix-operations-in-cython-with-no-python-objects?noredirect=1#comment87113294_50034761). – jtorca Apr 26 '18 at 17:51
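Following up on the inv_sd suggestion in the comments above: if I understand it correctly, a division-free version of the Cython pdf would look something like this, with everything else unchanged:

cdef double INV_SQ_PI = 1/sqrt(2*pi)   # reciprocal of sqrt(2*pi), precomputed once

cdef double norm_pdf(double x, double loc, double inv_scale):
    # (x - loc)/scale computed as a multiplication by 1/scale
    cdef double z = (x - loc)*inv_scale
    return INV_SQ_PI*inv_scale*exp(-0.5*z*z)

# and inside llf_c, replace sd = exp(params[j+K]) with:
#     inv_sd = exp(-params[j+K])
#     logll += log(norm_pdf(X[i, j], m, inv_sd))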