
Greetings, I am trying to perform Bayesian optimization using the bayesian_optimization library with a custom kernel function, concretely an RBF variant that uses the Kendall distance.

I want to pass the Kendall distance as the metric argument to the cdist and pdist functions from scipy.spatial.distance. I am reusing code from scipy.stats.kendalltau; concretely, my Kendall distance is defined as follows:

import numpy as np
from scipy.stats._stats import _kendall_dis  # Cython helper used internally by scipy.stats.kendalltau

def kendall_distance(x, y):
    perm = np.argsort(y)  # sort on y and convert y to dense ranks
    x, y = x[perm], y[perm]
    y = np.r_[True, y[1:] != y[:-1]].cumsum(dtype=np.intp)

    # stable sort on x and convert x to dense ranks
    perm = np.argsort(x, kind='mergesort')
    x, y = x[perm], y[perm]
    x = np.r_[True, x[1:] != x[:-1]].cumsum(dtype=np.intp)

    dis = _kendall_dis(x, y)  # discordant pairs
    return dis
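
As a quick sanity check (my own snippet, not part of the notebook, assuming the inputs are permutations without ties), the result should equal the number of discordant pairs counted by brute force:

# Sanity check (illustrative only): compare against a brute-force count
# of discordant pairs for two random permutations.
x = np.random.permutation(10)
y = np.random.permutation(10)

brute_force = sum((x[i] - x[j]) * (y[i] - y[j]) < 0
                  for i in range(len(x)) for j in range(i + 1, len(x)))
assert kendall_distance(x, y) == brute_force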

With this distance, I define my custom kernel function,

import numpy as np
from scipy.spatial.distance import pdist, cdist, squareform
from sklearn.gaussian_process.kernels import (Hyperparameter, Kernel,
                                              NormalizedKernelMixin,
                                              StationaryKernelMixin,
                                              _check_length_scale)


class PermutationRBF(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
    """RBF-like kernel adapted from sklearn's RBF, using the Kendall distance."""

    def __init__(self, alpha=1.0, alpha_bounds=(1e-5, 1e5)):
        self.alpha = alpha
        self.alpha_bounds = alpha_bounds

    @property
    def anisotropic(self):
        return np.iterable(self.alpha) and len(self.alpha) > 1

    @property
    def hyperparameter_length_scale(self):
        if self.anisotropic:
            # the length-scale hyperparameter is named "alpha" in this class
            return Hyperparameter("alpha", "numeric",
                                  self.alpha_bounds,
                                  len(self.alpha))
        return Hyperparameter(
            "alpha", "numeric", self.alpha_bounds)

    def __call__(self, X, Y=None, eval_gradient=False):
        X = np.atleast_2d(X)
        alpha = _check_length_scale(X, self.alpha)
        if Y is None:
            dists = pdist(X / alpha, kendall_distance)  # First change
            K = np.exp(-.5 * dists)
            # convert from upper-triangular matrix to square matrix
            K = squareform(K)
            np.fill_diagonal(K, 1)
        else:
            if eval_gradient:
                raise ValueError(
                    "Gradient can only be evaluated when Y is None.")
            dists = cdist(X / alpha, Y / alpha, kendall_distance)  # Second change
            K = np.exp(-.5 * dists)
        if eval_gradient:
            if self.hyperparameter_length_scale.fixed:
                # Hyperparameter l kept fixed
                return K, np.empty((X.shape[0], X.shape[0], 0))
            elif not self.anisotropic or alpha.shape[0] == 1:
                K_gradient = \
                    (K * squareform(dists))[:, :, np.newaxis]
                return K, K_gradient
            elif self.anisotropic:
                # We need to recompute the pairwise dimension-wise distances
                K_gradient = (X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2 \
                    / (alpha ** 2)
                K_gradient *= K[..., np.newaxis]
                return K, K_gradient
        else:
            return K

The only change from the original sklearn RBF kernel (tagged with comments) is passing the kendall_distance function as the metric argument to the cdist and pdist calls.
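
Before plugging it into the optimizer, the kernel can be exercised on its own; this is a minimal sketch with made-up data, using scikit-learn's GaussianProcessRegressor directly and skipping hyperparameter optimization:

# Minimal sketch (not from the notebook): fit a GP on a few random
# permutations just to verify that the kernel can be evaluated.
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X_train = np.array([rng.permutation(8) for _ in range(5)], dtype=float)
y_train = rng.normal(size=5)

gpr = GaussianProcessRegressor(kernel=PermutationRBF(), optimizer=None)
gpr.fit(X_train, y_train)        # uses the pdist branch of __call__
print(gpr.predict(X_train[:2]))  # uses the cdist branch of __call__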

The problem appears when I run the optimization with that kernel: performance is very slow compared with the standard RBF kernel. Computing the Kendall distance is O(n log n), which is more expensive than the O(n) Euclidean distance, but the difference should not be this noticeable for small sizes.

Custom kernel

iteration:  0
time:  0.0008704662322998047
iteration:  1
time:  1.2141697406768799
iteration:  2
time:  2.3469510078430176
iteration:  3
time:  3.5015127658843994
iteration:  4
time:  4.695566892623901

RBF kernel


iteration:  0
time:  0.000415802001953125
iteration:  1
time:  0.020179033279418945
iteration:  2
time:  0.033345937728881836
iteration:  3
time:  0.033483028411865234
iteration:  4
time:  0.0286252498626709

You can see the complete notebook and the results here.

I think the problem is that when I call the pdist and cdist functions with a built-in distance, such as the Euclidean, they run the following code

    elif isinstance(metric, str):
        mstr = metric.lower()

        mstr, kwargs = _select_weighted_metric(mstr, kwargs, out)

        metric_name = _METRIC_ALIAS.get(mstr, None)

        if metric_name is not None:
            X, typ, kwargs = _validate_pdist_input(X, m, n,
                                                   metric_name, **kwargs)

            if 'w' in kwargs:
                metric_name = _C_WEIGHTED_METRICS.get(metric_name, metric_name)

            # get pdist wrapper
            pdist_fn = getattr(_distance_wrap,
                               "pdist_%s_%s_wrap" % (metric_name, typ))
            pdist_fn(X, dm, **kwargs)
            return dm

which I assume is implemented and optimized in C. On the other hand, if I use my custom metric, the executed code fragment is

    if callable(metric):
        mstr = getattr(metric, '__name__', 'UnknownCustomMetric')
        metric_name = _METRIC_ALIAS.get(mstr, None)

        if metric_name is not None:
            X, typ, kwargs = _validate_pdist_input(X, m, n,
                                                   metric_name, **kwargs)

        k = 0
        for i in range(0, m - 1):
            for j in range(i + 1, m):
                dm[k] = metric(X[i], X[j], **kwargs)
                k = k + 1

which is pure Python and therefore much slower.
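
This overhead is easy to reproduce in isolation (a micro-benchmark of my own, independent of the kernel): passing a Python callable to pdist is much slower than passing the name of a built-in metric, even when the callable does trivial work.

# Micro-benchmark sketch (not from the notebook): built-in metric name vs.
# an equivalent Python callable passed to pdist.
import timeit
from scipy.spatial.distance import pdist

X = np.random.rand(200, 50)
t_builtin = timeit.timeit(lambda: pdist(X, 'sqeuclidean'), number=10)
t_callable = timeit.timeit(
    lambda: pdist(X, lambda u, v: np.sum((u - v) ** 2)), number=10)
print(t_builtin, t_callable)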

Could this be the reason for the bottleneck? If it is, I need my execution to be more efficient; is there any way to improve this?

I leave the test code again here: Complete Notebook

Edit with new information:

As the comments suggest, I used line_profiler to find the bottleneck; these are the results, simplified:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
.
.
.
58         1        879.0    879.0     92.4          dm = calculate_pdist_dm(metric,dm,m,X)
.
.
.

where calculate_pdist_dm is as follows

def calculate_pdist_dm(metric, dm, m, X):
    # pure-Python fallback loop from scipy's pdist, extracted so it can be profiled
    k = 0
    for i in range(0, m - 1):
        for j in range(i + 1, m):
            dm[k] = metric(X[i], X[j])
            k = k + 1
    return dm

and the metric function is the Kendall distance shown above, with the following timing results:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @do_profile()
     3                                           def kendall_distance(x,y):
     4         1         11.0     11.0      6.9      perm = np.argsort(y)  # sort on y and convert y to dense ranks
     5         1          1.0      1.0      0.6      x, y = x[perm], y[perm]
     6         1         59.0     59.0     37.1      y = np.r_[True, y[1:] != y[:-1]].cumsum(dtype=np.intp)
     7                                           
     8                                               # stable sort on x and convert x to dense ranks
     9         1          9.0      9.0      5.7      perm = np.argsort(x, kind='mergesort')
    10         1          1.0      1.0      0.6      x, y = x[perm], y[perm]
    11         1         50.0     50.0     31.4      x = np.r_[True, x[1:] != x[:-1]].cumsum(dtype=np.intp)
    12                                           
    13         1         28.0     28.0     17.6      dis = _kendall_dis(x, y)  # discordant pairs
    14         1          0.0      0.0      0.0      return dis

Timer unit: 1e-06 s

Total time: 0.000159 s

Note also that if I use a pure Python implementation of the Kendall distance,

def kendall_distance(x, y):
    # count discordant pairs: pairs (i, j) ordered differently in x and in y
    distance = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            a = x[i] - x[j]
            b = y[i] - y[j]
            if (a * b < 0):
                distance += 1
    return distance
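
This loop can then be compiled with Numba; a minimal sketch of the compiled variant (assuming numba is installed; the name kendall_distance_numba is just for the sketch, the exact setup is in the linked notebook):

# Sketch of the Numba-compiled variant of the pure Python distance
# (assumes numba is installed).
from numba import njit

@njit
def kendall_distance_numba(x, y):
    distance = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if (x[i] - x[j]) * (y[i] - y[j]) < 0:  # discordant pair
                distance += 1
    return distance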

With this Numba-compiled version the execution is more efficient, but I am still far from the efficiency of the built-in distances (results below for the later iterations, 95-99, where the Bayesian optimization becomes computationally expensive). Is there any way to improve this?

Custom kernel with Kendall distance and numba

iteration:  95
time:  0.3591618537902832
iteration:  96
time:  0.41269588470458984
iteration:  97
time:  0.40320730209350586
iteration:  98
time:  0.40665769577026367
iteration:  99
time:  0.37867259979248047

RBF kernel with the scipy built-in Euclidean distance

iteration:  95
time:  0.04483485221862793
iteration:  96
time:  0.046830177307128906
iteration:  97
time:  0.03493475914001465
iteration:  98
time:  0.03614163398742676
iteration:  99
time:  0.042229413986206055

The new code using Numba is here.

  • You can use numba or cython for your own function, and you can also change the scipy code and convert that function to cython. If you do both you may even get faster times than the standard kernels in scipy. – keiv.fly Jul 07 '21 at 10:41
  • @keiv.fly I have used numba and it seems to work. The only thing is that in the implementation of the Kendall distance I call a Cython function, which Numba does not seem to recognize, and I read that this can't be fixed. Should I rewrite the Kendall distance function in pure Python? Thank you in advance [source code](https://github.com/Biowav/bayesian_optimization/blob/main/Stack%20Overflow%20Question%202.ipynb) – Biowav Jul 07 '21 at 18:07
  • @Biowav Use `line_profiler` to find out what exactly is the bottleneck. [Here I show how to use it](https://stackoverflow.com/a/67373005/4601890). – dankal444 Jul 08 '21 at 06:58
  • Numba does not allow calling Cython functions, so you need to rewrite your Numba code as Cython code. – keiv.fly Jul 08 '21 at 07:34
  • I don't really understand where the metric function you use in Numba is defined. You may have repeated recompilations or something like that. An easy way would be to generate every distance function only once and then use it: https://stackoverflow.com/a/58752553/4045774 This should consistently outperform scipy.cdist. I don't know the problem size; turn off parallelization if the matrices are too small. – max9111 Jul 23 '21 at 13:53
