
The following function calculates a Gaussian kernel:

import numpy as np
def gaussian_kernel(X, X2, sigma):
    """
    Calculate the Gaussian kernel matrix

        k_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))

    :param X: array-like, shape=(n_samples_1, n_features), feature-matrix
    :param X2: array-like, shape=(n_samples_2, n_features), feature-matrix
    :param sigma: scalar, bandwidth parameter

    :return: array-like, shape=(n_samples_1, n_samples_2), kernel matrix
    """

    # broadcast to shape (n_samples_1, n_samples_2, n_features) and reduce
    # over the feature axis: pairwise squared Euclidean distances
    norm = np.square(np.linalg.norm(X[:, None, :] - X2[None, :, :], axis=2))
    return np.exp(-norm/(2*np.square(sigma)))

# Usage example
%timeit gaussian_kernel(np.random.rand(5000, 10), np.random.rand(5000, 10), 1)

1.43 s ± 39.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My question is: are there any ways to increase the performance using numpy?

Pörripeikko
  • Your code is all vectorized and looks quick as lightning to me. I believe there is a gaussian kernel in scikit-learn (a minimal sketch follows these comments), so you wouldn't have to bother coding it yourself. But I doubt it could be faster than that. Don't forget that creating random numbers takes time as well. – offeltoffel Jan 24 '20 at 09:52
  • The best solution depends a bit on the size of the second dimension of your arrays. So it is important to know if the second dimension is always as small as 10 or can also be larger, like 100 or 1000. – max9111 Jan 28 '20 at 13:24
  • In my case second dimension is always < 30 – Pörripeikko Jan 29 '20 at 10:28
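
For reference, a minimal sketch of the scikit-learn route mentioned in the comments; sklearn.metrics.pairwise.rbf_kernel evaluates exp(-gamma * ||x - y||^2), so gamma = 1 / (2 * sigma^2) reproduces the kernel defined above (the wrapper name below is illustrative):

from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel_sklearn(X, X2, sigma):
    # rbf_kernel computes exp(-gamma * ||x - y||^2), so this choice of
    # gamma matches k_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    return rbf_kernel(X, X2, gamma=1.0 / (2.0 * sigma ** 2))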

2 Answers


For quite small arrays you can write a simple loop implementation and compile it with Numba. For larger arrays, the algebraic reformulation using np.dot() (as in the other answer) will be faster.

Example

#from version 0.43 until 0.47 this has to be set before importing numba
#Bug: https://github.com/numba/numba/issues/4689
from llvmlite import binding
binding.set_option('SVML', '-vector-library=SVML')
import numba as nb
import numpy as np

@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def gaussian_kernel_2(X, X2, sigma):
    res = np.empty((X.shape[0], X2.shape[0]), dtype=X.dtype)
    # parallelize over the rows of X
    for i in nb.prange(X.shape[0]):
        for j in range(X2.shape[0]):
            # scaled squared Euclidean distance between X[i] and X2[j]
            acc = 0.
            for k in range(X.shape[1]):
                acc += (X[i, k] - X2[j, k])**2 / (2 * sigma**2)
            res[i, j] = np.exp(-acc)
    return res

Timings

X1=np.random.rand(5000, 10)
X2=np.random.rand(5000, 10)

#Your solution
%timeit gaussian_kernel(X1,X2, 1)
#511 ms ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit gaussian_kernel_2(X1,X2, 1)
#90.1 ms ± 9.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
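
A caveat when reproducing these timings: Numba compiles the function on its first call, separately for each argument-type signature, so warm it up with the same argument types before timing, e.g.:

#warm-up call; Numba specializes per argument signature, so pass the same types
gaussian_kernel_2(X1[:10], X2[:10], 1)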
max9111

This post gives an answer: https://stackoverflow.com/a/47271663/9539058

In short, it uses the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so the pairwise distances come from a single matrix product instead of a large broadcasted intermediate. To copy the numpy part:

import numpy as np
def gaussian_kernel(X, X2, sigma):
    """
    Calculate the Gaussian kernel matrix

        k_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))

    :param X: array-like, shape=(n_samples_1, n_features), feature-matrix
    :param X2: array-like, shape=(n_samples_2, n_features), feature-matrix
    :param sigma: scalar, bandwidth parameter

    :return: array-like, shape=(n_samples_1, n_samples_2), kernel matrix
    """
    # row-wise squared norms
    X_norm = np.sum(X ** 2, axis=-1)
    X2_norm = np.sum(X2 ** 2, axis=-1)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, via a single matrix product
    norm = X_norm[:, None] + X2_norm[None, :] - 2 * np.dot(X, X2.T)
    return np.exp(-norm/(2*np.square(sigma)))

# Timing
%timeit gaussian_kernel(np.random.rand(5000, 10), np.random.rand(5000, 10), 1)

976 ms ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
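
A numerical caveat with this formulation: floating-point cancellation can leave tiny negative entries in norm, so the kernel can slightly exceed 1.0 for near-identical rows. Clamping before the exponential is a cheap safeguard:

norm = np.maximum(norm, 0)  # clip tiny negatives from floating-point cancellation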

Pörripeikko