This will be painful to run an optimization over unless it runs faster. From top to bottom it hasn't been vectorized, and that needs to happen. The "easy" pass, where ltest stands for your original input array, is:
from typing import Sequence
import numpy as np
from scipy.optimize import Bounds, minimize
v = 7
st = 2
exp_v = np.exp(-1/v)
def periodic(values: Sequence[float], i: int | Sequence[int]) -> float | Sequence[float]:
"""Periodic indexation"""
return values[i % len(values)]
def rt_inner(values: np.ndarray, j: int, k: int) -> float:
i = np.arange(-k, k + 1)
per = periodic(values, j + i)
quotient = (
exp_v**(
(k - np.abs(i)).dot(per)
) - exp_v**(
(k - np.abs(i) + 1).dot(per)
)
)/(
per.sum() + 1e-10
)
return quotient
def repeat_time(values: np.ndarray, j: int, L: int) -> float:
"""Main function (index j)"""
return sum(
rt_inner(values, j, k) for k in range(L + 1)
)
def function_points(values: np.ndarray) -> list[float]:
"""Vector function"""
return [
repeat_time(values, j, len(values) // st)
for j in range(len(values))
]
def fun(values: np.ndarray, ltest0: np.ndarray) -> float:
"""compute the difference between ltest and the list generated by repeat_time"""
inner = function_points(values) - ltest0
mse = inner.dot(inner)
return mse
def regression_test() -> None:
x0 = np.pi/4/v / ltest**2
assert np.allclose(x0[:10], [2.210536497254391e-05, 2.206056701657588e-05, 2.201689200243481e-05, 2.1974207592018127e-05, 2.1932628431138212e-05,
2.18921456967544e-05, 2.185275082723814e-05, 2.181437468386809e-05, 2.177707036172205e-05, 2.1740769518783152e-05])
assert np.isclose(
0.35447169592652333,
repeat_time(np.linspace(0.5, 1.5, 9), 0, 9 // st),
)
assert np.isclose(
0.37331496035895984,
repeat_time(np.linspace(0.5, 1.5, 9), 1, 9 // st),
)
assert np.allclose(
function_points(np.arange(20)),
[0.048562605686658246, 0.2561333409217431, 0.22284510468654223, 0.1801348324371491, 0.15240596740034554,
0.13317771870433162, 0.11878913224487342, 0.10747112835180678, 0.0982518235968824, 0.09054651477020922,
0.08397926925966721, 0.07829570944628872, 0.07331651598542453, 0.06891093040237876, 0.06498083918680733,
0.06145079756314917, 0.05826155047254675, 0.05536569520658036, 0.05272480791861977, 0.050932251718794584]
)
assert np.isclose(
140202.5913000041,
fun(x0, ltest),
)
def optimize():
x0 = np.pi/4/v / ltest**2
res = minimize(fun=fun, x0=x0, args=(ltest,), bounds=Bounds(0, np.inf))
# The optimized list is stored in res.x
opt_list = res.x
if __name__ == '__main__':
regression_test()
# optimize()
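If it isn't obvious why the two dot products are equivalent to a plain loop over i, here is a throwaway sanity check; rt_inner_loop and rt_inner_vec are names invented for this demonstration only, not part of the solution above:
import numpy as np

v = 7
exp_v = np.exp(-1/v)

def rt_inner_loop(values, j, k):
    """Literal scalar translation of the sum over i in -k .. k."""
    e0 = e1 = den = 0.0
    for i in range(-k, k + 1):
        val = values[(j + i) % len(values)]   # what periodic() does, one element at a time
        e0 += (k - abs(i)) * val              # exponent of the first exp_v term
        e1 += (k - abs(i) + 1) * val          # exponent of the second exp_v term
        den += val
    return (exp_v**e0 - exp_v**e1) / (den + 1e-10)

def rt_inner_vec(values, j, k):
    """The same quantity as one fancy-indexed gather plus two dot products."""
    i = np.arange(-k, k + 1)
    per = values[(j + i) % len(values)]
    return (exp_v**((k - np.abs(i)).dot(per))
            - exp_v**((k - np.abs(i) + 1).dot(per))) / (per.sum() + 1e-10)

values = np.linspace(0.5, 1.5, 9)
for k in range(5):
    assert np.isclose(rt_inner_loop(values, 1, k), rt_inner_vec(values, 1, k))
The gather values[(j + i) % len(values)] is exactly what periodic() does, and the two dot products accumulate the exponents that the scalar loop builds one term at a time.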
The "hard" part comes in because your inner operation is not square, it's triangular:
import numpy as np
from scipy.optimize import Bounds, minimize
v = 7
st = 2
exp_v = np.exp(-1/v)
L_structures = {}
def make_L_structures(L: int) -> None:
k = np.arange(L+1)[:, np.newaxis]
irhs = k - np.arange(L+1)[np.newaxis, :]
ilhs = k - np.arange(1, L+1)[np.newaxis, :]
L_structures['ki_tri'] = np.hstack((np.tril(ilhs, k=-1), np.tril(irhs)))
L_structures['ki_trip1'] = np.hstack((np.tril(ilhs+1, k=-1), np.tril(irhs+1)))
def rt_inner(values: np.ndarray, j: int, L: int) -> np.ndarray:
index_into = np.tile(values, 2)
val_tri_rhs = np.tril(
np.broadcast_to(index_into[np.newaxis, j:L+j+1], (L+1, L+1))
)
val_tri_lhs = np.tril(
np.broadcast_to(index_into[np.newaxis, -values.size+j-1: -values.size+j-1-L:-1], (L+1, L)),
k=-1,
)
val_tri = np.hstack((val_tri_lhs, val_tri_rhs))
product = (
exp_v**(
(L_structures['ki_tri'] * val_tri).sum(axis=1)
) - exp_v**(
(L_structures['ki_trip1'] * val_tri).sum(axis=1)
)
)/(
val_tri.sum(axis=1) + 1e-10
)
return product
def repeat_time(values: np.ndarray, j: int, L: int) -> float:
"""Main function (index j)"""
return rt_inner(values, j, L).sum()
def function_points(values: np.ndarray) -> list[float]:
"""Vector function"""
return [
repeat_time(values, j, len(values) // st)
for j in range(len(values))
]
def fun(values: np.ndarray, ltest0: np.ndarray) -> float:
"""compute the difference between ltest and the list generated by repeat_time"""
inner = function_points(values) - ltest0
mse = inner.dot(inner)
return mse
def regression_test() -> None:
x0 = np.pi/4/v / ltest**2
assert np.allclose(x0[:10], [2.210536497254391e-05, 2.206056701657588e-05, 2.201689200243481e-05, 2.1974207592018127e-05, 2.1932628431138212e-05,
2.18921456967544e-05, 2.185275082723814e-05, 2.181437468386809e-05, 2.177707036172205e-05, 2.1740769518783152e-05])
make_L_structures(L=9//st)
assert np.isclose(
0.35447169592652333,
repeat_time(np.linspace(0.5, 1.5, 9), 0, 9 // st),
)
assert np.isclose(
0.37331496035895984,
repeat_time(np.linspace(0.5, 1.5, 9), 1, 9 // st),
)
make_L_structures(L=20//st)
assert np.allclose(
function_points(np.arange(20)),
[0.048562605686658246, 0.2561333409217431, 0.22284510468654223, 0.1801348324371491, 0.15240596740034554,
0.13317771870433162, 0.11878913224487342, 0.10747112835180678, 0.0982518235968824, 0.09054651477020922,
0.08397926925966721, 0.07829570944628872, 0.07331651598542453, 0.06891093040237876, 0.06498083918680733,
0.06145079756314917, 0.05826155047254675, 0.05536569520658036, 0.05272480791861977, 0.050932251718794584]
)
make_L_structures(L=ltest.size//st)
assert np.isclose(
140202.5913000041,
fun(x0, ltest),
)
def optimize():
make_L_structures(L=ltest.size//st)
x0 = np.pi/4/v / ltest**2
res = minimize(fun=fun, x0=x0, args=(ltest,), bounds=Bounds(0, np.inf))
# The optimized list is stored in res.x
opt_list = res.x
if __name__ == '__main__':
regression_test()
# optimize()
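If the hstacked triangles in make_L_structures are hard to picture, this is what they look like for a toy L (illustration only, not part of the solution):
import numpy as np

L = 2
k = np.arange(L + 1)[:, np.newaxis]
irhs = k - np.arange(L + 1)[np.newaxis, :]       # k - i for i = 0 .. L
ilhs = k - np.arange(1, L + 1)[np.newaxis, :]    # k - |i| for i = -1 .. -L
ki_tri = np.hstack((np.tril(ilhs, k=-1), np.tril(irhs)))
print(ki_tri)
# [[0 0 0 0 0]
#  [0 0 1 0 0]
#  [1 0 2 1 0]]
# Row k holds the k - |i| coefficients; columns correspond to i = -1, -2, then
# i = 0, 1, 2, with zeros outside |i| <= k. ki_trip1 is built the same way from
# the "+ 1" coefficients, and val_tri gathers the matching values[j + i] into
# the same layout.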
(Much) more can be done to vectorize this, such as flattening everything out into "2.5-dimensional" data (the triangular rows packed flat inside a rectangular index layout and then summed by group) - though it needs more work, because the binning step is slow:
from time import monotonic
import cProfile
from typing import Any
import numpy as np
import scipy.optimize
class Problem:
__slots__ = (
'ltest', 'exp_v', 'v', 'L',
'value_idx', 'yj', 'ki',
)
def __init__(
self,
ltest: np.ndarray,
v: int = 7,
st: int = 2,
) -> None:
self.ltest = ltest
self.exp_v = np.exp(-1/v)
self.v = v
self.L = L = ltest.size // st
# Right and left triangular indices for expanding (-k, k) sum
yr, xr = np.tril_indices(n=L + 1, m=L + 1)
yl, xl = np.tril_indices(n=L + 1, m=L + 1, k=-1)
y = np.concatenate((yl, yr))
j = np.arange(ltest.size)[:, np.newaxis]
self.yj = (y[np.newaxis, :] + j*(L+1)).ravel()
# Right and left value array indices
vr = xr
vl = -1 - xl
# j-offset periodic indexing
value_idx = (np.concatenate((vl, vr)) + j) % ltest.size
# bincount does not natively work with n-dimensional data. One dimension needs to be collapsed.
self.value_idx = value_idx.ravel()
# Right and left k-abs(i) terms
kir = yr - xr
kil = kir[:-L - 1]
ki = np.concatenate((kil, kir))
self.ki = np.broadcast_to(ki, value_idx.shape).ravel()
def function_points(self, values: np.ndarray) -> np.ndarray:
values_used = values[self.value_idx]
sumsv = np.bincount(self.yj, weights=values_used)
p0, p1 = self.exp_v ** np.vstack((
np.bincount(self.yj, weights=self.ki * values_used),
np.bincount(self.yj, weights=(self.ki+1) * values_used),
))
product = (
(p0 - p1)/(sumsv + 1e-10)
).reshape((values.size, -1))
return product.sum(axis=1)
def mse(self, values: np.ndarray) -> float:
"""compute the difference between ltest and the list generated by function_points"""
inner = self.function_points(values) - self.ltest
return inner.dot(inner)
def x0(self) -> np.ndarray:
return np.pi/4/self.v / self.ltest**2
def optimize(self, **kwargs: Any) -> scipy.optimize.OptimizeResult:
return scipy.optimize.minimize(
fun=self.mse,
x0=self.x0(),
bounds=scipy.optimize.Bounds(0, np.inf),
options=kwargs,
)
def regression_test() -> None:
p = Problem(np.arange(20))
assert np.allclose(
p.function_points(np.arange(20)),
[0.048562605686658246, 0.2561333409217431, 0.22284510468654223, 0.1801348324371491, 0.15240596740034554,
0.13317771870433162, 0.11878913224487342, 0.10747112835180678, 0.0982518235968824, 0.09054651477020922,
0.08397926925966721, 0.07829570944628872, 0.07331651598542453, 0.06891093040237876, 0.06498083918680733,
0.06145079756314917, 0.05826155047254675, 0.05536569520658036, 0.05272480791861977, 0.050932251718794584]
)
p = Problem(ltest)
x0 = np.pi/4/p.v / ltest**2
assert np.allclose(x0[:10], [2.210536497254391e-05, 2.206056701657588e-05, 2.201689200243481e-05, 2.1974207592018127e-05, 2.1932628431138212e-05,
2.18921456967544e-05, 2.185275082723814e-05, 2.181437468386809e-05, 2.177707036172205e-05, 2.1740769518783152e-05])
t0 = monotonic()
assert np.isclose(140202.5913000041, p.mse(x0))
t1 = monotonic()
print(t1 - t0) # 6+ seconds
def profile() -> None:
"""
37 function calls (35 primitive calls) in 8.537 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.056 0.056 8.537 8.537 76637914.py:198(mse)
1 2.674 2.674 8.481 8.481 76637914.py:185(function_points)
6/4 5.806 0.968 5.806 1.452 {built-in method numpy.core._multiarray_umath.implement_array_function}
3 0.000 0.000 5.804 1.935 <__array_function__ internals>:177(bincount)
"""
p = Problem(ltest)
with cProfile.Profile() as pr:
p.mse(p.x0())
pr.print_stats(sort='cumtime')
def optimize() -> None:
prob = Problem(ltest)
result = prob.optimize(disp=True)
print(result)
if __name__ == '__main__':
# regression_test()
# profile()
optimize()
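The load-bearing primitive in function_points is np.bincount with weights, which is a grouped (segmented) sum: yj labels every element of the flattened triangular data with the row it belongs to, so a single bincount call sums all rows at once even though they have different lengths. Stripped of everything problem-specific, the trick is just:
import numpy as np

# np.bincount(groups, weights=w)[g] is the sum of w over the positions where
# groups == g: a grouped/segmented sum in a single call.
groups = np.array([0, 0, 1, 2, 2, 2])
weights = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(np.bincount(groups, weights=weights))   # [ 3.  3. 15.]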
Even this takes, at best, six seconds (!) per objective evaluation. A serious treatment of this problem will need to drop down to C and/or take advantage of a GPU.
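For what it's worth, one low-effort version of "drop down to C" is to hand the naive loops to Numba instead of vectorizing them. This is only a sketch, under the assumption that Numba is acceptable in your environment; it computes the same quantity as the very first snippet, just with explicit loops that njit compiles to machine code:
import numpy as np
import numba

V = 7
ST = 2
EXP_V = np.exp(-1/V)

@numba.njit
def function_points_nb(values):
    """Same quantity as function_points above, computed with explicit loops."""
    n = values.size
    L = n // ST
    out = np.empty(n)
    for j in range(n):
        total = 0.0
        for k in range(L + 1):
            e0 = 0.0   # exponent of the first exp_v term
            e1 = 0.0   # exponent of the second exp_v term
            den = 0.0  # periodic window sum
            for i in range(-k, k + 1):
                val = values[(j + i + n) % n]  # + n keeps the index non-negative
                e0 += (k - abs(i)) * val
                e1 += (k - abs(i) + 1) * val
                den += val
            total += (EXP_V**e0 - EXP_V**e1) / (den + 1e-10)
        out[j] = total
    return out

# e.g. function_points_nb(np.linspace(0.5, 1.5, 9))[:2] should reproduce the
# repeat_time regression values above (0.3544717..., 0.3733150...).
The GPU analogue would be porting the bincount-based Problem class to CuPy, assuming its bincount accepts the weights argument the same way NumPy's does.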