I can get a uniform grid on [0, 2*pi) with numpy's np.arange(). However, I want a grid with the same number of points but with a higher density of points on a certain interval, e.g. a finer grid on [pi, 1.5*pi]. How can I achieve this? Is there a numpy function that accepts a density function and outputs a grid with that density?

- Try to invert the cumulative distribution function. See https://en.m.wikipedia.org/wiki/Inverse_transform_sampling – tstanisl Jul 04 '20 at 22:40
- Do you need points uniformly spaced locally, or randomly distributed? – amzon-ex Jul 05 '20 at 04:05
1 Answer
I'm surprised that I can't find a similar Q&A on Stack Overflow. There are a few on doing something similar for random numbers from a discrete distribution, but not for continuous distributions, and not as a modified np.arange or np.linspace.
If you need an x range for plotting that samples more finely in regions where you expect the function to fluctuate rapidly, you can create a nonlinear function that maps the range 0 to 1 onto itself nonuniformly. For example:
import numpy as np

num = 100  # number of grid points (any value)

def f(x):
    return x**2

angles = 2 * np.pi * f(np.linspace(0, 1, num, endpoint=False))
This will produce fine sampling near zero and coarse sampling near 2*pi. For more fine-grained control of the sampling density, you can use the function below. As a bonus, it will also allow random sampling.
import numpy as np

def density_space(xs, ps, n, endpoint=False, order=1, random=False):
    """Draw samples with spacing specified by a density function.

    Copyright Han-Kwang Nienhuys (2020).
    License: any of: CC-BY, CC-BY-SA, BSD, LGPL, GPL.
    Reference: https://stackoverflow.com/a/62740029/6228891

    Parameters:

    - xs: array, ordered by increasing values.
    - ps: array, corresponding densities (not normalized).
    - n: number of output values.
    - endpoint: whether to include xs[-1] in the output.
    - order: interpolation order (1 or 2). Order 2 will
      require dense sampling and a smoothly varying density
      to work correctly.
    - random: whether to return random samples, ignoring endpoint.
      In this case, n can be a shape tuple.

    Returns:

    - array, shape (n,), with values from xs[0] to xs[-1].
    """
    from scipy.interpolate import interp1d
    try:
        # cumtrapz was renamed in SciPy 1.6 and removed in SciPy 1.14.
        from scipy.integrate import cumulative_trapezoid as cumtrapz
    except ImportError:
        from scipy.integrate import cumtrapz

    cps = cumtrapz(ps, xs, initial=0)  # cumulative density
    cps *= (1/cps[-1])  # normalize so the last value is 1
    intfunc = interp1d(cps, xs, kind=order)  # inverse CDF

    if random:
        return intfunc(np.random.uniform(size=n))
    else:
        return intfunc(np.linspace(0, 1, n, endpoint=endpoint))
Test:
values = density_space(
    [0, 100, 101, 200],
    [1, 1, 2, 2],
    n=12, endpoint=True)
print(np.around(values))
[ 0. 27. 54. 82. 105. 118. 132. 146. 159. 173. 186. 200.]
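Applied to the question's interval, the same cumulative-density steps used inside density_space give a grid on [0, 2*pi) with roughly three times the point density on [pi, 1.5*pi]. This is a sketch: the density values and the short 0.001*pi ramps at the interval edges are my own choices, made so the density stays continuous:

```python
import numpy as np
from scipy.interpolate import interp1d

try:
    # cumtrapz was renamed in SciPy 1.6 and removed in SciPy 1.14.
    from scipy.integrate import cumulative_trapezoid as cumtrapz
except ImportError:
    from scipy.integrate import cumtrapz

# Density 1 outside [pi, 1.5*pi] and 3 inside, with short ramps at the
# edges so the cumulative density has no jumps in its first derivative.
xs = np.array([0, np.pi, 1.001 * np.pi, 1.5 * np.pi, 1.501 * np.pi, 2 * np.pi])
ps = np.array([1, 1, 3, 3, 1, 1])

cps = cumtrapz(ps, xs, initial=0)  # cumulative density
cps /= cps[-1]                     # normalize to a CDF on [0, 1]
angles = interp1d(cps, xs)(np.linspace(0, 1, 40, endpoint=False))
```

About half of the 40 points land in [pi, 1.5*pi], even though that interval spans only a quarter of the full range.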
The cumulative density function is created using trapezoid integration, which is essentially based on linear interpolation. A higher-order integration is not safe because the input may have (near-)discontinuities, like the jump from x=100 to x=101 in the example. A discontinuity in the input results in a discontinuous first derivative in the cumulative density function (cps in the code), which will cause problems with a smooth interpolation (order 2 or above). Hence the recommendation to use order=2 only for a smooth density function, and not to use any higher orders.
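The first approach (a warping function) can also target the question's interval directly. The mapping below is my own illustrative choice, not part of the answer above: it maps [0, 1] onto itself monotonically, fixes both endpoints, and has its smallest derivative where its output lands inside [pi, 1.5*pi], so evenly spaced inputs bunch up there. The constants c and a are tuning knobs picked by eye:

```python
import numpy as np

def warp(x, c=0.625, a=0.1):
    """Monotonic map of [0, 1] onto itself; warp(0) = 0 and warp(1) = 1.

    The derivative 1 - 2*pi*a*cos(2*pi*(x - c)) is smallest at x = c,
    so evenly spaced inputs cluster around warp(c). It stays positive
    (and the map stays monotonic) as long as 2*pi*a < 1.
    """
    return x - a * (np.sin(2 * np.pi * (x - c)) + np.sin(2 * np.pi * c))

angles = 2 * np.pi * warp(np.linspace(0, 1, 100, endpoint=False))
```

With c = 0.625 and a = 0.1, roughly 40 of the 100 points fall in [pi, 1.5*pi]. This is less flexible than density_space, since you must hand-craft a monotonic map instead of stating the density you want.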

- Doesn't this beg the question? In order to sample the density sufficiently, you need an idea of how quickly the density function changes. A function like this should be written to accept a function as input, and do the rest. If the user has to determine how finely to sample the density function, they have to solve the problem this function is ostensibly for... – Myridium Jan 13 '21 at 12:43