5

I have a list of N elements, and I'd like to sample M (<= N) values which are as evenly spaced as possible. To be more specific, lets say the selection should minimize the differences in the spacings between sampled points. For example, lets say I'm constructing a boolean indexing array (i.e. in python) to select elements,

I tried the algorithm (from this similar, but different question: How do you split a list into evenly sized chunks?) :

q, r = divmod(N, M)
indices = [q*jj + min(jj, r) for jj in range(M)]

And sometimes that works well:

N=11 M=6
good_index = [0 1 0 1 0 1 0 1 0 1 0]

N=14 M=6
good_index = [0 1 1 0 1 1 0 1 0 1 0 1 0 1]

Here, the first example is trivial because the array can be evenly divided. The second example cannot be evenly divided, but the spacing between points is as similar as possible (2, 2, 1, 1, 1, 1).

But often it works poorly:

N=16 M=10
bad_index = [0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0]

N=14 M=10
bad_index = [0 1 0 1 0 1 0 1 0 0 0 0 0 0]

Because you have values piled up at the end.


Edit 1: woops, just realized each list above is technically inverted (0's should be 1's and visa-versa).... but should still convey the right idea.


Edit 2: the above algorithm tends to work better (i.e. visual inspection from choosing random numbers than something conceptually simpler like,

step = int(floor(N/M))
last = M * step  # this prevents us from getting M+1 elements
indices = [ii for ii in range(0, last, step)]
DilithiumMatrix
  • 17,795
  • 22
  • 77
  • 119

1 Answers1

5

Looking at the results of a few tests (even the ones included above), the problem is when M > N/2. I.e. when more than half of the values are being sampled. But it works great for M < N/2. So the solution I'm using for the moment is simply to invert the problem when M > N/2:

Note: this is actually creating a masking list of size N that is False for M elements as evenly spaced as possible.

import numpy as np

def even_select(N, M):
    if M > N/2:
        cut = np.zeros(N, dtype=int)
        q, r = divmod(N, N-M)
        indices = [q*i + min(i, r) for i in range(N-M)]
        cut[indices] = True
    else:
        cut = np.ones(N, dtype=int)
        q, r = divmod(N, M)
        indices = [q*i + min(i, r) for i in range(M)]
        cut[indices] = False

    return cut

I'd still be interested in more elegant solutions if they exist.

DilithiumMatrix
  • 17,795
  • 22
  • 77
  • 119