
Apologies for double-posting. I haven't found a solution yet.

Previous post: Conditional Numpy shuffling

Problem: I have a randomly shuffled 2D array of dtype=int and would like to maximize the distance between identical integers. This is an optimization problem.

E.g. I have:

np.array([[1, 2, 2, 2],
          [3, 2, 1, 1],
          [1, 3, 3, 1],
          [1, 5, 5, 1]])

Some identical integers are too close to each other along axis 0 and axis 1, e.g. the two 5s. Is there a heuristic or scipy optimization approach that at least approximately solves this problem?
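For concreteness, "distance between identical integers" can be scored as the smallest pairwise Euclidean distance between equal values. Here is a minimal sketch of such a scoring helper; the name min_replicate_distance and the optional value parameter are illustrative, not from any library:

import numpy as np
from itertools import combinations

def min_replicate_distance(arr: np.ndarray, value=None) -> float:
    # Smallest Euclidean distance between any two equal entries.
    # If 'value' is given, only replicates of that value are considered.
    values = [value] if value is not None else np.unique(arr)
    best = np.inf
    for v in values:
        coords = np.argwhere(arr == v)        # positions of all replicates of v
        for a, b in combinations(coords, 2):  # every pair of replicates
            best = min(best, np.linalg.norm(a - b))
    return best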

Thank you in advance and kind regards

Novorodnas
  • And brute forcing (making as many shuffles as needed until you exceed a certain threshold) is no option? What do you expect from the solution? An iterative way, i.e. choosing permutations that bring close values apart in **every** step? – some_name.py Jan 04 '22 at 10:41
  • Thanks for your suggestion. Brute forcing would be my second choice. When it comes to the iterative way, based on which criteria would I permute my array? – Novorodnas Jan 04 '22 at 11:18
  • @Novorodnas Select a random transposition (i,j) <-> (k,l); if it increases the min-distance, then apply it; if it doesn't, then don't apply it (see the sketch after these comments). – Stef Jan 04 '22 at 12:11
  • @Novorodnas Yes, I think what was proposed above makes a lot of sense... maybe keep in mind: if you do that, the array is not random anymore... I think the optimal solution you are converging to would be alternating large and small numbers in some way, shape or form... but if you want that, just design your array that way in the first place... I could be wrong here, but just make sure you don't end up with something like [1,99,2,98,3,97...] – some_name.py Jan 04 '22 at 12:48
  • I will give this a try. Thank you, guys. @some_name.py Btw, the values of the integers don't matter. I just want to maximize the distance. *Edit: Oh wait, they do. I will try out the alternating option or see what I can do. Very good suggestion. Thanks! – Novorodnas Jan 04 '22 at 12:51
  • You could have a "measure of satisfaction" function that gives a number telling you whether too many similar elements are too close; then you start with a random.shuffle, and then as long as the satisfaction is not above some threshold, keep applying random transpositions that increase the satisfaction. This way, there should still be a fair amount of randomness in your final result, depending on the satisfaction threshold. – Stef Jan 04 '22 at 12:53
  • @Novorodnas Ah ok, I didn't get it right then... What dimensions are we talking about? So how large is the array, and what's the range of the values inside (i.e. how many of each element do we expect)? – some_name.py Jan 04 '22 at 12:55
  • @some_name.py Just shape (8, 12) and equal amounts of integers ranging from 1 to 6. Nothing too big for this case. In the future I could also have e.g. integers from 1-48. – Novorodnas Jan 04 '22 at 12:56
  • @Novorodnas Ok, yes, that sounds quite reasonable... Let us know when you've solved it. – some_name.py Jan 04 '22 at 13:01
  • @Stef Great idea. Will give it a try. – Novorodnas Jan 04 '22 at 13:01
  • Yes, do please post an answer to your own question when you have a working solution (or several different working solutions!). – Stef Jan 04 '22 at 13:02
  • What did you learn from the previous question? You accepted an answer. Even if that wasn't enough, show how you used it, and what problems you still have. Otherwise we might mark this as a duplicate. – hpaulj Jan 04 '22 at 16:38
  • @hpaulj The previous question is about a 1d array. This one is about a 2d array. They're definitely not duplicates. – Stef Jan 04 '22 at 16:58
  • For a given solution, how do you calculate its distance? You could start by giving the code and showing the total distance for the example you provided. – Bob Jan 06 '22 at 13:19
  • Hey guys, I promised I would post my solution, so here it is. I won't work on it much further since it doesn't have priority anymore. – Novorodnas Feb 08 '22 at 22:22
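For reference, here is a minimal sketch of the random-transposition idea from the comments: hill climbing that keeps only swaps which strictly increase the min_replicate_distance score sketched above. The function name, step count, and seed handling are illustrative; note that with many repeated values the minimum plateaus quickly, which is where the "satisfaction" threshold mentioned in the comments would come in:

def improve_by_transpositions(arr: np.ndarray, steps: int = 5_000, seed=None) -> np.ndarray:
    # Hill climbing: try random swaps, keep only those that strictly raise the score.
    rng = np.random.default_rng(seed)
    arr = arr.copy()
    best = min_replicate_distance(arr)
    rows, cols = arr.shape
    for _ in range(steps):
        y1, x1 = rng.integers(rows), rng.integers(cols)
        y2, x2 = rng.integers(rows), rng.integers(cols)
        arr[y1, x1], arr[y2, x2] = arr[y2, x2], arr[y1, x1]      # tentative swap
        new_score = min_replicate_distance(arr)
        if new_score > best:
            best = new_score                                     # keep the improvement
        else:
            arr[y1, x1], arr[y2, x2] = arr[y2, x2], arr[y1, x1]  # revert
    return arr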

1 Answer


I invented my own little approach making use of scipy.ndimage.distance_transform_edt and random sampling. It is far from optimal, but it does the job quite okay, depending on your aim:

import numpy as np
from scipy.ndimage import distance_transform_edt

def maximize_arr_replicate_distances(input_arr: np.ndarray) -> np.ndarray:

    # Validate the input array
    assert isinstance(input_arr, np.ndarray), "'input_arr' is not a NumPy array."
    assert input_arr.ndim == 2, "'input_arr' is not an 'm x n' matrix."
    assert np.issubdtype(input_arr.dtype, np.integer), "'input_arr' is not of dtype class 'integer'."

    # Bin: group all occurrences of each unique element into its own subarray
    unique_elements = np.unique(input_arr)
    binned_uniques = [input_arr[input_arr == i].tolist() for i in unique_elements]

    # Determine a sampling order: randomly sample from the subarrays without replacement
    sampling_order = []
    while any(binned_uniques):
        random_index = np.random.randint(unique_elements.size)
        if binned_uniques[random_index]:
            sampling_order.append(binned_uniques[random_index].pop())

    # Distribute elements: place each element on a cell with maximum distance to its replicates
    output_arr = np.full(input_arr.shape, np.nan)
    for i in sampling_order:
        if np.any(output_arr == i):
            # Distance of every cell to the nearest already-placed replicate of i
            distance_transform = distance_transform_edt(output_arr != i)
            # Mask out cells already occupied by other elements
            mask = np.isin(output_arr, unique_elements[unique_elements != i])
            distance_transform_masked = np.ma.masked_array(distance_transform, mask=mask)
            # Pick a random cell among those with the maximum distance
            distance_transform_max = np.argwhere(distance_transform_masked == np.amax(distance_transform_masked))
            rand_max_y, rand_max_x = distance_transform_max[np.random.randint(len(distance_transform_max))]
            output_arr[rand_max_y, rand_max_x] = i
        else:
            # First occurrence of i: place it on a random empty (NaN) cell
            nan_indices = np.argwhere(np.isnan(output_arr))
            rand_nan_y, rand_nan_x = nan_indices[np.random.randint(len(nan_indices))]
            output_arr[rand_nan_y, rand_nan_x] = i

    return output_arr.astype(int)

input_arr = np.array([[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

print(maximize_arr_replicate_distances(input_arr))

# [[7 0 0 0 1 0 0 0 1 0 0 7]
#  [0 0 0 0 0 0 0 0 0 0 0 1]
#  [1 0 0 0 0 0 7 0 0 0 0 0]
#  [0 0 0 0 0 0 0 1 0 0 0 0]
#  [0 0 0 7 0 0 0 0 0 0 1 0]
#  [7 0 0 0 0 0 0 0 7 0 0 0]
#  [0 0 0 0 0 0 0 0 0 0 0 0]
#  [0 0 1 0 7 0 0 0 0 1 0 7]]

It works reasonably well with all kinds of inputs, although the random sampling means results vary from run to run.
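As a rough sanity check, you can score a result with the hypothetical min_replicate_distance helper sketched in the question, restricted here to the replicates of 7:

print(min_replicate_distance(input_arr, value=7))
# 1.0, since the 7s form a contiguous column

print(min_replicate_distance(maximize_arr_replicate_distances(input_arr), value=7))
# expected to be noticeably larger after the distance-maximizing shuffle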

Note that the algorithm is biased towards the borders of the array. E.g. the element 1 ends up with the following frequency distribution [%] if you repeat the shuffle 10,000 times:

from typing import Callable

def evaluate(input_arr: np.ndarray, function: Callable[[np.ndarray], np.ndarray], sample_index: int, iterations: int = 10_000) -> np.ndarray:

    # Count, per cell, how often 'sample_index' lands there across all iterations
    sum_arr = np.zeros(input_arr.shape)
    for _ in range(iterations):
        sum_arr += np.where(function(input_arr) == sample_index, 1, 0)
    hit_frequencies = sum_arr / sum_arr.sum() * 100  # [%]

    return np.around(hit_frequencies, 2)

maximize_replicate_distances = evaluate(input_arr, maximize_arr_replicate_distances, 1)

print(maximize_replicate_distances)

# Frequency distribution of sample replicate '1' after distance-maximizing shuffling (n = 10,000) [%]:
# [[4.06 2.1  0.93 1.25 1.34 2.02 2.1  1.38 1.21 0.93 2.15 4.04]
#  [2.48 0.26 0.15 0.3  0.41 0.85 0.82 0.42 0.28 0.13 0.27 2.42]
#  [1.07 0.14 0.13 0.46 0.5  0.58 0.63 0.55 0.44 0.12 0.14 1.08]
#  [1.55 0.4  0.9  1.49 0.82 0.7  0.77 0.8  1.48 0.84 0.44 1.61]
#  [1.67 0.44 0.92 1.48 0.81 0.74 0.76 0.81 1.42 0.88 0.43 1.72]
#  [1.04 0.14 0.12 0.4  0.55 0.59 0.6  0.52 0.47 0.14 0.16 1.09]
#  [2.44 0.24 0.15 0.26 0.42 0.88 0.81 0.41 0.3  0.15 0.26 2.33]
#  [3.99 2.11 0.97 1.33 1.39 1.98 2.01 1.42 1.3  0.97 2.12 3.96]]
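To see the border bias at a glance, the frequency matrix can also be rendered as a heat map (a minimal sketch, assuming matplotlib is available):

import matplotlib.pyplot as plt

plt.imshow(maximize_replicate_distances, cmap="viridis")
plt.colorbar(label="hit frequency [%]")
plt.title("Placement frequency of replicate '1' (n = 10,000)")
plt.show()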