
Suppose I have the following data

dictData = {}  # a dict keyed by position (a list of lists would also do)
dictData[0] = [1,2,3]
dictData[1] = [10,11,12,13]
dictData[2] = [21,22]

With this data, I want to generate unique 1-D arrays, each containing one randomly selected element from each of the source arrays. The number of arrays to generate equals the number of elements in the largest source array. Every element of every source array must appear at least once. An element may be repeated only once all the elements of its source array have already been used. Positions are preserved (e.g. a value taken from array 2 is placed at index 2).

A possible outcome is as shown below

possibleOutput = [1,10,21],[1,11,22],[3,12,21],[2,13,21]
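
The requirements above can be written down as a small checker, which makes it easy to test any candidate output. This is only a sketch of one reading of the question: `satisfies_requirements` is a hypothetical helper name, and it does not check the stricter "repeat only after every value was used" ordering, only that each column ends up covering its source array.

```python
def satisfies_requirements(data, rows):
    """Check a candidate output against the constraints in the question.

    Hypothetical helper, not part of the original post.
    """
    arrays = list(data.values()) if isinstance(data, dict) else list(data)
    longest = max(len(a) for a in arrays)

    # One output row per element of the largest array.
    if len(rows) != longest:
        return False
    # Each row takes exactly one value per source array.
    if any(len(row) != len(arrays) for row in rows):
        return False
    # Rows must be unique.
    if len({tuple(row) for row in rows}) != len(rows):
        return False
    # Position i must contain every value of array i, and nothing else.
    for pos, arr in enumerate(arrays):
        if {row[pos] for row in rows} != set(arr):
            return False
    return True


dictData = {0: [1, 2, 3], 1: [10, 11, 12, 13], 2: [21, 22]}
possibleOutput = [[1, 10, 21], [1, 11, 22], [3, 12, 21], [2, 13, 21]]
print(satisfies_requirements(dictData, possibleOutput))  # True
```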

I previously implemented a naive method using a "for" loop that starts with the biggest array and picks one number from each array until they are exhausted. Is there a more efficient (maybe numpy) way to achieve the same result?

RabbitBadger
  • Try using numba.jit. I don't think numpy will help much here unless dictData is sufficiently large and all its vectors are only up to 4 elements long. – Mateen Ulhaq Jun 10 '21 at 18:11
  • Thank you for the suggestion. The actual data is quite large; the above example was given just for this question. – RabbitBadger Jun 10 '21 at 18:14
  • Can you add your "naive" method? It would be easier to make your method more efficient if we know what we are comparing to. – Kraigolas Jun 10 '21 at 18:26

1 Answer


You can try:

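import numpy as np

def nth_product(num: int) -> list:
    '''
    Calculate the n-th element of the Cartesian product of `iterables`
    without materializing the whole product (the first list varies fastest).

    Adapted for this case from:
    https://stackoverflow.com/a/53626712/5431791
    '''

    res = []
    # Decode `num` as a mixed-radix number: one digit per list.
    for lst, len_lst in zip(iterables, lens):
        res.append(lst[num % len_lst])
        num //= len_lst

    return res

# `dictData` is the dict defined in the question.
iterables = list(dictData.values())
lens = list(map(len, iterables))
# Draw 4 distinct product indices, so the 4 resulting rows are unique.
indices = np.random.choice(np.prod(lens), size=4, replace=False)

new_arr = list(map(nth_product, indices))
print(new_arr)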

Output:

[[1, 12, 21], [3, 13, 21], [2, 13, 22], [2, 10, 21]]

This computes each combination directly from its index instead of building the full product, so it should be performant.
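
As a quick sanity check on the index-decoding idea, the sketch below enumerates every index once and confirms that each one decodes to a distinct combination. `nth_combination` is a hypothetical standalone variant of `nth_product` that takes the lists as an argument rather than reading globals. Note that the ordering differs from `itertools.product`: here the first list varies fastest, while `itertools.product` varies the last one fastest.

```python
from itertools import product


def nth_combination(lists, num):
    """Decode `num` as a mixed-radix number, one digit per list."""
    res = []
    for lst in lists:
        res.append(lst[num % len(lst)])
        num //= len(lst)
    return res


lists = [[1, 2, 3], [10, 11, 12, 13], [21, 22]]
total = 3 * 4 * 2  # size of the full Cartesian product

decoded = [tuple(nth_combination(lists, i)) for i in range(total)]

# Every index maps to a different combination, and together they
# cover exactly the Cartesian product.
assert len(set(decoded)) == total
assert set(decoded) == set(product(*lists))

print(decoded[:4])  # [(1, 10, 21), (2, 10, 21), (3, 10, 21), (1, 11, 21)]
```

Because the mapping is one-to-one, sampling indices without replacement is enough to guarantee unique rows.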

To make sure all values from the longest list appear (values from the shorter lists are still drawn at random):

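import numpy as np

def nth_product(idx: int, num: int) -> list:
    '''
    Like the n-th element of the Cartesian product of `iterables`, except
    that any list of maximal length contributes its `idx`-th value instead.

    Adapted for this case from:
    https://stackoverflow.com/a/53626712/5431791
    '''

    res = []
    for lst, len_lst in zip(iterables, lens):
        # Longest list: walk it in order so each value appears exactly once.
        # Shorter lists: take the digit decoded from `num`.
        res.append(lst[idx] if len_lst == max_len else lst[num % len_lst])
        num //= len_lst

    return res

# `dictData` is the dict defined in the question.
iterables = list(dictData.values())
lens = list(map(len, iterables))
max_len = max(lens)
nums = np.random.choice(np.prod(lens), size=max_len, replace=False)

# Pair each random product index with its row position 0..max_len-1.
new_arr = [nth_product(idx, num) for idx, num in enumerate(nums)]
print(new_arr)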

Output:

[[2, 10, 21], [3, 11, 22], [1, 12, 22], [3, 13, 22]]
Sayandip Dutta
  • Thank you for your time! However, I have to point out that it doesn't satisfy one of the requirements: all values should appear at least once. As shown in your output, at the second index the value 11 is missing. – RabbitBadger Jun 10 '21 at 18:44
  • The previous approach worked, but the recently edited one returned an error: `---> 20 new_arr = list(map(nth_product, size=indices, replace=False))` `TypeError: map() takes no keyword arguments` – RabbitBadger Jun 10 '21 at 18:46
  • @RabbitBadger I don't understand. In your own example `1` appears twice in the first position. `21` appears thrice in the last position. So could you elaborate? – Sayandip Dutta Jun 10 '21 at 18:46
  • @RabbitBadger yeah, sorry. I messed it up while pasting. Now it should be fine. – Sayandip Dutta Jun 10 '21 at 18:48
  • A value can appear more than once, but all values must appear at least one time. So ideally, the number of arrays created will be the number of values in the biggest array. (Each value in the biggest array will appear one at a time at the same index.) Since the other arrays are smaller, values at the smaller arrays' indices will be repeated until all the values in the largest array are accounted for. – RabbitBadger Jun 10 '21 at 18:49
  • Ah, I see. I will make the edit. I was confused by the random part in your question. – Sayandip Dutta Jun 10 '21 at 18:51
  • Thank you so much for your time again, btw! However, the result I got with the code was: [[3, 10, 22], [1, 11, 22], [3, 12, 22], [2, 13, 22]]. As you can see, at the 3rd index they were all 22s, with no 21s. – RabbitBadger Jun 10 '21 at 19:02
  • @RabbitBadger it is a coincidence. If you run it multiple times, you will see a difference. – Sayandip Dutta Jun 10 '21 at 19:13
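
The last exchange turns on whether values from the shorter arrays are guaranteed to appear. A minimal sketch of one way to guarantee it, which is not the answerer's method: build each output column from every value of its array, pad it with random repeats up to the longest length, shuffle it, and then zip the columns into rows. `covering_rows` is a hypothetical name, and the sketch assumes the longest array holds distinct values, which is what keeps the rows unique.

```python
import random


def covering_rows(arrays, rng=random):
    """Return len(longest array) rows; every value of every array appears.

    Hypothetical helper. Assumes the longest array has distinct values,
    so that its column alone makes all rows different.
    """
    longest = max(len(arr) for arr in arrays)
    columns = []
    for arr in arrays:
        column = list(arr)  # each value once, so nothing is missed
        # Pad shorter arrays with random repeats of their own values.
        column += [rng.choice(arr) for _ in range(longest - len(arr))]
        rng.shuffle(column)  # randomize which rows get which values
        columns.append(column)
    # Column i supplies position i of every row.
    return [list(row) for row in zip(*columns)]


dictData = {0: [1, 2, 3], 1: [10, 11, 12, 13], 2: [21, 22]}
rows = covering_rows(list(dictData.values()), random.Random(0))
print(rows)
```

This is plain Python with a per-array loop; for very large data the same column-wise idea could be expressed with numpy permutations, but that variant is not shown here.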