0

I have data about arrays that each have different dimensions. This data looks like this:

spatial_dimensions = {
  'x': 5,
  'y': 2,
  'z': 4
}

Another array could be described like this:

table_dimensions = {
  'rows': 10,
  'columns': 5
}

I also have data about what slots are taken in each array. That is expressed like this for the data pertaining to the spatial_dimensions array:

occupied_slots = [
  [1,2,3],
  [4,2,2],
  [1,1,1]
]

For the table_dimensions array it could e.g. be

occupied_slots = [
  [2,3],
  [5,2],
  [6,1],
  [5,5]
]

I don't have the "full" arrays, only their dimensions and a list of occupied slots.

I want to randomly get an empty slot from an array and return it as a list (that list describes the location in the array).

In the above examples, a random empty slot could be [1, 2, 2] and [4, 3] respectively.

I want to do this in pure Python. I don't want to use numpy as it would introduce a dependency on my project only for this specific issue.

I'm stuck at finding a way of finding an empty slot without re-creating the whole array in memory and filtering out the occupied slots, as I'm afraid that will be too expensive. Especially since there is no limit on the array dimensions.

PS -- This is not an assignment I got; I tried to abstract away all the project details in this question.

Update

I'm currently using this code (based on How to iterate over this n-dimensional dataset?):

import itertools
import random

dimensions = {
  'x': 2,
  'y': 4,
  'z': 3
}

occupied = [
  [2,3,1],
  [1,1,1]
  ]


loopover = [range(1, i + 1) for i in [dimensions[key] for key in dimensions.keys()]]
    
print(random.choice([i for i in itertools.product(*loopover) if list(i) not in occupied])) 

As @ekhumoro commented, this recreates the whole array in memory before passing it to random.choice() which is indeed what I'd like to avoid.

Mathieu Dhondt
  • 8,405
  • 5
  • 37
  • 58
  • Please update your question with the code you have tried. – quamrana Aug 25 '20 at 11:23
  • I'll delete my question; it's a duplicate. After posting it I thought of a different way of I asking my question and found [this answer on "How to iterate over this n-dimensional dataset?"](https://stackoverflow.com/a/45738071/308204). I'll put the resulting code in the question fyi. – Mathieu Dhondt Aug 25 '20 at 11:35
  • 1
    @LaundroMat That approach re-creates the entire array (minus the occupied slots) before passing it to `random.choice` - which you said you wanted to avoid. – ekhumoro Aug 25 '20 at 12:00
  • @ekhumoro Oh you're right! I'll update my question and gear it more towards a question about memory usage. – Mathieu Dhondt Aug 25 '20 at 12:39
  • 1
    @LaundroMat In practice, exactly how sparse are your arrays? Does `occupied_slots` ever approach half the size of the full array? Or is it only ever a small fraction of the whole? If it's the latter, maybe a simple trial and error approach would suffice. – ekhumoro Aug 25 '20 at 15:50
  • @ekhumoro There are instances where occupied_slots is almost the size of the full array. – Mathieu Dhondt Aug 25 '20 at 17:36
  • 1
    @LaundroMat In that case, why bother keeping sparse arrays at all? If you're already allowing `occupied_slots` to approach the size of the full array, the optimisation seems pointless. Have you checked to see how much memory is actually being used? Is there a realistic danger of running out of memory? – ekhumoro Aug 25 '20 at 19:44
  • 1
    @ekhumoro That is a very valid question. I'm also reminded of the adage "no premature optimisation", meaning I should probably not worry too much for now and simply set up monitoring to check if and when memory becomes a problem. Thanks for your help! – Mathieu Dhondt Aug 26 '20 at 12:28

1 Answers1

1

IIUC, could you randomly pick elements and then check them against occupied_slots?

import random

occupied_slots = [
  [1,2,3],
  [4,2,2],
  [1,1,1]
]

n_dim = 3
slots_list = occupied_slots

maxi = max(max(i) for i in slots_list)
mini = min(min(i) for i in slots_list)

empty = random.choices(range(mini, maxi+1), k=n_dim)
while empty in occupied_slots:
    empty = random.choices(range(mini, maxi+1), k=n_dim)

As you point out, if you have many possibilities but few choices left, this will be slow and erratic. With 10,000 options and 1 choice left, my %%timeit had an average of 8 seconds with a large variation.

But in that specific case, it seems like just finding the set difference between all the possible slot arrays and the occupied slots might be the most straightforward.

To integrate these 2 options, you could define a function which has a tweakable threshold for when to choose one approach over the other, i.e. if the number of occupied slots is more than k of the total possibilities, then compute all the possibilities and find the set difference. Otherwise, try randomly picking numbers until you find an empty slot:

def get_empty_slot(occupied, k=.5):
    maxi = max(max(i) for i in occupied)
    mini = min(min(i) for i in occupied)
    n_dim = len(occupied[0])
    numbers = range(mini, maxi+1)
    total_possible = len(numbers) ** n_dim
    if len(occupied) / total_possible < k:
        empty = random.choices(numbers, k=n_dim)
        while empty in occupied:
            empty = random.choices(numbers, k=n_dim)
    else:
        occupied_tuple = [tuple(i) for i in occupied]
        all_combos = itertools.product(numbers, repeat=n_dim)
        leftover = tuple(set(all_combos) - set(occupied_tuple))
        empty = list(random.choice(leftover))
    return empty

I tested this with the following; e should always be [0,0,0] as this is the only possibility:

combos = [list(i) for i in list(itertools.product(range(50), repeat=3))]
combos.remove([0,0,0])

e = get_empty_slot(combos, k=.5)

The set difference approach seems to perform fine with over 100,000 possibilities (and 1 choice left); it also performs well with much fewer possibilities. So, random element picking might not be significantly better in any case (this could be tested), and it begs the question of whether comparison against all possible combinations is really too expensive, and what an alternative would look like if so.

Tom
  • 8,310
  • 2
  • 16
  • 36
  • 1
    If there's only 1 slot left in an array of 10.000 slots, would that not take a lot of time? – Mathieu Dhondt Aug 25 '20 at 11:39
  • 1
    @LaundroMat you can see my edit; I ultimately can't imagine something better than checking against all the possible combinations, but maybe you have found that by now : D – Tom Aug 25 '20 at 12:55
  • 1
    I won't accept the answer but I've upvoted it because you showed me an alternative way of thinking about this. Thanks! – Mathieu Dhondt Aug 26 '20 at 12:29