In some queueing and inventory problems the state of a system can be represented by 1d integer numpy arrays (with a constant length). Now I would like to map states to a unique single number/string to quickly identify them.
At the moment I'm using the following approach for this:
import numpy as np
#Example: state space consists of all integer arrays with 3 elements ranging from 0 to 4.
statespace = np.indices([5]*3).reshape(3,-1).T
#Obtain unique integers by converting to a number system with the smallest possible base.
minvalues = np.amin(statespace,axis=0)
maxvalues = np.amax(statespace,axis=0)
base = 1+np.max(maxvalues-minvalues)
statecode = np.power(base, np.arange(statespace.shape[1]))
def getStateCode(state):
#Convert states to a unique integer by taking the dot product with statecode.
return np.dot(state-minvalues,statecode)
#Obtain codes and sort them.
codes = np.sort(getStateCode(statespace))
def getStateIndex(state):
#Searches for the state in the sorted vector with codes.
statecodes = getStateCode(state)
return np.searchsorted(codes,statecodes).astype(int)
Now
state = np.array([0,0,0])
print(getStateIndex(state))
returns state index 0
.
For small to medium sized problems this approach works well and it is vectorized. However, for larger problems it suffers from integer overflow especially if base
is large.
This occurs in particular when some elements in the state space have larger ranges than others (for example, the first element could range from 0 to 100, while all other elements between 0 and 3, resulting in a base 101 system). Even for state spaces that can easily be stored in memory, this can result in integer overflow of statecode
. Making statecode
a 64bit integer only delays the problem.
Does anybody have an alternative (vectorized?) way to convert 1d integer arrays to a unique identifier without this problem? As these comparisons are carried out repeatedly for millions of states, the approach has to be fast. I've read about hash functions, but I'm a bit hesitant to use those because there is no guarantee of uniqueness.