0

In some queueing and inventory problems the state of a system can be represented by 1d integer numpy arrays (with a constant length). Now I would like to map states to a unique single number/string to quickly identify them.

At the moment I'm using the following approach for this:

import numpy as np

#Example: state space consists of all integer arrays with 3 elements ranging from 0 to 4.
statespace      = np.indices([5]*3).reshape(3,-1).T 

#Obtain unique integers by converting to a number system with the smallest possible base. 
minvalues       = np.amin(statespace,axis=0)
maxvalues       = np.amax(statespace,axis=0)
base            = 1+np.max(maxvalues-minvalues)        
statecode       = np.power(base, np.arange(statespace.shape[1]))

def getStateCode(state):
    #Convert states to a unique integer by taking the dot product with statecode.
    return np.dot(state-minvalues,statecode)

#Obtain codes and sort them.    
codes = np.sort(getStateCode(statespace))  

def getStateIndex(state):
    #Searches for the state in the sorted vector with codes.
    statecodes  = getStateCode(state)
    return np.searchsorted(codes,statecodes).astype(int)  

Now

state = np.array([0,0,0])
print(getStateIndex(state))

returns state index 0.

For small to medium sized problems this approach works well and it is vectorized. However, for larger problems it suffers from integer overflow especially if base is large.

This occurs in particular when some elements in the state space have larger ranges than others (for example, the first element could range from 0 to 100, while all other elements between 0 and 3, resulting in a base 101 system). Even for state spaces that can easily be stored in memory, this can result in integer overflow of statecode. Making statecode a 64bit integer only delays the problem.

Does anybody have an alternative (vectorized?) way to convert 1d integer arrays to a unique identifier without this problem? As these comparisons are carried out repeatedly for millions of states, the approach has to be fast. I've read about hash functions, but I'm a bit hesitant to use those because there is no guarantee of uniqueness.

Forzaa
  • 1,465
  • 4
  • 15
  • 27
  • Is the length the same for all states? – dnalow Sep 21 '16 at 09:37
  • @dnalow Yes. I mention that now in the question. – Forzaa Sep 21 '16 at 09:42
  • *"Making `statecode` a 64bit integer only delays the problem."* - so what is the size of the largest state space you need to represent? If it's bigger than 2⁶⁴ then there's no way you can use single integers as unique identifiers. What do you want to use the identifiers for? – ali_m Sep 21 '16 at 09:50
  • 1
    You can use a [mixed radix numbering](https://en.wikipedia.org/wiki/Mixed_radix), see numpy.ravel_multi_index and numpy.unravel_index. The unique identifier is essentially the linear index into the statespace array. – user6758673 Sep 21 '16 at 09:53
  • I was also just going to ask: Why not different bases per columns, but that is essentially what user6758673 means. I also think this way you get the smallest range of numbers possible for this countable set – dnalow Sep 21 '16 at 09:55
  • 1
    Assuming that the total size of the state space is ≤ 2⁶⁴ you could use [this approach](http://stackoverflow.com/a/16973510/1461210) to map each unique row to an int64 index. – ali_m Sep 21 '16 at 10:07
  • @ali_m I want to use the identifiers to obtain the index of a certain state in an array with costs/rewards. Each state makes random transitions to other states, and I need the indices to calculate the expected cost/reward. The example in the question already shows there can be more state codes than states. – Forzaa Sep 21 '16 at 10:08
  • @user6758673 thanks a lot. That works extremely well. To apply this to an array with multiple states, this can be used: `np.ravel_multi_index((state-minvalues).transpose().reshape(statespace.shape[1],-1), maxvalues-minvalues+1)`. In cases where not all states are included, one will still have to search a sorted list to get the correct indices. – Forzaa Sep 21 '16 at 12:00

0 Answers0