I have a series of n 2D arrays that are passed to a function as a single 3D array of depth n. For each position, I want to take the tuple of values along the depth axis, then replace each tuple with a single index value into a lookup table of the unique tuples.

I'm working in Python with some large datasets, so the solution needs to scale; I'll probably use NumPy, but other approaches are welcome.

Here's what I've got so far:

In [313]: arr=np.array([[[0,0,0],[1,2,2],[3,0,0]],[[0,1,0],[1,3,2],[0,0,0]]])

In [314]: stacked = np.stack((arr[0], arr[1]), axis=2)

In [315]: pairs = stacked.reshape(-1, arr.shape[0])

In [316]: pairs
Out[316]:
array([[0, 0],
       [0, 1],
       [0, 0],
       [1, 1],
       [2, 3],
       [2, 2],
       [3, 0],
       [0, 0],
       [0, 0]])

In [317]: unique = set([tuple(a) for a in pairs])

In [318]: lookup = sorted(list(unique))

In [319]: lookup
Out[319]: [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 0)]

Now I want to create an output array that holds, for each row of pairs, the index of that row's tuple in the lookup table. The output would be:

[0, 1, 0, 2, 4, 3, 5, 0, 0]

This example uses just two input 2D arrays, but there could be many more.
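One possible way to get from the pairs to the index array without a Python-level loop is `np.unique` with `axis=0` and `return_inverse=True`. This is a minimal sketch, assuming NumPy 1.13 or later (where the `axis` argument is available); it uses `np.moveaxis` instead of the hard-coded two-array `np.stack` so it should work for any depth n. The variable name `inverse` is just illustrative.

```python
import numpy as np

arr = np.array([[[0, 0, 0], [1, 2, 2], [3, 0, 0]],
                [[0, 1, 0], [1, 3, 2], [0, 0, 0]]])

# Move the depth axis last so each cell's n values sit together,
# then flatten to one row per cell: shape (rows * cols, n).
pairs = np.moveaxis(arr, 0, -1).reshape(-1, arr.shape[0])

# Unique rows come back in sorted order; `inverse` holds, for each row
# of `pairs`, the index of that row in `lookup`.
lookup, inverse = np.unique(pairs, axis=0, return_inverse=True)

# Flatten defensively: some NumPy versions return an extra dimension here.
inverse = inverse.reshape(-1)

# Optionally restore the original 2D layout of the inputs.
out = inverse.reshape(arr.shape[1:])

print(inverse.tolist())  # [0, 1, 0, 2, 4, 3, 5, 0, 0]
```

Here `lookup` is a 2D array of unique rows rather than a list of tuples; `[tuple(r) for r in lookup.tolist()]` would convert it if tuples are needed.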

jon_two
  • Can you share what error you're getting? – Akash Sep 14 '17 at 18:41
  • I'm not getting an error, I'm just not sure how to get from the pairs and lookup arrays to the output array. I've been staring at it too long and it's making my head hurt! – jon_two Sep 14 '17 at 18:43
  • Think [this](https://stackoverflow.com/questions/38674027/find-the-row-indexes-of-several-values-in-a-numpy-array) should solve it – Divakar Sep 14 '17 at 18:45

1 Answer

So, I've come up with a solution that produces the output I want, but is it the most efficient way of doing this? In particular, the lookup.index call is costly, since it scans the list once for every element. Does anyone have a better way?

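import numpy as np

def squash_array(arr):
    # One row per cell, holding that cell's value from each of the n input arrays.
    tuples = arr.T.reshape(-1, arr.shape[0])
    # Sorted unique tuples; a tuple's position in this list is its index value.
    lookup = sorted(set(tuple(a) for a in tuples))
    # lookup.index is a linear scan per cell, which is the costly step.
    out_arr = np.array([lookup.index(tuple(a)) for a in tuples]).reshape(arr.shape[1:][::-1]).T
    return out_arr, lookup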
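If the list-of-tuples lookup needs to stay, one option that avoids the repeated `lookup.index` scans is a dictionary mapping each tuple to its position. The following is only a sketch under that assumption; `squash_array_dict` and `index_of` are hypothetical names, not part of the code above.

```python
import numpy as np

def squash_array_dict(arr):
    n = arr.shape[0]
    # One tuple of plain Python ints per cell, in row-major order.
    rows = np.moveaxis(arr, 0, -1).reshape(-1, n).tolist()
    tuples = [tuple(r) for r in rows]
    # Sorted unique tuples, as in the original lookup table.
    lookup = sorted(set(tuples))
    # Dictionary lookups are constant time on average, unlike list.index.
    index_of = {t: i for i, t in enumerate(lookup)}
    out_arr = np.array([index_of[t] for t in tuples]).reshape(arr.shape[1:])
    return out_arr, lookup

arr = np.array([[[0, 0, 0], [1, 2, 2], [3, 0, 0]],
                [[0, 1, 0], [1, 3, 2], [0, 0, 0]]])
out_arr, lookup = squash_array_dict(arr)
```

This still loops in Python over every cell, so for very large inputs a vectorised approach based on `np.unique(..., axis=0, return_inverse=True)` is likely to scale better.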
jon_two