You can use contiguous array view of shape(3, -1)
, find locations of unique occurences and replace them in these locations:
def view_ascontiguous(a): # a is array
a = np.ascontiguousarray(a)
void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
return a.view(void_dt).ravel()
def replace(x, args, subs, viewer):
u, inv = np.unique(viewer(x), return_inverse=True)
idx = np.searchsorted(viewer(args), u)
return subs[idx][inv]
>>> replace(x=np.array([1, 0, 1, 0, 0, 1, 1, 0, 1]).reshape(-1, 3),
args=np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]),
subs=np.array([ 5, 57, 58, 44, 67, 17, 77, 1]),
viewer=view_ascontiguous)
array([17, 57, 17])
Note that here idx
means locations of unique contiguous blocks of length N in a Power Set {0, 1}^N
.
In case viewer(args)
maps args
to [0, 1, 2, 3, ...]
inside np.searchsorted
method, replacing it with np.arange(len(args))
helps to improve performance.
This algorithm could be also used in more general problem:
You are given array of dtype=np.uint8
of M*N
values 0 & 1. You are also given a mapping between Power Set [0, 1]^N
(all possible blocks of length N
of 0 & 1) and some scalar values. Find an array of M
values following these steps:
- Split array you are given into
M
contiguous blocks of length N
- Replace each block with scalar value using a mapping you are given
Now, fun part: you can use your own viewer
. It is required to map an array you pass in args
to any kind of ascending indices like so:
viewer=lambda arr: np.ravel_multi_index(arr.T, (2,2,2)) #0, 1, 2, 3, 4, 5, 6, 7
viewer=lambda arr: np.sum(arr * [4, 2, 1], axis=1) #0, 1, 2, 3, 4, 5, 6, 7
viewer=lambda arr: np.dot(arr, [4, 2, 1]) #0, 1, 2, 3, 4, 5, 6, 7
Or even more fun:
viewer=lambda arr: 2*np.dot(arr, [4, 2, 1]) + 1 #1, 3, 5, 7, 9, 11, 13, 15
viewer=lambda arr: np.vectorize(chr)(97+np.dot(arr, [4, 2, 1])) #a b c d e f g h
since you could also map
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
to any ascending sequence you could think of like [1, 3, 5, 7, 9, 11, 13, 15]
or ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
and the result remains the same.
Further notes
Thanks @MadPhysicist,
np.packbits(example.reshape(-1, N), axis=1, bitorder='little').ravel()
also does the trick. It's pretending to be the fastest solution since np.packbits
is optimised quite well in numpy
.