I have a 2D NumPy array like this:

np.array([[1,1,1,0], [1,0,0,1]])

How can I apply run-length encoding (RLE) to this 2D array efficiently? The shape of my data set is (4000, 3000).

I am able to do RLE on a string with this logic, without using NumPy:

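    # Assumed setup (not shown in the original snippet): new_bin_data is a
    # non-empty string of '0'/'1' characters.
    final_result = []
    prev = new_bin_data[0]
    count = 0
    for i in new_bin_data:
        if i != prev:
            # The run ended: record its length and start a new run.
            final_result.append(count)
            count = 0
            prev = i
        count += 1
    final_result.append(count)  # record the final run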
nirvair

1 Answer

I'm not sure exactly what output you're looking for, but here's some code that computes the run-length encoding of each row.

import numpy as np


def rle(inarray):
    """
    From: https://stackoverflow.com/questions/1066758/find-length-of-sequences-of-identical-values-in-a-numpy-array-run-length-encodi

    run length encoding. Partial credit to R rle function.
        Multi datatype arrays catered for including non Numpy
    returns: tuple (runlengths, startpositions, values)
    """
    ia = np.asarray(inarray)  # force numpy
    n = len(ia)
    if n == 0:
        return (None, None, None)
    else:
        y = ia[1:] != ia[:-1]  # pairwise unequal (string safe)
        i = np.append(np.where(y), n - 1)  # must include last element position
        z = np.diff(np.append(-1, i))  # run lengths
        p = np.cumsum(np.append(0, z))[:-1]  # positions
        return (z, p, ia[i])



def rle_2d(a, unused_value):
    """
    compute rle encoding of each row in the input array

    Args:
        a: 2d numpy array
        unused_value: a value that does not appear in the input array a

    Returns:
        list of (length, positions, values) tuples. The length of the list is the number of rows in
        the input matrix

    """
    r, c = a.shape          # rows, columns
    a = np.hstack([a, np.ones((r, 1), dtype=a.dtype) * unused_value])
    a = a.reshape(-1)
    l, p, v = rle(a)        # length,  positions, values
    y = p // (c + 1)
    x = p % (c + 1)

    rl, rp, rv = rle(y)
    result = []
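    for i in range(r):
        assert rv[i] == i   # the i-th group of runs belongs to row i
        # The last run of each group is the appended unused_value column,
        # so the "- 1" drops it from the slices below.
        li = l[rp[i]: rp[i] + rl[i] - 1]
        pi = x[rp[i]: rp[i] + rl[i] - 1]
        vi = v[rp[i]: rp[i] + rl[i] - 1]
        result.append((li, pi, vi))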
    return result
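
For comparison, here is a minimal sketch of a variant that avoids the `unused_value` sentinel: it marks column 0 of every row as a run start, so runs can never continue across a row boundary. This is a swapped-in technique, not the code above, and the name `rle_rows` is made up for illustration. It assumes a non-empty 2D array.

```python
import numpy as np


def rle_rows(a):
    """Run-length encode each row of a 2D array (hypothetical helper)."""
    a = np.asarray(a)
    r, c = a.shape
    # True wherever a run starts: column 0 of every row, or a value change.
    starts = np.ones((r, c), dtype=bool)
    starts[:, 1:] = a[:, 1:] != a[:, :-1]
    rows, cols = np.nonzero(starts)             # run starts, in row-major order
    flat = rows * c + cols                      # start offsets in the flattened array
    lengths = np.diff(np.append(flat, r * c))   # distance to the next run start
    values = a[rows, cols]
    # Split the flat results back into one group per row.
    cuts = np.cumsum(np.bincount(rows, minlength=r))[:-1]
    return list(zip(np.split(lengths, cuts),
                    np.split(cols, cuts),
                    np.split(values, cuts)))


a = np.array([[1, 1, 1, 0], [1, 0, 0, 1]])
encoded = rle_rows(a)
for lengths, positions, values in encoded:
    print(lengths, positions, values)
# [3 1] [0 3] [1 0]
# [1 2 1] [0 1 3] [1 0 1]

# Round trip: np.repeat(values, lengths) rebuilds each row.
decoded = np.array([np.repeat(v, l) for l, _, v in encoded])
assert (decoded == a).all()
```

By my reading, this is the same per-row (lengths, positions, values) result that `rle_2d(a, -1)` should give for this input. The only Python-level loop is the final split into per-row groups, so it should scale reasonably to a (4000, 3000) array, though I haven't benchmarked it.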
shacharf