9

I'd like to 'shear' a numpy array. I'm not sure I'm using the term 'shear' correctly; by shear, I mean something like:

Shift the first column by 0 places
Shift the second column by 1 place
Shift the third colum by 2 places
etc...

So this array:

array([[11, 12, 13],
       [17, 18, 19],
       [35, 36, 37]])

would turn into either this array:

array([[11, 36, 19],
       [17, 12, 37],
       [35, 18, 13]])

or something like this array:

array([[11,  0,  0],
       [17, 12,  0],
       [35, 18, 13]])

depending on how we handle the edges. I'm not too particular about edge behavior.

Here's my attempt at a function that does this:

import numpy

def shear(a, strength=1, shift_axis=0, increase_axis=1, edges='clip'):
    strength = int(strength)
    shift_axis = int(shift_axis)
    increase_axis = int(increase_axis)
    if shift_axis == increase_axis:
        raise UserWarning("Shear can't shift in the direction it increases")
    temp = numpy.zeros(a.shape, dtype=int)
    indices = []
    for d, num in enumerate(a.shape):
        coords = numpy.arange(num)
        shape = [1] * len(a.shape)
        shape[d] = num
        coords = coords.reshape(shape) + temp
        indices.append(coords)
    indices[shift_axis] -= strength * indices[increase_axis]
    if edges == 'clip':
        indices[shift_axis][indices[shift_axis] < 0] = -1
        indices[shift_axis][indices[shift_axis] >= a.shape[shift_axis]] = -1
        res = a[indices]
        res[indices[shift_axis] == -1] = 0
    elif edges == 'roll':
        indices[shift_axis] %= a.shape[shift_axis]
        res = a[indices]
    return res

if __name__ == '__main__':
    a = numpy.random.random((3,4))
    print a
    print shear(a)

It seems to work. Please tell me if it doesn't!

It also seems clunky and inelegant. Am I overlooking a builtin numpy/scipy function that does this? Is there a cleaner/better/more efficient way to do this in numpy? Am I reinventing the wheel?

EDIT:
Bonus points if this works on an N-dimensional array, instead of just the 2D case.

This function will be at the very center of a loop I'll repeat many times in our data processing, so I suspect it's actually worth optimizing.

SECOND EDIT: I finally did some benchmarking. It looks like numpy.roll is the way to go, despite the loop. Thanks, tom10 and Sven Marnach!

Benchmarking code: (run on Windows, don't use time.clock on Linux I think)

import time, numpy

def shear_1(a, strength=1, shift_axis=0, increase_axis=1, edges='roll'):
    strength = int(strength)
    shift_axis = int(shift_axis)
    increase_axis = int(increase_axis)
    if shift_axis == increase_axis:
        raise UserWarning("Shear can't shift in the direction it increases")
    temp = numpy.zeros(a.shape, dtype=int)
    indices = []
    for d, num in enumerate(a.shape):
        coords = numpy.arange(num)
        shape = [1] * len(a.shape)
        shape[d] = num
        coords = coords.reshape(shape) + temp
        indices.append(coords)
    indices[shift_axis] -= strength * indices[increase_axis]
    if edges == 'clip':
        indices[shift_axis][indices[shift_axis] < 0] = -1
        indices[shift_axis][indices[shift_axis] >= a.shape[shift_axis]] = -1
        res = a[indices]
        res[indices[shift_axis] == -1] = 0
    elif edges == 'roll':
        indices[shift_axis] %= a.shape[shift_axis]
        res = a[indices]
    return res

def shear_2(a, strength=1, shift_axis=0, increase_axis=1, edges='roll'):
    indices = numpy.indices(a.shape)
    indices[shift_axis] -= strength * indices[increase_axis]
    indices[shift_axis] %= a.shape[shift_axis]
    res = a[tuple(indices)]
    if edges == 'clip':
        res[indices[shift_axis] < 0] = 0
        res[indices[shift_axis] >= a.shape[shift_axis]] = 0
    return res

def shear_3(a, strength=1, shift_axis=0, increase_axis=1):
    if shift_axis > increase_axis:
        shift_axis -= 1
    res = numpy.empty_like(a)
    index = numpy.index_exp[:] * increase_axis
    roll = numpy.roll
    for i in range(0, a.shape[increase_axis]):
        index_i = index + (i,)
        res[index_i] = roll(a[index_i], i * strength, shift_axis)
    return res

numpy.random.seed(0)
for a in (
    numpy.random.random((3, 3, 3, 3)),
    numpy.random.random((50, 50, 50, 50)),
    numpy.random.random((300, 300, 10, 10)),
    ):
    print 'Array dimensions:', a.shape
    for sa, ia in ((0, 1), (1, 0), (2, 3), (0, 3)):
        print 'Shift axis:', sa
        print 'Increase axis:', ia
        ref = shear_1(a, shift_axis=sa, increase_axis=ia)
        for shear, label in ((shear_1, '1'), (shear_2, '2'), (shear_3, '3')):
            start = time.clock()
            b = shear(a, shift_axis=sa, increase_axis=ia)
            end = time.clock()
            print label + ': %0.6f seconds'%(end-start)
            if (b - ref).max() > 1e-9:
                print "Something's wrong."
        print
alain.janinm
  • 19,951
  • 10
  • 65
  • 112
Andrew
  • 2,842
  • 5
  • 31
  • 49

5 Answers5

8

This can be done using a trick described in this answer by Joe Kington:

from numpy.lib.stride_tricks import as_strided
a = numpy.array([[11, 12, 13],
                 [17, 18, 19],
                 [35, 36, 37]])
shift_axis = 0
increase_axis = 1
b = numpy.vstack((a, a))
strides = list(b.strides)
strides[increase_axis] -= strides[shift_axis]
strides = (b.strides[0], b.strides[1] - b.strides[0])
as_strided(b, shape=b.shape, strides=strides)[a.shape[0]:]
# array([[11, 36, 19],
#        [17, 12, 37],
#        [35, 18, 13]])

To get "clip" instead of "roll", use

b = numpy.vstack((numpy.zeros(a.shape, int), a))

This is probably the most efficient way of doing it, since it does not use any Python loop at all.

Community
  • 1
  • 1
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • Very nice! I'll try this out. Not to push my luck, but is this easy to extend to an N-dimensional array? I'll edit my question, too. – Andrew Feb 15 '11 at 00:21
  • 1
    @Andrew: edited my answer to include `shift_axis` and `increase_axis`. Hope I got it right... – Sven Marnach Feb 15 '11 at 00:33
  • @Andrew: My bad -- the `vstack()` call needs to be replaced by a `concatenate()` along the correct axis -- unfortunately I have no time to look into this at the moment... – Sven Marnach Feb 15 '11 at 12:21
  • I'll keep poking at it. I really like the as_strided approach, I'm just having trouble with the edge conditions. The buffer works as long as the 'shear strength' is 1, but for larger shears I think I end up stepping into bad places in memory. – Andrew Feb 15 '11 at 15:16
  • @Andrew: I missed the meaning of `strength` so far -- specifically that you might want to shift by more than one per column. For `strength = 2`, you would actually need `vstack((a, a, a))`, and for higher values of `strength` even more copies of `a`, so the above answer might not be a good idea at all. tom10's approach can be modified to work in any dimension as well, and you will still have a single Python loop only -- maybe this is the way to go. I'll detail this (and an approach based on advanced indexing) in another answer. – Sven Marnach Feb 15 '11 at 15:52
  • Thanks, looking forward to it! – Andrew Feb 15 '11 at 16:21
  • 1
    @Andrew: Added two more answers. If you do any benchmarking, please let me know the results! – Sven Marnach Feb 15 '11 at 16:45
8

numpy roll does this. For example, if you original array is x then

for i in range(x.shape[1]):
    x[:,i] = np.roll(x[:,i], i)

produces

[[11 36 19]
 [17 12 37]
 [35 18 13]]
tom10
  • 67,082
  • 10
  • 127
  • 137
  • Yeah, roll is good stuff; this is certainly elegant. I suspect the loop will be slower than Sven's suggestion, but I shouldn't jump to conclusions without benchmarking. I'd have to think about how to extend this to the N-dimensional case. – Andrew Feb 15 '11 at 00:24
  • @Andrew: Note that this solution uses less memory than mine. – Sven Marnach Feb 15 '11 at 00:32
8

The approach in tom10's answer can be extended to arbitrary dimensions:

def shear3(a, strength=1, shift_axis=0, increase_axis=1):
    if shift_axis > increase_axis:
        shift_axis -= 1
    res = numpy.empty_like(a)
    index = numpy.index_exp[:] * increase_axis
    roll = numpy.roll
    for i in range(0, a.shape[increase_axis]):
        index_i = index + (i,)
        res[index_i] = roll(a[index_i], -i * strength, shift_axis)
    return res
Community
  • 1
  • 1
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
2

Here is a cleaned-up version of your own approach:

def shear2(a, strength=1, shift_axis=0, increase_axis=1, edges='clip'):
    indices = numpy.indices(a.shape)
    indices[shift_axis] -= strength * indices[increase_axis]
    indices[shift_axis] %= a.shape[shift_axis]
    res = a[tuple(indices)]
    if edges == 'clip':
        res[indices[shift_axis] < 0] = 0
        res[indices[shift_axis] >= a.shape[shift_axis]] = 0
    return res

The main difference is that it uses numpy.indices() instead of rolling your own version of this.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
0
r = lambda l, n: l[n:]+l[:n]

transpose(map(r, transpose(a), range(0, len(a)))

I think. You should probably consider this psuedocode more than actual Python. Basically transpose the array, map a general rotate function over it to do the rotation, then transpose it back.

Anon.
  • 58,739
  • 8
  • 81
  • 86