Python : How to vectorize my split function with Numpy

Question

I found on Stack a function who solve my problem but right now I would like to speed up my code because I have a lot of list to split.

I heard that vectorize a function ca be a solution so tried to vectorize my function with numpy but it doesn't works.

Can you help my please ?

The original function :

seq = ([1,1,5,1,5,5,1,5,1,1])

def zigzag(seq):
  return seq[::2], seq[1::2]

result :

([1, 5, 5, 1, 1], [1, 1, 5, 5, 1])

My vectorize attempt :

import numpy as np
seq = ([1, 1, 5, 1, 5, 5, 1, 5, 1, 1], [2, 2, 2, 3, 3, 3, 3, 2, 2, 2], [6, 3, 9, 2, 9, 4, 6, 3])

def zigzag(seq):
  return seq[::2], seq[1::2]

vecto = np.vectorize(zigzag)

vecto(seq)

Desired result :

(([1, 5, 5, 1, 1], [1, 1, 5, 5, 1]), ([2, 2, 3, 3, 2], [2, 3, 3, 2, 2]), ([6, 9, 9, 6], [3, 2, 4, 3]))

you wont get much speedup from just np.vectorize, basically it just calls the function in a for loop behind the scenes. — reptilicus, May 23 '15 at 13:27
Not really. If all your sequences were the same length, you could create a 2D numpy array where every row is a sequence, and then using vectorized index as: `return seq[:, ::2], seq[:, 1::2]`... but, if you have variable length sequences, you need to iterate through them. The only speed up you might get is if you write your code in Cython (but as you work with python objects, lists/tuples, I doubt you could get any relevant speedup). — Imanol Luengo, May 23 '15 at 15:11

Martin Ueding · Accepted Answer · 2015-05-24T11:57:58.457

That is easy for the single array: Just make your sequence a numpy.array and your zigzag function will call C code below the hood.

def zigzag(seq):
  return seq[::2], seq[1::2]

seq = np.array([1,1,5,1,5,5,1,5,1,1])

result = zigzag(seq)
print(result)

Result:

(array([1, 5, 5, 1, 1]), array([1, 1, 5, 5, 1]))

For the multi dimensional case you have the problem that the length of your lists is not the same. Therefore you cannot make a nice numpy.array out of that. I suggest that you adapt it like so:

import numpy as np

def zigzag(seq):
    try:
        if len(seq.shape) == 1:
            return seq[::2], seq[1::2]
    except AttributeError:
        return [zigzag(x) for x in seq]

def main():
    options = _parse_args()

    seq = np.array([1,1,5,1,5,5,1,5,1,1])
    seq2 = (
        np.array([1, 1, 5, 1, 5, 5, 1, 5, 1, 1]),
        np.array([2, 2, 2, 3, 3, 3, 3, 2, 2, 2]),
        np.array([6, 3, 9, 2, 9, 4, 6, 3]),
    )

    print(zigzag(seq))
    print()
    print(zigzag(seq2))

The second sequence is just a tuple of numpy.array. The function checks whether your sequence has a shape attribute, a good indicator that it is a numpy.array. If so, it uses the NumPy slicing. In case it is a tuple, it will just call the zigzag function for each element.

It produces the desired output for your examples:

(array([1, 5, 5, 1, 1]), array([1, 1, 5, 5, 1]))

[(array([1, 5, 5, 1, 1]), array([1, 1, 5, 5, 1])), (array([2, 2, 3, 3, 2]), array([2, 3, 3, 2, 2])), (array([6, 9, 9, 6]), array([3, 2, 4, 3]))]

This is a not a polished solution, however. You do not want to convert your Python lists and tuples to NumPy arrays all the time. As @hpaulj has pointed out in the comments, this conversion takes way longer than the splitting of the Python list or tuple itself. Think about where you have your data and where it makes sense to have them in NumPy arrays. Those have to have a rectangular shape. Once you have this, you can write a version of zigzag that is appropriate.

Watch the time it takes to convert a list into an array. For a 1000 item list, the split takes 4us, spliting a same size array takes 2; but converting the list to array takes 100+us. — hpaulj, May 23 '15 at 16:26
Indeed, I negleted this. If the OP was able to use `numpy.array` throughout the programs, the conversion would be rare enough for this to to speed things up. The conversion to `numpy.array` might be so slow since it has to convert every element from the sequence into a number, where the native Python list slicing does not care for the type of the elements. I think a statically typed language would help with this particular problem. — Martin Ueding, May 24 '15 at 11:56

Python : How to vectorize my split function with Numpy

1 Answers1