I have a long list of xy coordinates and would like to convert it into a numpy array.
>>> import numpy as np
>>> xy = np.random.rand(1000000, 2).tolist()
The obvious way would be:
>>> a = np.array(xy) # Very slow...
However, the above code is unreasonably slow. Interestingly, transposing the long list first, converting it into a numpy array, and then transposing back is much faster (20x on my laptop).
>>> def longlist2array(longlist):
...     wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
...     return np.array(wide).T
>>> a = longlist2array(xy) # 20x faster!
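For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The function name `transpose_then_convert`, the smaller 100,000-row input, and the repeat count of 3 are my own choices for illustration; the measured ratio will depend on the numpy version and the machine, so it may not match the 20x figure above.

```python
import timeit

import numpy as np


def transpose_then_convert(longlist):
    """Transpose the nested list in pure Python, convert, then transpose back."""
    wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
    return np.array(wide).T


# Assumption: a smaller input than the 1,000,000 rows above, so this runs quickly.
xy_small = np.random.rand(100_000, 2).tolist()

# Time each conversion route over a few repetitions.
t_direct = timeit.timeit(lambda: np.array(xy_small), number=3)
t_transposed = timeit.timeit(lambda: transpose_then_convert(xy_small), number=3)
print(f"direct: {t_direct:.3f}s  transpose-first: {t_transposed:.3f}s")

# Both routes should produce the same values and shape.
assert np.array_equal(np.array(xy_small), transpose_then_convert(xy_small))
```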
Is this a bug in numpy?
EDIT:
This is a list of points (xy coordinates) generated on the fly, so rather than preallocating an array and enlarging it when necessary, or maintaining two 1D lists for x and y, I think the current representation is the most natural.
Why is looping through the 2nd index faster than looping through the 1st, given that we are iterating through a Python list in both directions?
EDIT 2:
Based on @tiago's answer and this question, I found the following code to be twice as fast as my original version:
>>> from itertools import chain
>>> def longlist2array(longlist):
...     flat = np.fromiter(chain.from_iterable(longlist), np.array(longlist[0][0]).dtype, -1) # Without intermediate list:)
...     return flat.reshape((len(longlist), -1))
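As a quick sanity check, the `np.fromiter` approach can be compared against plain `np.array` on a tiny input. This is only a sketch under two assumptions of mine: the rows all have the same length and hold Python floats. The name `fromiter_reshape` is hypothetical, and passing an explicit `count` is an optional variation that lets `np.fromiter` allocate the output once instead of growing it.

```python
from itertools import chain

import numpy as np


def fromiter_reshape(longlist):
    """Flatten a rectangular list of float rows lazily, then reshape to 2D."""
    n_rows = len(longlist)
    n_cols = len(longlist[0])
    # Assumption: every row has n_cols float entries, so the total is known upfront.
    flat = np.fromiter(
        chain.from_iterable(longlist), dtype=float, count=n_rows * n_cols
    )
    return flat.reshape(n_rows, n_cols)


points = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
a = fromiter_reshape(points)

# Same shape and values as the direct conversion.
assert a.shape == (3, 2)
assert np.array_equal(a, np.array(points))
```

If the rows can differ in length, this sketch is not suitable as written, since the reshape assumes a rectangular input.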