
For image processing, I need an array (the `array` module's type, not a NumPy array) of 2 million 32-bit words. If I use something like:

tb = array.array('i', (0,)*2000000)

it takes 126 ms. That's a lot, and I don't even need to initialize the array. I don't know the Python internals, but I assume that statement generates tons of malloc() (memory allocator) and free() (memory deallocator) calls.

Is there another way to create a very large Python array?
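For reference, the timing above can be reproduced with `timeit` (a minimal sketch; the exact numbers will vary by machine):

```python
import array
import timeit

# Time the tuple-based construction from the question; numbers vary by machine.
t = timeit.timeit(
    "array.array('i', (0,) * 2_000_000)",
    globals={"array": array},
    number=5,
) / 5
print(f"tuple-based init: {t * 1000:.1f} ms per call")
```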

user3435121
  • Take a look at [`scipy.sparse`](https://docs.scipy.org/doc/scipy/reference/sparse.html) – Barmar Nov 19 '21 at 21:19
  • Why wouldn't a NumPy array work for your image processing? Using NumPy for images is certainly something that's been done: https://scikit-image.org/docs/dev/user_guide/numpy_images.html – jjramsey Nov 19 '21 at 21:23
  • Is there a reason you wouldn't use numpy? – Grismar Nov 19 '21 at 21:26
  • Does this answer your question? [Efficient Python array with 100 million zeros?](https://stackoverflow.com/questions/2214651/efficient-python-array-with-100-million-zeros) – Grismar Nov 19 '21 at 21:26
  • But bytearray(8_000_000) requires only 1 ms, same for bytes(8_000_000). That's a good argument for accepting a size argument when creating an array.array(). Thanks everybody. – user3435121 Nov 20 '21 at 22:04
  • @pts Thanks for your intelligent solution. It's one hundred times faster (approx 2 ms). – user3435121 Nov 20 '21 at 22:09
  • @Grismar Hmm... if I'm not mistaken, *none* of the ten answers there gets it right (i.e., like pts did). The accepted and high-voted answer even apparently finds it "counterintuitive" that `array.array('L', [0] * 20000000)` takes longer than `[0] * 20000000` alone. – Kelly Bundy Nov 21 '21 at 20:03
  • @KellyBundy - I noticed that the top answer isn't the best, but from a StackOverflow point of view, it would still make sense to post a better answer there and close this question than create a new set of answers here. After all, the other question will see more traffic from people actually using the search. – Grismar Nov 22 '21 at 02:14

2 Answers


This is much faster because it doesn't create a long temporary tuple:

tb = array.array('i', (0,)) * 2000000
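A quick sanity check that the repeated one-element array has the expected size and contents:

```python
import array

# Repeating a one-element array avoids materializing a 2-million-element tuple;
# the repetition is a fast C-level memory copy.
tb = array.array('i', (0,)) * 2_000_000
assert len(tb) == 2_000_000
assert tb[0] == 0 and tb[-1] == 0
```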
pts

This does the same thing, but "should" run at least 10 times faster, by avoiding the needless expense of creating, and crawling over, a multi-million element tuple of unbounded Python ints:

>>> tb = array.array('i')
>>> tb.frombytes(b'\0' * 8_000_000)
>>> len(tb)
2000000
>>> all(i == 0 for i in tb)
True

Note: I'm assuming you're running on a platform where the array typecode 'i' denotes a 4-byte integer type (that's why I changed your 2 million to 8 million). That's very likely, but if you're not sure, slightly fancier code is needed:

>>> tb = array.array('i')
>>> tb.frombytes(b'\0' * (2_000_000 * tb.itemsize))

Of course tb.itemsize there returns 4 on my box.
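A minor variation on the same idea: `bytes(n)` already gives a zero-filled buffer, so the `b'\0' * n` multiplication can be dropped:

```python
import array

# bytes(n) allocates n zero bytes directly, so there is no temporary tuple
# and no byte-string multiplication; itemsize keeps the size portable.
tb = array.array('i')
tb.frombytes(bytes(2_000_000 * tb.itemsize))
assert len(tb) == 2_000_000
assert not any(tb)
```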

Tim Peters