
I have a list of several hundred 10x10 arrays that I want to stack together into a single Nx10x10 array. At first I tried a simple

newarray = np.array(mylist)

But that raised "ValueError: setting an array element with a sequence."

Then I found the online documentation for dstack(), which looked perfect: "...This is a simple way to stack 2D arrays (images) into a single 3D array for processing." That is exactly what I'm trying to do. However,

newarray = np.dstack(mylist)

tells me "ValueError: array dimensions must agree except for d_0", which is odd because all my arrays are 10x10. I thought maybe the problem was that dstack() expects a tuple instead of a list, but

newarray = np.dstack(tuple(mylist))

produced the same error.

At this point I've spent about two hours searching here and elsewhere to find out what I'm doing wrong and/or how to go about this correctly. I've even tried converting my list of arrays into a list of lists of lists and back into a 3D array, but that didn't work either (I ended up with lists of lists of arrays, followed by the "setting an array element with a sequence" error again).

Any help would be appreciated.

James
  • What do you get when you do something like `[item.shape for item in mylist if item.shape != (10, 10)]`? (i.e. are you _really_ sure that all of the arrays have the same shape?) – Joe Kington Dec 03 '10 at 05:13
  • dstack, where have you been all my life... I have been using hstack and vstack with [:,:,newaxis] rubbish – wim May 29 '11 at 03:32

1 Answer

newarray = np.dstack(mylist)

should work. For example:

import numpy as np

# Here is a list of five 10x10 arrays:
x = [np.random.random((10,10)) for _ in range(5)]

y = np.dstack(x)
print(y.shape)
# (10, 10, 5)

# To get the shape to be Nx10x10, you could use rollaxis:
y = np.rollaxis(y, -1)
print(y.shape)
# (5, 10, 10)
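
If your NumPy is new enough (np.stack was added in 1.10), you can also build the Nx10x10 array in a single call, and np.moveaxis is the modern replacement for rollaxis. A quick sketch of both:

import numpy as np

x = [np.random.random((10,10)) for _ in range(5)]

# np.stack joins the arrays along a new axis (axis=0 by default):
y = np.stack(x)
print(y.shape)
# (5, 10, 10)

# Equivalently, move dstack's last axis to the front with np.moveaxis:
z = np.moveaxis(np.dstack(x), -1, 0)
print(z.shape)
# (5, 10, 10)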

np.dstack returns a new array, so using np.dstack requires as much additional memory as the input arrays combined. If you are tight on memory, an alternative that requires less memory is to allocate space for the final array first, and then pour the input arrays into it one at a time. For example, if you had 58 arrays of shape (159459, 2380), you could use

y = np.empty((159459, 2380, 58))
for i in range(58):
    # instantiate the input arrays one at a time
    x = np.random.random((159459, 2380))
    # copy x into y
    y[..., i] = x
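
If even the preallocated final array won't fit in RAM, the same loop pattern works with a file-backed np.memmap in place of np.empty. A minimal sketch, assuming float64 data; the filename 'stacked.dat' is just a placeholder:

import numpy as np

# A memmap is stored on disk, so it consumes disk space instead of RAM:
y = np.memmap('stacked.dat', dtype='float64', mode='w+',
              shape=(159459, 2380, 58))
for i in range(58):
    # stand-in for loading one input array at a time
    x = np.random.random((159459, 2380))
    y[..., i] = x
y.flush()  # push any buffered writes out to the file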
unutbu
  • *facepalm* Turns out one of my images was only 10x8, so it was purely my own inattentiveness. I hadn't quite thought about the rollaxis part, though; that helped. Thanks for the quick response! – James Dec 03 '10 at 05:39
  • `np.dstack` is a good option, but I am getting a memory error for a list of 58 2D arrays, each with shape (159459, 2380). Any tip? – seralouk Jul 31 '19 at 14:18
  • @serafeim: Buying more RAM or moving the computation to a machine with more RAM are the obvious solutions. Another possibility is to make a [NumPy memmap](https://stackoverflow.com/questions/13780907/is-it-possible-to-np-concatenate-memory-mapped-files) (file-based NumPy array) and fill it with the data from the 58 2D arrays (which can also be memmaps). File-based computation is slower than RAM-based computation, but it is a possible workaround if you can't move to a machine with more memory. – unutbu Jul 31 '19 at 15:02
  • @serafeim: Another alternative, which is preferable to memmaps when possible, is to restructure your code to process the data in chunks, thus avoiding the need to load everything into memory at once. – unutbu Jul 31 '19 at 15:05
  • I am using a Linux machine with 90 cores and 500GB memory. For some reason I sometimes get the memory error. Is converting to float32 before np.dstack a good option? – seralouk Jul 31 '19 at 15:05
  • Converting to float32 would require half as much memory as using float64, but at the cost of more floating-point arithmetic error. If the cumulative amount of arithmetic error that results from your computation is acceptably low, then this might be a viable alternative. – unutbu Jul 31 '19 at 15:14
  • 58 2D float64 arrays of shape (159459, 2380) should require around 176GB. Copying these arrays to a new array of shape (159459, 2380, 58) would require another 176GB, for a total of 352GB. Not sure why you would see a memory error here if you have 500GB free (unless the OS and other applications or data structures are consuming more than 148GB). – unutbu Jul 31 '19 at 15:26
  • @serafeim: Another alternative might be to allocate space for the final array, instantiate the input arrays one at a time, and pour the input arrays into the final array using a for-loop. This would only require 176GB plus space for one input array (about 3GB). Since you are not having a problem instantiating all 58 input arrays at once, you probably will have no problem doing the above. I've edited the post above to show what I mean. – unutbu Jul 31 '19 at 16:15
  • Nice suggestion. Thanks – seralouk Jul 31 '19 at 16:31
  • I have been looking for this post for two days! – msarafzadeh Mar 10 '21 at 11:23
  • @James if this answered your question, maybe consider accepting it and clicking the checkmark (a decade later). – eric May 07 '21 at 17:19