
I am using NumPy 1.11.1 and have to deal with a two-dimensional array with

my_arr.shape = (25000, 25000)

All values are integers, and I need a list of the array's unique values. When using `lst = np.unique(my_arr)`, I get:

Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    palette = np.unique(arr)
  File "c:\Python27\lib\site-packages\numpy\lib\arraysetops.py", line 176, in unique
    ar = np.asanyarray(ar).flatten()
MemoryError

My machine has only 8 GB of RAM, but I tried it on another machine with 16 GB of RAM and got the same result. Monitoring memory and CPU usage doesn't suggest that the problem is related to RAM or CPU.
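The traceback shows that `np.unique` begins with `flatten()`, which always returns a copy, so the whole array has to exist twice in memory before sorting even starts. The sketch below estimates that footprint for the question's shape. The question only says the values are integers, so the dtypes tried here are assumptions, and `footprint_gib` is a hypothetical helper.

```python
import numpy as np


def footprint_gib(shape, dtype):
    """Return (array size, array + flattened copy) in GiB.

    The second number is only a lower bound for np.unique: sorting and
    the duplicate-detection mask need further memory on top of it.
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    array_bytes = n_elements * np.dtype(dtype).itemsize
    # flatten() returns a copy, so the original and the copy coexist.
    return array_bytes / 2.0**30, 2 * array_bytes / 2.0**30


shape = (25000, 25000)  # 625,000,000 elements, as in the question
# The actual dtype of my_arr is unknown; these are assumed candidates.
for dtype in (np.uint8, np.int32, np.int64):
    arr_gib, peak_gib = footprint_gib(shape, dtype)
    print("{:>6}: array {:.2f} GiB, array + flattened copy {:.2f} GiB".format(
        np.dtype(dtype).name, arr_gib, peak_gib))
```

For `int32`, for example, the array alone is about 2.33 GiB and the flattened copy brings the total to about 4.66 GiB, which a 32-bit interpreter cannot address no matter how much RAM is installed.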

In principle, I know which values the array contains, but what if the input changes? Also, if I want to replace values in the array with another value (say, all 2s with 0), will that need a lot of RAM as well?
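On the replacement question: assigning through a boolean mask, as in `my_arr[my_arr == 2] = 0`, modifies the array in place, but the mask `my_arr == 2` is itself a temporary array with one byte per element (about 625 MB for a `(25000, 25000)` array). A minimal sketch that works on blocks of rows keeps that temporary small; `replace_in_place` and `rows_per_chunk` are hypothetical names, and the default chunk size is an arbitrary choice.

```python
import numpy as np


def replace_in_place(arr, old, new, rows_per_chunk=1000):
    """Replace every `old` with `new` in `arr`, modifying it in place.

    Each temporary boolean mask covers only rows_per_chunk rows, so it
    needs rows_per_chunk * arr.shape[1] bytes instead of one byte per
    element of the whole array.
    """
    for start in range(0, arr.shape[0], rows_per_chunk):
        block = arr[start:start + rows_per_chunk]  # basic slice: a view, not a copy
        block[block == old] = new  # writes through the view into arr


# Small demonstration array; in the question this would be my_arr.
demo = np.array([[1, 2, 3], [2, 2, 0], [3, 1, 2]])
replace_in_place(demo, 2, 0, rows_per_chunk=2)
print(demo)
```

Because slicing rows returns a view, the same function also works on a `numpy.memmap`.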

fzzle
TomGeo

1 Answer


A 32-bit Python process can't address more than 4 GiB of RAM (in practice often only about 2.5 GiB), no matter how much memory the machine has. The obvious fix is to use a 64-bit build of Python. If that doesn't work, another option is `numpy.memmap`, which memory-maps the array to a file stored on disk.
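A memory-mapped array avoids holding the data in RAM, but calling `np.unique` on the whole thing would still flatten it into a full in-memory copy. A minimal sketch, assuming the unique values are computed one block of rows at a time and merged with `np.union1d`; `chunked_unique`, the temporary file path, and the demo shape and dtype are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np


def chunked_unique(arr, rows_per_chunk=1000):
    """Sorted unique values of a 2-D array, computed block by block.

    Only one block of rows is flattened and sorted at a time, so the
    temporary memory is bounded by the block size, not the full array.
    """
    result = np.array([], dtype=arr.dtype)
    for start in range(0, arr.shape[0], rows_per_chunk):
        block = np.asarray(arr[start:start + rows_per_chunk])
        # union1d returns the sorted unique values found in either input.
        result = np.union1d(result, np.unique(block))
    return result


# Hypothetical backing file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_arr.dat")
mm = np.memmap(path, dtype=np.int32, mode="w+", shape=(200, 300))
mm[:] = 1
mm[150:, :10] = 7
mm.flush()
print(chunked_unique(mm, rows_per_chunk=50))
```

The chunk size trades memory for the number of passes; the merged result stays small as long as the array holds few distinct values, as a palette-like raster would.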

smerlin
fzzle
I still got a `MemoryError` with 64-bit Python and `numpy.memmap`. The array holds `float`s with shape `(8465, 103114)`, only slightly bigger than the OP's array. (It comes from a geographic raster.) – jpmc26 May 12 '18 at 06:26