
I have a 3000x6000 2D grid (from a TIFF image). I want to regrid it onto a lower-resolution grid using the griddata method from the scipy.interpolate library. First, I need to form an 18000000x2 NumPy array as the input for griddata, based on what I read here. Here's what I do:

import numpy as np
from scipy.interpolate import griddata

x_length = 6000
y_length = 3000

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

# Target grid: y_length x x_length points on the unit square
grid_x, grid_y = np.meshgrid(np.linspace(0, 1, x_length), np.linspace(0, 1, y_length))

# 18,000,000 scattered sample points and their values
points = np.random.rand(x_length*y_length, 2)
values = func(points[:, 0], points[:, 1])

grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')

I get a MemoryError when calling griddata. I have 8 GB of RAM, and based on the first answer to this question I shouldn't be getting this error.

Overall, regridding a 3000x6000 grid onto a lower-resolution grid shouldn't be that hard, so I suspect I am doing something odd here. Should I get a MemoryError running these lines of code with 8 GB of RAM?

P.S.: Although I have a 64-bit operating system (Windows 7), I use the following Python version:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
ahoosh
  • It works for me. I have 4 GB of RAM and more than 3 are already occupied. In fact, in htop I only see around 200 MB, and each one of your arrays is exactly 288 MB. – Davidmh Aug 06 '14 at 23:13
  • Also, use `x = np.round(x, 5)` to round to the first 5 decimals. – Davidmh Aug 06 '14 at 23:15
  • @Davidmh I edited the question a fair amount. I am still getting a `MemoryError` running the code. Do you get an error? – ahoosh Aug 07 '14 at 00:31
  • No, still don't get it. The usage is about 2 GB. – Davidmh Aug 07 '14 at 00:39
  • Note that a 32-bit process can only access 4 GB. – Davidmh Aug 07 '14 at 00:40
  • `print (points.nbytes + values.nbytes + grid_x.nbytes + grid_y.nbytes) * 1e-6` reports 720 MB, and the final output is 144 MB more. Do you have enough free RAM? – Davidmh Aug 07 '14 at 00:44
  • This should give you the size of all the arrays accessible by Python: `1e-6 * sum(x.nbytes for x in globals().values() if isinstance(x, np.ndarray))`. As expected, I get the same result as before, 720 MB. – Davidmh Aug 07 '14 at 00:50
  • @Davidmh I get `720` MB for both of the commands that you recommended. As it turns out, reading the answer, I am not doing a very efficient thing here in resampling the image, and at the same time, I can technically use just 2 GB of RAM since I have win32 Python on 64-bit Windows 7. – ahoosh Aug 07 '14 at 14:45

1 Answer


As the comments point out, you are running out of memory. A 32-bit Python running on 64-bit Windows 7 is limited to 2 GB of memory, and you have simply hit that limit.
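A quick way to confirm which interpreter you are actually running, using only the standard library (`sys` and `platform`):

import sys
import platform

# sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build
print("64-bit Python" if sys.maxsize > 2**32 else "32-bit Python")
print(platform.architecture())  # e.g. ('32bit', 'WindowsPE') on the setup above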

There are three solutions:

  1. Get a 64-bit Python. (suggested)
  2. Interpolate in several chunks (split the image into some suitable overlapping parts) (laborious)
  3. Rethink your interpolation method (recommended)

If you have a regular grid (as in the case of an image), using griddata to regrid it into another regular grid is quite wasteful in terms of memory and time.

There are several ways to downsample an image. At least the PIL and cv2 modules offer downsampling functions. If you want to stay within SciPy, have a look at scipy.ndimage.zoom, which resamples the image from one regular grid to another.
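For example, a minimal sketch of that approach (the array is a random stand-in for the data read from the TIFF file, and the zoom factor is just illustrative):

import numpy as np
from scipy import ndimage

# Stand-in for the 3000x6000 array read from the TIFF file
img = np.random.rand(3000, 6000)

# Shrink both axes by a factor of 3; order=1 means bilinear interpolation
small = ndimage.zoom(img, 1.0/3.0, order=1)
print(small.shape)  # roughly (1000, 2000)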

DrV
  • I will definitely switch my Python to 64-bit. It was useful to know I can only use 2 GB of my RAM! If I already have a 2D numpy array (2D grid) and don't have access to the tiff file anymore, what is the most efficient way to do it? Do you think using `RegularGridInterpolator` from `scipy.interpolate` would be a good option? – ahoosh Aug 07 '14 at 15:02
  • If you have the image data in a 2D array, then the easiest way is to use something like `scipy.ndimage.zoom(my_img_array, 0.43)` to scale both directions by 0.43 (or whatever you want). – DrV Aug 07 '14 at 19:45
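Picking up on the `RegularGridInterpolator` idea from the comment above: a minimal sketch, assuming the data sit on an evenly spaced grid (array sizes and coordinate ranges here are illustrative, not from the question). For a plain downsample, `scipy.ndimage.zoom` as shown above remains the simpler option.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Stand-in for the existing 3000x6000 array (no TIFF file needed)
data = np.random.rand(3000, 6000)
y = np.linspace(0, 1, 3000)
x = np.linspace(0, 1, 6000)

interp = RegularGridInterpolator((y, x), data)  # linear interpolation by default

# Coarser 1000x2000 target grid covering the same domain
new_y, new_x = np.meshgrid(np.linspace(0, 1, 1000),
                           np.linspace(0, 1, 2000), indexing='ij')
pts = np.column_stack([new_y.ravel(), new_x.ravel()])
small = interp(pts).reshape(1000, 2000)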