
I have a list of "TIFFFiles", where each "TIFFFile" contains a "TIFFArray" with 60 TIFF images, each with a size of 2776x2080 pixels. The images are read as numpy.memmap objects. I want to access all intensities of the images (shape of imgs: (60, 2776, 2080)). I use the following code:

for i in xrange(18):

    # get an instance of type TIFFArray from tiff_list
    tiffs = get_tiff_arrays(smp_ppx, type_subfile, tiff_list[i])

    # access all intensities from tiffs
    imgs = tiffs[:,:,:]

Even though I overwrite "tiffs" and "imgs" in each iteration, the memory usage grows by 2.6 GB per iteration. How can I avoid the data being copied in each iteration? Is there any way that the 2.6 GB of memory can be reused?

  • `imgs = tiffs[:,:,:]` is not valid Python, is it? – Tim Pietzcker Feb 05 '13 at 08:20
  • @TimPietzcker if it's a 3D array, it's valid slicing. That copies the content of `tiffs` into `imgs`, instead of just the reference as would be with `imgs = tiffs` – Francesco Montesano Feb 05 '13 at 08:43
  • If you do `del tiffs` and `del imgs` at the end of each iteration, does the memory usage improve? – Francesco Montesano Feb 05 '13 at 08:45
  • @FrancescoMontesano No, I tried several things of this kind, e.g. del statements, but the memory usage doesn't decrease at all. – user2042189 Feb 05 '13 at 09:06
  • @FrancescoMontesano: In standard Python, or with an extension like NumPy? I've never seen this notation before. – Tim Pietzcker Feb 05 '13 at 09:16
  • @TimPietzcker I guess anything that supports multidimensional lists/arrays has this syntax, so it works in NumPy. Standard Python has lists and tuples that are 1D containers of other objects, so you can only slice them as `[:]` – Francesco Montesano Feb 05 '13 at 09:43
  • @FrancescoMontesano: I've tagged the question accordingly. Otherwise, the numpy experts wouldn't know they are needed here :) – Tim Pietzcker Feb 05 '13 at 10:23
  • @TimPietzcker: thanks. I didn't think about doing it and besides I don't have the right to do it – Francesco Montesano Feb 05 '13 at 10:26
  • @user2042189 you'd better substitute `imgs = tiffs[:,:,:]` with `imgs = np.copy(tiffs)` if you want to copy the content and not the reference of `tiffs` to `imgs`. See [this](http://stackoverflow.com/questions/4555431/bug-or-feature-cloning-a-numpy-array-w-slicing) – Francesco Montesano Feb 05 '13 at 12:57
  • @FrancescoMontesano Thank you, I tried this out and my memory usage decreased by 600 MB for every loop. However, the memory still grows by 2 GB in every loop... – user2042189 Feb 06 '13 at 09:13
  • @user2042189: I have the feeling that numpy has some big problems in handling and cleaning memory when reading files. I don't know if it's related to the underlying implementation or some misunderstanding with the garbage collector – Francesco Montesano Feb 06 '13 at 17:02
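
As a minimal sketch of the buffer-reuse idea discussed in the comments above, one could allocate the destination array once and fill it in place on every iteration (names and shapes are taken from the question, the dtype is an assumption; whether this actually keeps memory flat depends on how TIFFArray serves its data):

import numpy as np

# Allocate the destination buffer once; adjust dtype to the actual TIFF data.
imgs = np.empty((60, 2776, 2080), dtype=np.uint16)

for i in xrange(18):
    tiffs = get_tiff_arrays(smp_ppx, type_subfile, tiff_list[i])
    # Copy image by image into the existing buffer instead of binding
    # a new (60, 2776, 2080) array in every iteration.
    for k in xrange(60):
        imgs[k] = tiffs[k]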

1 Answer


I know this is probably not an answer, but it might help anyway, and it was too long for a comment.

Some time ago I had a memory problem while reading large (>1 GB) ASCII files with numpy: to read the file with numpy.loadtxt, the code was using all the memory (8 GB) plus some swap.

From what I've understood, if you know in advance the size of the array to fill, you can allocate it and pass it to, e.g., loadtxt. This should prevent numpy from allocating temporary objects, and it might be better memory-wise.
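
As a sketch of that preallocation idea (read_into is a hypothetical helper written for illustration, not a numpy function; I don't know whether loadtxt itself can fill an existing array):

import numpy as np

def read_into(fname, out):
    # Fill the pre-allocated 2-D array `out` row by row from a
    # whitespace-delimited text file, so no full-size temporary is built.
    with open(fname) as fh:
        for i, line in enumerate(fh):
            out[i] = [float(x) for x in line.split()]
    return out

data = np.empty((nrows, ncols))  # nrows/ncols assumed to be known in advance
read_into("verylargefile", data)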

mmap, or similar approaches, can help improve memory usage, but I've never used them.
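
For completeness, a minimal numpy.memmap sketch for a raw binary file (the file name, dtype and shape are assumptions):

import numpy as np

# The file is mapped, not loaded: pages are read from disk only when accessed.
mm = np.memmap("data.bin", dtype=np.float64, mode="r", shape=(nrows, ncols))
first_row_sum = mm[0].sum()  # touches only the pages holding the first row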

edit

The problem of memory usage and release puzzled me while I was trying to solve my large-file problem. Basically I had:

import numpy as np

def read_f(fname):
    arr = np.loadtxt(fname)  # this uses a lot of memory
    # do operations
    return something

for f in ["verylargefile", "smallerfile", "evensmallerfile"]:
    result = read_f(f)

From the memory profiling I did, the memory was not released when loadtxt returned, nor when read_f returned and was called again with a smaller file.
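
For reference, a sketch of how such a run could be sampled with the memory_profiler package (assuming it is installed; memory_usage reports resident memory in MiB):

from memory_profiler import memory_usage

def run():
    for f in ["verylargefile", "smallerfile", "evensmallerfile"]:
        result = read_f(f)

# Sample memory every 0.5 s while run() executes.
print(memory_usage((run, (), {}), interval=0.5))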

Francesco Montesano