
Is it possible to trim zero 'records' from a structured numpy array without copying it, i.e. to free the memory allocated for the 'unused' zero entries at the beginning or the end? Actually, I am only interested in trimming zeros at the end.

There is a built-in function numpy.trim_zeros() for 1-D arrays. Its docstring describes the return value as:

Returns
-------
trimmed : 1-D array or sequence
    The result of trimming the input. The input data type is preserved.

However, I can't tell from this whether it creates a copy or simply frees memory, and I am not proficient enough to work out its behaviour from the source code.

More specifically, I have the following code:

import numpy
edges = numpy.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
# fill the first two records with sensible data:
edges[0]['i'] = 0
edges[0]['j'] = 1
edges[0]['length'] = 2.0
edges[1]['i'] = 1
edges[1]['j'] = 2
edges[1]['length'] = 2.0
# show the data address and shape
edges.__array_interface__
edges = numpy.trim_zeros(edges)  # does not work for structured array
edges.__array_interface__

UPDATE

My question is twofold:

1) Does the built-in function simply free memory, or does it copy the array?

Answer: neither; it creates a slice (= view) of the input, so nothing is copied and no memory is freed. The source can be inspected in an IPython console with import numpy; numpy.trim_zeros?? (see also Resize NumPy array to smaller size without copy and View onto a numpy array?)
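A quick way to check the view behaviour without comparing raw pointers is numpy.shares_memory. This is a minimal sketch on a plain 1-D integer array (the array contents are made up for illustration); it relies only on trim_zeros returning a slice of an ndarray input, as it does in the NumPy versions I know of.

```python
import numpy as np

arr = np.zeros(10, dtype=np.int32)
arr[3:8] = 1

trimmed = np.trim_zeros(arr)

# the result is a view into arr's buffer, not a copy
print(np.shares_memory(arr, trimmed))
# the view's shape shrinks, but arr's full buffer is still allocated
print(trimmed.shape, arr.nbytes)
```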

2) What would be a solution that gives similar functionality for structured arrays?

Answer:

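# True for every record that differs from an all-zero record
nonzero = edges != numpy.zeros(1, edges.dtype)
begin = nonzero.argmax()                   # index of the first non-zero record
end = len(edges) - nonzero[::-1].argmax()  # one past the last non-zero record
# caveat: if every record is zero, argmax returns 0, so begin=0 and end=len(edges)
# 1) create a slice (view) without copying; no memory is freed
goodedges = edges[begin:end]
# 2) or copy, then free the original (both arrays exist temporarily)
goodedges = edges[begin:end].copy()
del edges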
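The same idea can be wrapped in a small helper. This is a sketch, not a NumPy API: trim_zero_records is a hypothetical name, it builds the non-zero mask field by field (assuming every field is a plain scalar, with no nested or subarray fields) instead of comparing whole records, and it returns an empty view when every record is zero, a case where the argmax lines above would leave the array untrimmed.

```python
import numpy as np

def trim_zero_records(a, trim="fb"):
    """Hypothetical helper: return a view of the 1-D structured array `a`
    without leading ('f') and/or trailing ('b') all-zero records."""
    # mark records in which at least one field is non-zero
    nonzero = np.zeros(len(a), dtype=bool)
    for name in a.dtype.names:
        nonzero |= a[name] != 0
    idx = np.flatnonzero(nonzero)
    if idx.size == 0:
        return a[:0]  # every record is zero: empty view
    trim = trim.lower()
    first = idx[0] if "f" in trim else 0
    last = idx[-1] + 1 if "b" in trim else len(a)
    return a[first:last]  # basic slicing: a view, nothing is copied

edges = np.zeros(5, dtype=[("i", "i4"), ("j", "i4"), ("length", "f4")])
edges[:2] = [(0, 1, 2.0), (1, 2, 2.0)]

good = trim_zero_records(edges, trim="b")  # view; edges' buffer is not freed
good_copy = good.copy()                    # compact copy; edges can then be deleted
```

Because the helper only slices, it has the same memory behaviour as trim_zeros: the trailing zeros stay allocated until a copy is made and the original is released.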
Hotschke
  • What does this function do with a regular 1d array? Does it copy? Return a view? Try both `fb`. I can't imagine it actually changing the buffer size. Can you read its code? – hpaulj Jan 27 '16 at 12:37

2 Answers


IMHO, there are two problems.

  • First, the trim_zeros function doesn't recognize zero records of a composite (structured) dtype.

You can locate them with begin=(edges!=zeros(1,edges.dtype)).argmax() and end=len(edges)-(edges!=zeros(1,edges.dtype))[::-1].argmax(). Then goodedges=edges[begin:end] is the interesting data.

  • Second, the trim_zeros function doesn't free memory:

Returns
-------
trimmed : 1-D array or sequence
    The result of trimming the input. The input data type is preserved.

So I think you must do it manually: goodedges=edges[begin:end].copy(); del edges.
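One way to confirm that the manual copy really detaches from the original buffer is to look at base and flags.owndata. A minimal sketch with illustrative data; begin and end are hard-coded to 0 and 2 here instead of being computed.

```python
import numpy as np

edges = np.zeros(1000, dtype=[("i", "i4"), ("j", "i4"), ("length", "f4")])
edges[:2] = [(0, 1, 2.0), (1, 2, 2.0)]

view = edges[0:2]             # slice: refers to edges' 12000-byte buffer
compact = edges[0:2].copy()   # new 24-byte buffer owned by `compact`

print(view.base is edges, view.flags.owndata)
print(compact.base is None, compact.flags.owndata)

# once no name refers to the big array (directly or via a view's .base),
# CPython's reference counting can release its buffer
del edges, view
```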

B. M.
  • Thanks for your answer. Your first bullet resolves one part of my problem. However, the second bullet is exactly what I wanted to avoid: copying and freeing. Do you think `numpy.resize()` could be of any help? – Hotschke Jan 27 '16 at 14:12
  • No, `resize` makes copies. IMHO, your only chance is to collect the data in chunks and store only the valid records. – B. M. Jan 27 '16 at 14:41
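Collecting the data in chunks could look roughly like the sketch below. collect_in_chunks, EDGE_DTYPE and the chunk size are hypothetical choices for illustration: records go into fixed-size zeroed buffers, only the filled part of the last buffer is kept, and one final concatenation produces an exactly-sized array.

```python
import numpy as np

# hypothetical dtype matching the question's edges array
EDGE_DTYPE = np.dtype([("i", "i4"), ("j", "i4"), ("length", "f4")])

def collect_in_chunks(records, chunk_size=1024):
    """Hypothetical helper: fill fixed-size chunks, keep only the filled
    part of the last one, and concatenate into one exactly-sized array."""
    chunks = []
    buf = np.zeros(chunk_size, dtype=EDGE_DTYPE)
    n = 0
    for rec in records:
        buf[n] = rec
        n += 1
        if n == chunk_size:  # chunk is full: keep it and start a new one
            chunks.append(buf)
            buf = np.zeros(chunk_size, dtype=EDGE_DTYPE)
            n = 0
    chunks.append(buf[:n])  # partially filled last chunk (a view)
    # concatenate copies everything into one new array, so the chunk
    # buffers become unreferenced once this function returns
    return np.concatenate(chunks)

edges = collect_in_chunks([(0, 1, 2.0), (1, 2, 2.0), (2, 3, 1.5)], chunk_size=2)
```

The trade-off is a single copy at the end, but at most one chunk's worth of unused zero records is ever allocated.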

To expand on my comment, let's try trim_zeros on a simple integer array:

In [252]: arr = np.zeros(10,int)
In [253]: arr[3:8]=np.ones(5)
In [254]: arr
Out[254]: array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
In [255]: arr1=np.trim_zeros(arr)
In [256]: arr1
Out[256]: array([1, 1, 1, 1, 1])

Now compare the __array_interface__ dictionaries:

In [257]: arr.__array_interface__
Out[257]: 
{'descr': [('', '<i4')],
 'shape': (10,),
 'version': 3,
 'strides': None,
 'data': (150760432, False),
 'typestr': '<i4'}

In [258]: arr1.__array_interface__
Out[258]: 
{'descr': [('', '<i4')],
 'shape': (5,),
 'version': 3,
 'strides': None,
 'data': (150760444, False),
 'typestr': '<i4'}

shape reflects the change we want. But look at the data pointer, ...432, and ...444. arr1 just points to 12 bytes (3 ints) further along the same buffer.
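The same pointer arithmetic can be done in code. This sketch uses np.int32 explicitly, because the session above shows '<i4', while the default integer is 8 bytes on many 64-bit platforms, which would make the offset 24 instead of 12.

```python
import numpy as np

arr = np.zeros(10, dtype=np.int32)  # int32, so one element is 4 bytes
arr[3:8] = 1
arr1 = np.trim_zeros(arr)

start = arr.__array_interface__["data"][0]
start1 = arr1.__array_interface__["data"][0]
offset = start1 - start

# arr1 starts 12 bytes, i.e. 3 elements, into arr's buffer
print(offset, offset // arr.itemsize)
```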

If I delete arr or reassign it (even arr=arr1), arr1 continues to point to this data buffer. The view keeps a reference to the original array in its base attribute, and through Python's reference counting the data buffer is recycled only when all references to it are gone.
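A minimal sketch of that behaviour; plain slicing stands in for trim_zeros here, since it produces the same kind of view.

```python
import numpy as np

arr = np.zeros(10, dtype=np.int32)
arr[3:8] = 1
arr1 = arr[3:8]  # the same kind of slice that trim_zeros returns

del arr  # removes the name only; arr1.base still refers to the array
print(arr1.base.shape, arr1.base.nbytes)  # the full 10-element buffer

arr1 = arr1.copy()  # the old view and its 10-element base are now unreferenced
print(arr1.base)    # the copy owns its own 5-element buffer
```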

The code for trim_zeros (fetched in IPython with `??`) is:

File:        /usr/lib/python3/dist-packages/numpy/lib/function_base.py
def trim_zeros(filt, trim='fb'):
    first = 0
    trim = trim.upper()
    if 'F' in trim:
        for i in filt:
            if i != 0.: break
            else: first = first + 1
    last = len(filt)
    if 'B' in trim:
        for i in filt[::-1]:
            if i != 0.: break
            else: last = last - 1
    return filt[first:last]

The work is in the last line, and clearly returns a slice, a view. Most of the code handles the 2 trim options (F and B). Notice that it uses iteration to find the first and last non-zeros. That should be fine for arrays with just a few extra 0s at beginning or end. But it isn't the 'vectorized' kind of operation that SO questions often seek.
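For comparison, a vectorized variant could locate the non-zeros with np.flatnonzero instead of a Python loop. This is a sketch with a hypothetical name, for 1-D numeric arrays only; like the original it returns a slice (a view) and an empty result for an all-zero input.

```python
import numpy as np

def trim_zeros_vectorized(filt, trim="fb"):
    """Hypothetical vectorized variant of trim_zeros for 1-D numeric arrays."""
    idx = np.flatnonzero(filt)  # indices of all non-zero elements
    if idx.size == 0:
        return filt[:0]         # all zeros: empty result
    trim = trim.upper()
    first = idx[0] if "F" in trim else 0
    last = idx[-1] + 1 if "B" in trim else len(filt)
    return filt[first:last]     # a slice, i.e. a view

arr = np.array([0, 0, 0, 1, 1, 0, 1, 1, 0, 0])
```

Note that np.flatnonzero always scans the whole array, whereas the loop stops at the first non-zero from each end, so which one is faster depends on how many zeros there are.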

Before this question I didn't even know that trim_zeros existed, but I'm not at all surprised by its code and action.

On a side issue, here's a more compact way of creating your edges array.

In [259]: edges =np.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
In [260]: edges[:2]=[(0,1,2.0),(1,2,2.0)]

To remove all the zero elements you could just use:

edges[edges!=numpy.zeros(1,edges.dtype)]

This is a copy. It does remove 'embedded' zeros as well, but that might not be an issue if the only zeros are those left at the end after filling in the earlier slots.
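A small sketch of that difference, with made-up data in which one zero record sits between two real ones: the boolean mask drops it, and the result is a new array rather than a view.

```python
import numpy as np

edges = np.zeros(4, dtype=[("i", "i4"), ("j", "i4"), ("length", "f4")])
edges[0] = (0, 1, 2.0)
edges[2] = (1, 2, 2.0)  # edges[1] is an 'embedded' zero record

mask = edges != np.zeros(1, edges.dtype)  # True for non-zero records
kept = edges[mask]                         # boolean indexing always copies

print(mask.tolist())
print(len(kept), np.shares_memory(edges, kept))
```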

You may not need this trimming at all if you collect the edges data in a list, and build the array at the end:

edges1 = np.array([(0,1,2.0),(1,2,2.0)], dtype=edges.dtype)
hpaulj
  • @hpaulj: thanks for your answer. I didn't know about `??`. Very helpful. And in this case the built-in function is actually not that hard to grasp, so in the future I will read the source code before posting on Stack Overflow. I am currently not short on memory, but one thing I don't much like about Python is that I often don't know what is going on behind the curtain. For scientific computing, being aware of memory usage is quite important, and AFAIK copying data unintentionally can easily happen in Python. – Hotschke Jan 29 '16 at 08:39