0

Let's say that we have a numpy array storing large objects. My goal is to delete one of these objects from memory, but retain the initial structure of the array. The cell under which this object was stored could be filled with, for example, None.

Example of the simplified behaviour, where I replaced the large objects with characters:

arr = numpy.asarray(['a', 'b', 'c']) # arr = ['a', 'b', 'c']
delete_in_place(arr, 0)              # arr = [None, 'b', 'c']

I can't do this by calling numpy.delete(), because it just returns a new array without that element, which takes additional space in memory. It also changes the shape (by dropping the given index), which I want to avoid.

My other idea was to just set arr[0] = None and call the garbage collector, but I'm not sure what the exact behaviour of that would be.
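(A minimal sketch of the intended behaviour, assuming an object-dtype array; with dtype=object the array holds references, so overwriting a cell with None drops that reference while the shape is preserved:)

```python
import gc
import numpy as np

# Object-dtype array: each cell is a reference to a Python object.
arr = np.asarray(['a', 'b', 'c'], dtype=object)

# Overwrite the cell; the old object loses this reference.
arr[0] = None

assert arr.shape == (3,)   # structure is retained
assert arr[0] is None
assert arr[1] == 'b'

# Usually unnecessary in CPython: ref-counting frees the object as soon
# as its last reference is gone; gc.collect() only matters for cycles.
gc.collect()
```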

Do you have any ideas on how to do it?

brzepkowski
  • What's your end objective? _"premature optimization is the root of all evil"_ – tijko Oct 29 '22 at 15:32
  • What do you mean by large objects? In your example, the elements are strings, and the resulting `dtype` will be 'U1'. If you try that `None` assignment you'll get 'N' in that cell. If it is an `object` dtype array, then the `None` can replace the referenced objects. If those objects are no longer referenced, then yes, they will be garbage. – hpaulj Oct 29 '22 at 15:47
  • @tijko In general my array is a 2D one. I'm combining objects stored at different indices, and sometimes I want to store the result of such "combination" in place of one of the old objects, while the second one might be removed from the memory. I want to retain the structure of the array, so that I can properly handle the indices. – brzepkowski Oct 29 '22 at 15:48
  • @hpaulj My objects can take several GB of memory and I just replaced them with simple strings for the purpose of the example. – brzepkowski Oct 29 '22 at 15:49
  • 1
    As I pointed out, simple strings are stored in arrays differently. You don't need to give us GB objects, but you still need to capture the core of the issue in your example(s). – hpaulj Oct 29 '22 at 15:50
  • Object dtype arrays are nearly the same as lists - containing references to objects stored elsewhere in memory. They lack list methods like `append`, but provide basic array operations like `reshape` and 2d indexing. Use them with caution. – hpaulj Oct 29 '22 at 16:32

2 Answers

4

When you create a numpy array, it has a fixed size; whenever you try to delete an element, numpy allocates and returns a new array.

So the way you are trying to do it is not effective with a plain numpy array. Consider another data structure or library.
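(A quick sketch of that point: np.delete never works in place; it allocates a new, smaller array and leaves the original untouched:)

```python
import numpy as np

arr = np.asarray(['a', 'b', 'c'])
out = np.delete(arr, 0)        # copies into a freshly allocated array

assert out.shape == (2,)       # the shape shrinks
assert arr.shape == (3,)       # the original array is unchanged
assert out is not arr          # a distinct object, taking extra memory
```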

Muntasir Aonik
0

You can do this with a multi-dimensional list and not even get pandas or numpy involved. You will need the assistance of the gc module and the built-in del statement, but that's the extent of it.

For example:

import gc

with open('large-dataset.txt') as fh:
    raw_data = fh.readlines()

# Parsing/object creation, e.g. one row of large objects per line
# (obj_create and the per-line split are placeholders)
large_objs_multidim = [[obj_create(field) for field in line.split()]
                       for line in raw_data]
...
# No longer need a reference to this large object: clear the cell,
# then drop the temporary name so no reference to it remains
temp_obj = large_objs_multidim[0][0]
large_objs_multidim[0][0] = None
del temp_obj
# Python doesn't make guarantees about when collection runs; read up on
# ref-counts. In CPython, gc.collect() mainly matters for reference cycles.
gc.collect()

This gives the general idea of how to invoke the garbage collector yourself. There are some nuances to Python's reference counting and how objects live in memory. I don't know the intricacies of your project and code, but you might benefit from reading up on the `weakref` module too...
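(A minimal sketch of that weakref idea, using a hypothetical `Big` class as a stand-in for a large object; a weak reference lets you observe the object without keeping it alive:)

```python
import gc
import weakref

class Big:          # hypothetical stand-in for a multi-GB object
    pass

obj = Big()
ref = weakref.ref(obj)   # does not increase obj's reference count

assert ref() is obj      # still alive, weakref resolves to it

del obj                  # last strong reference gone; CPython frees it
gc.collect()             # collect any leftover cycles, just in case

assert ref() is None     # the weak reference is now dead
```

This is useful for verifying that the "set the cell to None" approach really released the object.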

Also, these links for further reading:

https://stackoverflow.com/a/1316793/1230086

https://stackoverflow.com/a/9908216/1230086

https://docs.python.org/3/library/gc.html

tijko