
Suppose I have a numpy array from which I want to remove a specific element.

import numpy as np

data = np.array([97, 32, 98, 32, 99, 32, 100, 32, 101])
# collect the indices where the element is located
indices = np.where(data == 32)
without_32 = np.delete(data, indices)
# without_32 becomes [ 97  98  99 100 101]

Now, suppose I want to restore the array (I already have the indices where the value 32 should go).

restore_data = np.insert(without_32, indices[0], 32)

But it gives IndexError: index 10 is out of bounds for axis 0 with size 9. Is there another way to implement this?

Update

It seems that after deleting the elements I need to adjust the indices, like this:

restore_data = np.insert(without_32, indices[0]-np.arange(len(indices[0])), 32)

But can I generalize this? That is, track not only 32 but also 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47. I want to track every value in 32-47 the same way, efficiently.
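One way to read the index adjustment above, sketched for the whole 32-47 range: `np.insert` interprets its positions relative to the array it is given (the shortened one), and the i-th removed element had i removed elements in front of it, so it belongs before position `idx[i] - i`. The names `idx`, `removed`, `clean` and `insert_at` are illustrative, and the sketch assumes the removed values are kept alongside their sorted indices.

```python
import numpy as np

data = np.array([97, 32, 98, 32, 99, 32, 100, 32, 101])

# Sorted positions of every value in the closed range 32..47.
idx = np.flatnonzero((data >= 32) & (data <= 47))
removed = data[idx]            # the values taken out, in original order
clean = np.delete(data, idx)   # what is left after removal

# The i-th removed element had i removed elements before it, so in `clean`
# it must be inserted in front of position idx[i] - i.
insert_at = idx - np.arange(idx.size)
restored = np.insert(clean, insert_at, removed)

print(insert_at)                        # [1 2 3 4]
print(np.array_equal(restored, data))   # True
```

Because `removed` is passed as the values, the same call works when several different numbers were taken out, including runs of adjacent removed elements.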

falamiw
  • `np.insert(without_32, indices[0]-np.arange(len(indices[0])), 32)`, IIUC – Michael Szczesny May 06 '22 at 13:47
  • It works, @MichaelSzczesny. Could you explain what `indices[0]-np.arange(len(indices[0]))` does here? Thanks – falamiw May 06 '22 at 13:54
  • This adjusts the indices to the correct insertion points for [`np.insert`](https://numpy.org/doc/stable/reference/generated/numpy.insert.html). I'm hesitant to answer this question as it is redundant since you must already have the original array. This looks like a [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Michael Szczesny May 06 '22 at 14:32
  • I suspect there are better solutions for the actual use case. As a general rule, try to avoid `np.delete` and `np.insert`. – Michael Szczesny May 06 '22 at 14:41
  • I see. Can I generalize this? That is, track not only `32` but also `33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47`. I want to track every value in `32-47` the same way. Is there a better way to do this? And thanks for your response. @MichaelSzczesny – falamiw May 09 '22 at 14:48

1 Answer

My alternative:

#define a mask 
mask = data==32
mask #array([False,  True, False,  True, False,  True, False,  True, False])

#filter
without_32 = data[~mask]
without_32 #array([ 97,  98,  99, 100, 101])

Then if you want to have the original data:

restore_data = np.full(mask.shape, 32, dtype=without_32.dtype)
restore_data[~mask] = without_32
restore_data

output:

array([ 97,  32,  98,  32,  99,  32, 100,  32, 101])

In practice, you generate a constant array of 32s with the same length as mask (and therefore as data), and then fill the positions where mask is False (the positions where data != 32) with the values of the without_32 array.

UPDATE

To address your update:

data = np.random.randint(20, 60, size=20)
#array([47, 39, 29, 45, 21, 44, 48, 27, 21, 25, 47, 59, 58, 53, 46, 36, 34, 57, 36, 54])

mask = (data>=32)&(data<=47) #the values you want to remove

clean_array = data[~mask] #data you want to retain
removed_data = data[mask] #data you want to remove

Now you can del data and do whatever you want with clean_array. When you need to reconstruct the original array, you just do:

restore_data = np.zeros_like(mask, dtype=int)
restore_data[~mask] = clean_array
restore_data[mask] = removed_data
#array([47, 39, 29, 45, 21, 44, 48, 27, 21, 25, 47, 59, 58, 53, 46, 36, 34, 57, 36, 54])
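The split and restore steps above could be wrapped in two small helpers. This is only a sketch: the function names `split_by_range` and `restore` are hypothetical, and it assumes the kept and removed values share one dtype, which holds when both come from the same array.

```python
import numpy as np

def split_by_range(data, low, high):
    """Return (mask, kept, removed) for values in the closed range [low, high]."""
    mask = (data >= low) & (data <= high)
    return mask, data[~mask], data[mask]

def restore(mask, kept, removed):
    """Rebuild the original array from the mask and the two pieces."""
    out = np.empty(mask.shape, dtype=kept.dtype)
    out[~mask] = kept      # positions that were never removed
    out[mask] = removed    # positions that held the removed values
    return out

rng = np.random.default_rng(0)
data = rng.integers(20, 60, size=20)

mask, kept, removed = split_by_range(data, 32, 47)
restored = restore(mask, kept, removed)
print(np.array_equal(restored, data))  # True
```

Using `np.empty` instead of `np.zeros_like` skips the initial fill, which is safe here because every position is written by one of the two assignments.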
  • Which one is better: storing the mask, or storing the indices as MichaelSzczesny suggested in the comment? I will run this code on a huge array (millions of elements). @SalvatoreDanieleBianco – falamiw May 06 '22 at 14:24
  • It depends. The `bool` type is more "convenient" than the `int` type, but the array of indices is shorter than the mask. If you expect to find very few 32s, use the indices; if you expect to find a lot of 32s, use the mask. You can check the memory usage of an array this way: https://stackoverflow.com/questions/11784329/python-memory-usage-of-numpy-arrays . In your example the indices array is more convenient than the mask. – Salvatore Daniele Bianco May 06 '22 at 14:35
  • I see. Can I generalize this? That is, track not only `32` but also `33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47`. I want to track every value in `32-47` the same way. Is there a better way to do this? And thanks for your response. I will accept the answer. @SalvatoreDanieleBianco – falamiw May 09 '22 at 14:47
  • In this case the compression will be useless, because you have to keep the information about the removed numbers. If you want, I can write efficient code to handle your task, but the memory usage will be greater than that of the original `data`, because you have to keep three arrays: the clean array, the indices array (or mask), and the array containing the removed numbers. – Salvatore Daniele Bianco May 09 '22 at 15:33
  • Compression won't be an issue here. What matters is removing those values for some intermediate processing. It would be a great help if you could share efficient code. Thanks again @SalvatoreDanieleBianco – falamiw May 09 '22 at 15:39
  • @falamiw ok. I'll update my answer soon. – Salvatore Daniele Bianco May 09 '22 at 16:49
  • @falamiw just done. Let me know if it is all clear. – Salvatore Daniele Bianco May 09 '22 at 16:59
  • Thanks @SalvatoreDanieleBianco. Your updated solution will get my job done. – falamiw May 09 '22 at 17:04
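The mask-versus-indices memory question raised in the comments can be checked directly with the `nbytes` attribute. The sketch below assumes a random test array of one million values; the sizes it prints depend on how many elements fall in the removed range and on the platform's index dtype, so treat it as a way to measure rather than a fixed answer.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(20, 60, size=1_000_000)

mask = (data >= 32) & (data <= 47)   # one bool (1 byte) per element of data
idx = np.flatnonzero(mask)           # one integer index per removed element

print(mask.nbytes)                   # equals data.size
print(idx.nbytes)                    # equals idx.size * idx.itemsize
print(idx.size / data.size)          # fraction of elements removed
```

As a rough rule under these assumptions, the index array is the smaller of the two only when the removed fraction is below `1 / idx.itemsize`.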