
I am trying to speed up the loading and processing of large binary files using code I have written. The code works, but it is slow. I want to improve it, but whenever I load the files a second time I don't see the true loading speed, because the data is already cached in memory and the load is much faster. How can I disable this cache, or alternatively, how can I measure the true performance of the code so that I can iterate on it?

Here's what I am doing:

  1. Load a large binary file by first calling np.memmap on it
  2. Index into the memmapped array to retrieve the values I want (this is the slow part, afaik)

I have tried manually deleting the variable storing the data with del and then calling the garbage collector with gc.collect(), but this doesn't help.

How should I deallocate this data from memory so that I get the true loading times (or how do I get around this problem to measure the true performance)? Do I need Cython to handle this properly? I was hoping not to go there just yet.

Edit, bits of code:

Loading the memmap:

import numpy as np

traces = np.memmap(datapath, dtype='int16', mode='r',
                   offset=offset, shape=(number_samples, number_channels))

# index into this, where waveform_ids are the indices I need
waveforms = traces[waveform_ids, :].astype(np.float32).T

Both of these lines are called from within a function in my code. After these I tried running the garbage collector:

import gc
del waveforms
gc.collect()

However, this did not restore the original slow loading time, so I reckon it did not clear the memory cache.
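
To make concrete what I mean by a "true" (cold) measurement: as far as I understand, del and gc.collect() only release the Python objects, while the file's pages can stay in the operating system's page cache. Below is a minimal sketch of the kind of helper I am after, assuming Linux (os.posix_fadvise is Unix-only, and the hint is advisory, so the kernel may ignore it). The names evict_from_page_cache and time_cold are hypothetical and not part of my code.

```python
import os
import time


def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages of `path` (Unix only, advisory)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 together mean "the whole file".
        # Only clean (unmodified) pages can be dropped this way.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def time_cold(path, load):
    """Evict `path` from the page cache, then time a single call of load(path)."""
    evict_from_page_cache(path)
    start = time.perf_counter()
    result = load(path)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Here load would be a function that calls np.memmap on the path and does the indexing shown above. Any memmap from a previous run would have to be deleted first, since pages that are still mapped by a process may not be dropped.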

agol
  • Please paste a portion of code to show us what you've tried. – Capie Jan 11 '21 at 12:46
  • Maybe [this thread](https://stackoverflow.com/questions/23977904/how-to-implement-garbage-collection-in-numpy) may help you – Azat Ibrakov Jan 11 '21 at 12:49
  • Thank you both. I added some snippets to show what I am running. I also tried adding ```gc.disable()``` at the start of my code in case this is why the GC didn't work. I also ran ```%xdel waveforms ``` in case IPython was caching. And finally I ran the code in a normal python console from the command line to get around IPython. None of these helped – agol Jan 11 '21 at 13:21
