Running the following code will result in memory usage rapidly creeping up.

import numpy as np
import pylab as p
mu, sigma = 100, 15
x = mu + sigma*np.random.randn(100000)
for i in range(100):
    n, bins, patches = p.hist(x, 5000)

However, replacing the pylab call with a direct call to NumPy's histogram function keeps memory usage constant (and it also runs significantly faster).

import numpy as np
mu, sigma = 100, 15
x = mu + sigma*np.random.randn(100000)
for i in range(100):
    n, bins = np.histogram(x, 5000)

I was under the impression that pylab is using the numpy histogram function. There must be a bug somewhere...

schubnel
  • You have to clear your figure (see http://stackoverflow.com/questions/8213522/matplotlib-clearing-a-plot-when-to-use-cla-clf-or-close) if you really want to draw histograms in a loop. Also run the garbage collector if you use Python 2.x. But I guess drawing histograms wasn't your intention – theta May 31 '13 at 05:11
  • Thanks! I did try gc.collect() without success, but adding a clf() inside the loop does the trick! (And you are right, I didn't want to draw the histograms; I just wanted to use the functionality.) – schubnel May 31 '13 at 05:41
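
The fix described in the comments can be sketched roughly as follows. This is only an illustration, not the commenters' exact code: it uses matplotlib.pyplot instead of pylab, selects the non-interactive Agg backend so it runs without a display, and uses a smaller sample and bin count than the question (num_bins and patch_count are names made up for this sketch).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
num_bins = 50  # far fewer than the question's 5000, to keep the sketch fast
x = mu + sigma * np.random.randn(1000)

for i in range(10):
    plt.clf()  # discard the bars drawn in the previous iteration
    n, bins, patches = plt.hist(x, num_bins)

# Only the bars from the last call should remain on the current axes.
patch_count = len(plt.gca().patches)
print(patch_count)

plt.close("all")  # release the figure once it is no longer needed
```

If only the counts are needed and nothing has to be drawn, calling np.histogram directly, as in the question's second snippet, avoids creating any figure at all.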

1 Answer

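Matplotlib generates a diagram; NumPy does not. Each call to p.hist() computes the histogram and also draws it, adding one bar per bin to the current figure, so 100 calls leave 100 sets of 5000 bars in memory. Add p.show() to your first code to see where the work goes.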

import numpy as np
import pylab as p
mu, sigma = 100, 15
x = mu + sigma*np.random.randn(100000)
n, bins, patches = p.hist(x, 5000)
p.show()

You may want to try a smaller sample size than the 100000 passed to np.random.randn() first, so that you see something quickly.
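
To see where the memory goes without opening a window, one can count the bar artists attached to the axes after each call. The following is a minimal sketch under some assumptions: it uses matplotlib.pyplot with the Agg backend instead of pylab, and smaller numbers than the question; num_bins and counts_after_each_call are names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
num_bins = 50  # smaller than the question's 5000, to keep the sketch fast
x = mu + sigma * np.random.randn(1000)

fig, ax = plt.subplots()
counts_after_each_call = []
for i in range(5):
    ax.hist(x, num_bins)  # draws one rectangle per bin on the same axes
    counts_after_each_call.append(len(ax.patches))

# The number of rectangles held by the axes grows with every call.
print(counts_after_each_call)

plt.close(fig)
```

With the default bar-style histogram, the count should grow by num_bins on every call, which matches the steadily rising memory usage reported in the question.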

EDIT

It does not really make sense to create the same plot 100 times.

Mike Müller