Matplotlib.pyplot.hist() very slow

Question

I'm plotting about 10,000 items in an array, with around 1,000 unique values.

The plotting has been running for half an hour now. I made sure the rest of the code works.

Is it that slow? This is my first time plotting histograms with pyplot.

Yes, I would say that is very slow. It depends on how many bins you selected, but for example, with 1,000 bins I can plot 10,000 randomly generated values in about a second or two (Python 2, laptop with an Intel Core i5, Ubuntu 14.04). Show some code, it'll make things easier. – ljetibo, Mar 02 '16 at 04:45
Actually I solved it by just reducing number of bins. Thanks though. — Fenwick, Mar 02 '16 at 04:50
Are you sure you're using the correct column data type? I was using strings instead of integers and that was a sheer error on my part. — piedpiper, Aug 01 '19 at 08:57

score 27 · Answer 1 · answered Sep 19 '16 at 21:20

To plot histograms using matplotlib quickly you need to pass the histtype='step' argument to pyplot.hist. For example:

import numpy as np
import matplotlib.pyplot as plt
plt.hist(np.random.exponential(size=1000000), bins=10000)
plt.show()

takes ~15 seconds to draw and roughly 5-10 seconds to update when you pan or zoom.

In contrast, plotting with histtype='step':

plt.hist(np.random.exponential(size=1000000),bins=10000,histtype='step')
plt.show()

plots almost immediately and can be panned and zoomed with no delay.
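
One plausible reason for the difference, offered as a sketch rather than a confirmed diagnosis: with the default histtype, hist creates one rectangle artist per bin, while histtype='step' creates a single polygon. The snippet below counts the artists for each style. The sample size, bin count, seed, and the Agg backend are assumptions chosen so it runs quickly without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)  # smaller than the answer's 1,000,000 samples

fig, (ax_bar, ax_step) = plt.subplots(1, 2)

# Default histtype='bar': the third return value is a container of rectangles.
_, _, bar_patches = ax_bar.hist(data, bins=1000)

# histtype='step': the third return value is a list holding one polygon.
_, _, step_patches = ax_step.hist(data, bins=1000, histtype="step")

n_bar_artists = len(bar_patches)
n_step_artists = len(step_patches)
print("bar artists:", n_bar_artists, "step artists:", n_step_artists)

plt.close(fig)
```

Fewer artists means less work on every redraw, which would be consistent with the smoother panning and zooming described above.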

user545424

This is much faster as you say (I'm seeing the same times as you). But the graphs look very different with histtype='step'. – demented hedgehog Sep 23 '18 at 00:49
@dementedhedgehog yes, they do. I guess it depends on which discipline you are in. In high energy physics the step style is the norm. I opened an issue on the matplotlib page to discuss the issue here a while ago: https://github.com/matplotlib/matplotlib/issues/7121. – user545424 Sep 24 '18 at 13:36

score 15 · Answer 2 · answered Aug 28 '19 at 07:39

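Plotting the histogram is instant once the numpy array is flattened. When given a 2-D array, plt.hist treats every column as a separate dataset, so a 512x512 array is drawn as 512 histograms instead of one. Try the demo code below:

import numpy as np
import matplotlib.pyplot as plt

array2d = np.random.random_sample((512,512))*100
# Flatten to 1-D so all values go into a single histogram
plt.hist(array2d.flatten())
plt.hist(array2d.flatten(), bins=1000)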
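
As a rough check of what flattening changes, the sketch below passes the same data to hist both as a 2-D array and flattened, then compares what comes back. The array shape, seed, and Agg backend are assumptions chosen to keep it quick; they are not from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
array2d = rng.random((2000, 50)) * 100  # smaller than the 512x512 demo array

fig, (ax_2d, ax_flat) = plt.subplots(1, 2)

# 2-D input: hist returns one set of counts (and one bar container) per column.
counts_2d, _, patches_2d = ax_2d.hist(array2d)

# 1-D input: a single set of counts over all values.
counts_flat, _, patches_flat = ax_flat.hist(array2d.flatten())

n_datasets = len(counts_2d)
n_flat_bins = len(counts_flat)
total_flat = int(counts_flat.sum())
print("datasets from 2-D input:", n_datasets)
print("bins from flattened input:", n_flat_bins, "covering", total_flat, "values")

plt.close(fig)
```

If separate histograms per column are actually wanted, flattening merges them into one, as a comment below points out.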

CcMango

Was having this same issue, this solution worked like a charm. – Jarom Jan 24 '20 at 00:52
This should be the accepted answer. Handled 100k values instantly as opposed to it not returning otherwise. If plotting multiple histograms, `array2d.flatten()` does cause the histograms to be plotted as one. Resolution is to add each column separately. – Eilon Baer Nov 12 '20 at 10:58
This should be the accepted answer. Far, far superior to the more upvoted ones. – jbcd13 Jan 11 '23 at 19:52
This is fantastic! But I wonder why flattening the array would have such a huge improvement in executing time like that? – Khoa LT Apr 14 '23 at 07:02

score 7 · Answer 3 · answered Mar 16 '18 at 14:07

Importing seaborn somewhere in the code may cause pyplot.hist to take a really long time.

If the problem is seaborn, it can be solved by resetting the matplotlib settings:

import seaborn as sns
sns.reset_orig()

Niko Föhr

score 3 · Answer 4 · edited Nov 28 '21 at 01:35

For me, the problem was that the pd.Series (call it S) had dtype 'object' rather than 'float64'. After converting it with S = np.float64(S), plt.hist(S) was very quick.

Trenton McKinney

answered Jul 04 '19 at 00:38

Napoléon

The correct way to change the type of a `pandas.Series` is with `.astype()`: `S.astype('float64')` – Trenton McKinney Nov 28 '21 at 01:37
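
A minimal sketch of checking and converting the dtype before plotting, along the lines of the comment above. The sample values are hypothetical, and the use of pd.to_numeric with errors="coerce" for unparseable entries is an addition for illustration, not something this answer uses.

```python
import pandas as pd

# Hypothetical data: numbers that arrived as strings, e.g. from a CSV column.
s = pd.Series(["1.5", "2.0", "3.25"], dtype=object)
print(s.dtype)  # object dtype means the values are not stored as numbers

# Convert with .astype(), as suggested in the comment above.
as_float = s.astype("float64")
print(as_float.dtype)

# If some entries cannot be parsed, pd.to_numeric with errors="coerce"
# turns them into NaN instead of raising an exception.
messy = pd.Series(["1.5", "oops", "3.25"], dtype=object)
coerced = pd.to_numeric(messy, errors="coerce")
print(coerced.isna().sum(), "value(s) could not be parsed")
```

Checking s.dtype first is a quick way to tell whether this is the cause before trying any of the other fixes.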

score 2 · Answer 5 · answered Mar 03 '21 at 17:25

Several answers already mention slowness with pandas.hist(); note that it may be caused by non-numerical data. That case is easily solved by using value_counts() and plotting the counts as bars:

df['colour'].value_counts().plot(kind='bar')
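
A small self-contained version of the line above, using a made-up 'colour' column. The data and the Agg backend are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical data standing in for the real column.
df = pd.DataFrame({"colour": ["red", "blue", "red", "green", "red", "blue"]})

# Count each distinct value once, then draw one bar per category.
counts = df["colour"].value_counts()
print(counts)

ax = counts.plot(kind="bar")
n_bars = len(ax.patches)

plt.close(ax.figure)
```

Here the counting is done by pandas and only one bar per distinct value is drawn, so the plot stays cheap however many rows the column has.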

Skippy le Grand Gourou

score 1 · Answer 6 · answered Jul 01 '20 at 16:16

I was facing the same problem using Pandas .hist() method. For me the solution was:

pd.to_numeric(df['your_data']).hist()

This worked instantly.

Nic Scozzaro

score 0 · Answer 7 · answered Nov 05 '19 at 08:38

For me, the figure only updated immediately once I called figure.canvas.draw() after the call to hist. hist itself was actually fast (I discovered that after timing it), but there was a delay of a few seconds before the figure was updated. I was calling hist inside a matplotlib callback in a JupyterLab cell (qt5 backend).
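
A rough sketch of the timing approach described above: measure the hist call and the canvas draw separately to see which one is slow. It uses the Agg backend so it runs without a display, which is an assumption; with an interactive backend such as qt5 the numbers will differ. The data size and bin count are also made up.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(size=100_000)
fig, ax = plt.subplots()

t0 = time.perf_counter()
ax.hist(data, bins=100)   # computes the bins and creates the artists
t1 = time.perf_counter()
fig.canvas.draw()         # forces the figure to render right away
t2 = time.perf_counter()

hist_seconds = t1 - t0
draw_seconds = t2 - t1
print(f"hist: {hist_seconds:.3f} s, draw: {draw_seconds:.3f} s")

n_bars = len(ax.patches)
plt.close(fig)
```

If the hist time is small but the figure still appears late, the delay is in the update of the canvas rather than in hist.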

score 0 · Answer 8 · answered Feb 26 '20 at 09:26

Anyone running into the issue I had - (which is totally my bad :) )

If you're dealing with numbers, make sure when reading from CSV that your datatype is int/float, and not string.

values_arr = .... .flatten().astype('float')

Oded Ben Dov

score 0 · Answer 9 · answered Mar 19 '20 at 05:11

If you are working with pandas, make sure the data you pass to plt.hist() is a 1-D Series rather than a DataFrame. This helped me out.

Shengge Yang
