Color-coded 2D histogram

Question

I want to plot a color-coded histogram, where an array of arrays supplies the values on the y-axis and a simple 1D array on the x-axis represents a phase.

The array of arrays to plot on the y-axis has shape, let's say, (100, 25), while the phase on the x-axis has 25 elements. Therefore, 100 is the number of elements that have to be color-coded for each of the 25 phase-bins.

I thought hist2d (matplotlib.pyplot.hist2d, or numpy.histogram2d) was suitable for this, but it only takes two same-sized arrays as input. I suppose I have to create a map of colors for each of the 25 arrays of 100 elements?

I really do not know how to approach this, because I have no experience with color-coded plots at all.

EDIT: I found this example, which is quite close to my case, except that I want a 2D plot where the Z-dimension is the color:

Also, the different histograms need to have the same color-coding. Here is an example of my data:

 phase (X-axis) =  [ 0.01952176  0.04740999  0.07529822  0.10318645  0.13107468  
 0.15896291 0.18685114  0.21473937  0.2426276   0.27051583  0.29840406 
 0.32629229 0.35418052  0.38206875  0.40995698  0.43784521  0.46573344  
 0.49362167 0.5215099   0.54939813  0.57728636  0.60517459  0.63306282  
 0.66095105 0.68883928  0.71672751  0.74461574  0.77250397  0.8003922   
 0.82828043 0.85616866  0.88405689  0.91194512  0.93983335  0.96772158  
 0.99560981] 
 data to be color-coded in histograms (Y-axis) = [[ 0.01011273  0.00237802 -0.00227542 ...,         nan         nan          nan]
 [-0.00407017 -0.00317593 -0.00605734 ...,         nan         nan
      nan]
 [ 0.0166795   0.00798681  0.00075688 ...,         0.01022334         nan
      nan]
 ..., 
 [ 0.00940512         nan         nan ...,         nan         0.00022334
      0.00134779]
 [ 0.00176177  0.00151938         nan ...,         0.05692114         0.00021122
      -0.00003121]
 [        nan  0.00455727         nan ...,         0.06812121         0.00011512
      0.00016711]]

So do you want the differently colored histograms offset sideways from one another? As in gold/blue/green/red/gold/blue/green/red ... ? — roadrunner66, Jun 17 '16 at 14:38
@roadrunner66, not really. The colored histograms should be offset of course, but the color-code has to be applied on the height of the histograms in each bin. Hope it is clearer now. — Py-ser, Jun 17 '16 at 15:34
Ok, so you want to make each histogram bin contain all 4 colors, stacked on top of one another? — roadrunner66, Jun 17 '16 at 18:22
@roadrunner66, nope. Sorry for the confusion. each histogram has to be colored [like this](http://i.stack.imgur.com/NCCpe.png), except that I would like the red for the highest part. — Py-ser, Jun 18 '16 at 09:45
I think you are looking for a heatmap: http://stackoverflow.com/questions/2369492/generate-a-heatmap-in-matplotlib-using-a-scatter-data-set and http://stackoverflow.com/questions/14391959/heatmap-in-matplotlib-with-pcolor — StefanS, Jun 23 '16 at 07:27

score 3 · Answer 1 · answered Jun 20 '16 at 09:27

Maybe if you could provide some example of your data, we could do better. Unless I misunderstood your question, I think this does what you want:

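import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

data = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,11],[13,14,15]])

# colormap: one shared normalization, so every histogram uses the same color scale
cm = plt.get_cmap('RdYlBu_r')
norm = matplotlib.colors.Normalize(vmin=data.min(), vmax=data.max())
sm = plt.cm.ScalarMappable(cmap=cm, norm=norm)

fig = plt.figure()
ax = fig.add_subplot(111,projection='3d')
# draw each row as a set of bars at depth y=i, colored by bar height
for i, l in enumerate(data):
    ax.bar(np.arange(0,len(l)), l, zs=i, zdir='y', alpha=0.8, color=sm.to_rgba(l))

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()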

Thank you. I will edit my original question with some example of data. Some comments on your answer: 1) I need a 2D plot, not 3D. 2) The color-code has to be the same for each one of the histograms. Also, if you can put some comments - directly in the code is fine - it will be of great help. — Py-ser, Jun 20 '16 at 09:34

Diziet Asahi · Accepted Answer · 2016-06-20T15:39:45.753

3

If I understood correctly in the end, you have an array of shape (25,100), and you want to calculate the distribution of the 100 data points for each row. There's probably a way to use hist2d for this, but I don't know how to use it, so here is my method:

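import numpy as np
import matplotlib.pyplot as plt

Nphase = 25
Npoints = 100

phase = np.linspace(0.,1.,num=Nphase)
# one row per phase: Npoints normal samples scaled by A and shifted by C
# (randint's upper bound is exclusive, so A is always 1 here)
data = np.array([A*np.random.normal(size=(Npoints,))+C for (A,C) in 
                 zip(
            np.random.randint(1,2,Nphase),
            np.random.randint(-5,5,Nphase))])
#sprinkle some NaN (np.NaN was removed in NumPy 2.0, so use np.nan)
for i,j in zip(np.random.randint(0,Nphase,size=(10,)),np.random.randint(0,Npoints,size=(10,))):
    data[i,j] = np.nan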

You don't say anything about the range of your data, or how they scale relative to one another. Here I'm going to compute a histogram with 20 bins for each row, with the same limits for all rows.

#calculate the bins we're going to use
minBin, maxBin = np.nanmin(data),np.nanmax(data)
Nbins = 20

Calculate the histogram by iterating over each row:

binedData = np.zeros((Nphase,Nbins))
for i,a in enumerate(data):
    binedData[i,:], bins = np.histogram(a[~np.isnan(a)],bins=Nbins,range=(minBin,maxBin))
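
As a hedged aside on the hist2d remark in this answer: the same count matrix can probably be obtained in a single call to `np.histogram2d`, by pairing every sample with the index of the row it came from. The sketch below is self-contained and uses synthetic stand-in data; the names `row_index`, `valid`, `counts` and `looped` are made up for illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
Nphase, Npoints, Nbins = 25, 100, 20

# Synthetic stand-in data (assumption): one row of Npoints samples per phase, plus a few NaNs
data = rng.normal(size=(Nphase, Npoints)) + rng.integers(-5, 5, size=(Nphase, 1))
data[rng.integers(0, Nphase, 10), rng.integers(0, Npoints, 10)] = np.nan

minBin, maxBin = np.nanmin(data), np.nanmax(data)

# Pair every sample with the index of the row (phase) it belongs to
row_index = np.repeat(np.arange(Nphase), Npoints)
values = data.ravel()
valid = ~np.isnan(values)

# One bin per row along the first axis, Nbins shared value bins along the second
counts, row_edges, value_edges = np.histogram2d(
    row_index[valid], values[valid],
    bins=(Nphase, Nbins),
    range=((-0.5, Nphase - 0.5), (minBin, maxBin)),
)

# Reference result from the row-by-row loop, for comparison
looped = np.zeros((Nphase, Nbins))
for i, row in enumerate(data):
    looped[i, :], _ = np.histogram(row[~np.isnan(row)], bins=Nbins, range=(minBin, maxBin))

print(np.array_equal(counts, looped))
```

This also addresses the "two same-sized arrays" concern from the question: repeating the row index gives an x array with exactly one entry per data value.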

and plot:

plt.matshow(binedData.T, cmap=plt.cm.RdYlBu_r, extent=(0,Nphase,maxBin,minBin))
plt.grid(False)
c = plt.colorbar(orientation='horizontal')
plt.xlabel('Phase')
plt.ylabel('bins')
c.set_label('Frequency')
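
If the axes should show the actual phase and data values rather than row and bin indices, one possible variant is `pcolormesh`, which accepts the cell boundaries directly. This is only a sketch under assumptions: the data are synthetic, the phase values are assumed to be sorted, and `phase_edges` is a hypothetical helper array (midpoints between neighbouring phases) that does not appear in the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
Nphase, Npoints, Nbins = 25, 100, 20

# Synthetic stand-in data (assumption), with one NaN to exercise the masking
phase = np.linspace(0.0, 1.0, num=Nphase)
data = rng.normal(size=(Nphase, Npoints)) + rng.integers(-5, 5, size=(Nphase, 1))
data[3, 7] = np.nan

# Same binning as in the answer: shared limits, one histogram per row
minBin, maxBin = np.nanmin(data), np.nanmax(data)
binedData = np.zeros((Nphase, Nbins))
for i, row in enumerate(data):
    binedData[i, :], bins = np.histogram(row[~np.isnan(row)], bins=Nbins, range=(minBin, maxBin))

# Cell boundaries along x: midpoints between neighbouring phases,
# extended by half a step at both ends (assumes sorted phase values)
half_steps = np.diff(phase) / 2.0
phase_edges = np.concatenate(
    ([phase[0] - half_steps[0]], phase[:-1] + half_steps, [phase[-1] + half_steps[-1]])
)

fig, ax = plt.subplots()
# X has Nphase+1 edges, Y has Nbins+1 edges, C has shape (Nbins, Nphase)
mesh = ax.pcolormesh(phase_edges, bins, binedData.T, cmap="RdYlBu_r")
ax.set_xlabel("Phase")
ax.set_ylabel("Data value")
fig.colorbar(mesh, ax=ax, label="Counts per bin")
# fig.savefig("phase_histogram.png")  # or plt.show() with an interactive backend
```

With this layout the y-axis increases upwards and spans `minBin` to `maxBin`, while the color encodes how many samples fall into each value bin for each phase.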

Now, you mentioned that you want to normalize each row. There are several ways to do this. The best way would be to create a normalized histogram where the area under the curve is equal to 1 (see the density argument of the histogram function). Here I assumed you just want each row rescaled so that its maximum equals 1, for visualisation's sake.

# normalize histogram
data2 = 1.*(binedData - np.nanmin(binedData,axis=1, keepdims=True)) / (np.nanmax(binedData,axis=1,keepdims=True)-np.nanmin(binedData,axis=1,keepdims=True))

plt.matshow(data2.T, cmap=plt.cm.RdYlBu_r, extent=(0,Nphase,maxBin,minBin))
plt.grid(False)
c = plt.colorbar(orientation='horizontal')
plt.xlabel('Phase')
plt.ylabel('bins')
c.set_label('Frequency')
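
The density-based alternative mentioned above could look roughly like the following. It is a minimal sketch on synthetic stand-in data, not the answerer's code: passing `density=True` to `np.histogram` makes the area under each row's histogram equal to 1, and the names `density`, `bin_width` and `areas` are introduced here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
Nphase, Npoints, Nbins = 25, 100, 20

# Synthetic stand-in data (assumption), with one NaN to exercise the masking
data = rng.normal(size=(Nphase, Npoints)) + rng.integers(-5, 5, size=(Nphase, 1))
data[5, 10] = np.nan

minBin, maxBin = np.nanmin(data), np.nanmax(data)

# density=True divides the counts by (number of valid samples * bin width)
density = np.zeros((Nphase, Nbins))
for i, row in enumerate(data):
    density[i, :], bins = np.histogram(
        row[~np.isnan(row)], bins=Nbins, range=(minBin, maxBin), density=True
    )

# Check: the area under each row's histogram is 1
bin_width = np.diff(bins)
areas = (density * bin_width).sum(axis=1)
print(np.allclose(areas, 1.0))
```

Unlike the min-max rescaling, this keeps rows comparable even when they contain different numbers of valid (non-NaN) samples. The `density` array can be passed to the same plotting call in place of `data2`.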

edited Jun 20 '16 at 15:39

answered Jun 20 '16 at 12:01

Diziet Asahi


Thanks! Yes, we are close, but not totally. Each row contains the values of the Y-axis, not the X-axis (basically this means that Y-axis goes up to 186 in your example, while the X-axis from 0 to 25, i.e. the phase bins). The X-axis reports the phase. Also, does this code work for arrays with nan data? And a jet color map would be nicer since I need to bin with more bins. – Py-ser Jun 20 '16 at 12:15
I've modified my example. To plot the phase in X, I plotted the transpose of my data array `data.T` instead of `data`. I've modified the normalization formula to take into account `NaNs`. I've changed to the `jet` colormap, but be aware that it is generally considered a terrible choice of colormap – Diziet Asahi Jun 20 '16 at 13:40
Thanks a lot! I was thinking to use the transpose as well. There is still something (important) missing. As I mentioned in my previous comment, the Y-axis should go up to `186`, which is the maximum value in the arrays you have listed as example. Not the color bar, but the Y-axis. The colors should code the frequency of those values, the occurrence, the height of the distribution (if that value happens 1000 times it should be in red, if it happens only 3 times - in blue). Since you discourage the jet map, I will modify the code to choose the appropriate cmap. – Py-ser Jun 20 '16 at 13:52
I was wondering why you were talking about histograms... What I don't understand though, is the shape of your data. I thought you had a (100,25) array where each value was already binned in one of the 25 phases? I'll need some more info about your data and what you are trying to achieve to help any further – Diziet Asahi Jun 20 '16 at 14:23
Sorry for the confusion, I know it is not easy. I already posted my data, what other info do you need? Each of the 25 rows of the array has 100 elements (NaNs included). – Py-ser Jun 20 '16 at 14:43
Well you posted your data, but it is incomplete, see the (...) in the rows. But never mind. You have 25 phases, for each phase you have 100 data points, and you want to plot the distribution of these data points for each of the phases? Did I get it finally? – Diziet Asahi Jun 20 '16 at 14:52
Exactly. The colors should code the occurrence, the frequency, the height of the distribution. – Py-ser Jun 20 '16 at 14:54
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115131/discussion-between-diziet-asahi-and-py-ser). – Diziet Asahi Jun 20 '16 at 15:40
@Py-ser did you get to test the solutions? – Diziet Asahi Jun 26 '16 at 10:46
are you sure it plots the right histograms? It seems to me that the data distributions and the plotted histograms are different. – Py-ser Jun 29 '16 at 16:24
