How to plot the distribution of a third variable in a 2d histogram?

Question

Imagine you have a data set in three dimensions, x, y and z, and you want to show their relation. You could do this for example using a scatter plot in x and y and adding information about z with the help of a colormap:

But such a plot can be hard to read or even missleading, so I would like to use a 2d-histogram in x and y instead and weigh each data point by their z value:

However, as can be seen by the plot above, the magnitude of bin values can now be much higher than the maximum in z, which makes sense of course, as the bin values are usually the sums several z values.

So weighing by their z value is not enough, I also need to "normalize" each bin value by the number of data points within it. But as can be seen on the right plot above, for some reason, this doesn't seem to work. The color value range remains unchanged.

What am I doing wrong and is there a better approach to do this?

Code for reproduction (loosely based on this example):

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)


fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True)
data = ax.scatter(x, y, c=z, s=10)
fig.colorbar(data, ax=ax, label='z')
ax.set(xlabel='x', ylabel='y', title='scatter')
fig.show()

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

_, _, _, img = axs[2].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
data = img.get_array().reshape((bins, bins))
hist, _, _ = np.histogram2d(x, y, bins=bins)
mask = hist > 0
data[mask] = data[mask]/hist[mask]
img.set_array(data)
img.update_scalarmappable()
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()

See [Compute mean value per pixel using weighted 2d histogram](https://stackoverflow.com/questions/64767788/compute-mean-value-per-pixel-using-weighted-2d-histogram). The idea is to divide the weighted histogram bins by the counts of each bin. — JohanC, Feb 21 '22 at 15:20
Hi, thanks! That's what I did, but for some reason, it doesn't work. — mapf, Feb 21 '22 at 15:22
Did you try to follow the [linked example](https://stackoverflow.com/questions/64767788/compute-mean-value-per-pixel-using-weighted-2d-histogram)? Using `.get_array()` and `.reshape()` might have x and y switched. If you stick to exactly the same `np.histogram2d` twice, you avoid those ambiguities. — JohanC, Feb 21 '22 at 16:15
I just did, thanks! Your approach worked. I still wonder what I did wrong though... — mapf, Feb 21 '22 at 17:10
Well, probably x and y are mixed. An image and a 2D histogram use different conventions for x and y. — JohanC, Feb 21 '22 at 17:17
Makes sense, I'll check it out. Weirdly enough, your solution worked exactly like you showed for the simple MWE above, but for my actual data, I had to transpose `(sums / counts)` *and* switch `xbins` and `ybins`. No idea why. — mapf, Feb 21 '22 at 17:22

score 1 · Accepted Answer · answered Feb 23 '22 at 10:35

Implementing the solution from this very similar post, I managed to make it work, however I'm still not sure why my original approach didn't work.

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'):  # suppress possible divide-by-zero warnings
    img = axs[2].pcolormesh(xbins, ybins, sums / counts)
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()

How to plot the distribution of a third variable in a 2d histogram?

1 Answers1

Linked