Just a suggestion to pick up your curiosity. Although @lanery clearly answers the question, I would like to share a different method of getting a nice 2d histogram in python. Instead of using np.histogram2d, which in general produces quite ugly histograms, I would like to recycle py-sphviewer, a python package for rendering particle simulations using an adaptive smoothing kernel. Consider the following code, which is based on the example of lanery:
import numpy as np
import matplotlib.pyplot as plt
import sphviewer as sph
def myplot(x, y, extent=None, nb=8, xsize=500, ysize=500):
if(extent == None):
xmin = np.min(x)
xmax = np.max(x)
ymin = np.min(y)
ymax = np.max(y)
else:
xmin = extent[0]
xmax = extent[1]
ymin = extent[2]
ymax = extent[3]
k, = np.where( (x <= xmax) & (x >= xmin) &
(y <= ymax) & (y >= ymin) )
pos = np.zeros([3, len(k)])
pos[0,:] = (x[k]-xmin)/(xmax-xmin)
pos[1,:] = (y[k]-ymin)/(ymax-ymin)
w = np.ones(len(k))
P = sph.Particles(pos, w, nb=nb)
S = sph.Scene(P)
S.update_camera(r='infinity', x=0.5, y=0.5, z=0,
extent=[-0.5,0.5,-0.5,0.5],
xsize=xsize, ysize=ysize)
R = sph.Render(S)
R.set_logscale()
img = R.get_image()
return img, [xmin,xmax,ymin,ymax]
n = 1000
x = np.logspace(2, 10, n)
y = x**1.5
y = y * np.random.lognormal(10, 3, n)
H, xedges, yedges = np.histogram2d(x, y, bins=[np.logspace(np.log10(x.min()), np.log10(x.max())),
np.logspace(np.log10(y.min()), np.log10(y.max()))])
img, extent = myplot(np.log10(x), np.log10(y)) #Call the function to make the 2d-histogram
fig = plt.figure()
ax1 = fig.add_subplot(311)
ax1.plot(x, y, 'o')
ax1.set_xscale('log')
ax1.set_yscale('log')
ax2 = fig.add_subplot(312)
ax2.pcolormesh(xedges, yedges, H.T)
ax2.set_xscale('log')
ax2.set_yscale('log')
ax3 = fig.add_subplot(313)
ax3.imshow(img, origin='lower', extent=extent, aspect='auto')
plt.show()
which produces the following output:

The function myplot() is just a very simple function that I've written in order to normalize the data and give it as input of py-sphviewer. The length of the smoothing kernel is typically given by the parameter nb, which specify the number of neighbours over which the smoothing is performed. Although is seems complicated at first sight, the ideas and the implementation are very easy, and the result is by far superior compared to np.histogram2d. But of course, it depends whether you are able to spread out your data or not, and what the meaning and consequence of doing that for your research.