Python - 2D Histogram plot in log scale -- Error: `cannot convert float NaN to integer`

Question

This is my previous plot that i want to convert to 2-d histogram.

mass_bh = (subhalos['SubhaloBHMass'] * 1e10 / 0.704) # in units of M_sol h^-1
vdisp = subhalos['SubhaloVelDisp']

nbins = 200
H, xedges, yedges = np.histogram2d(mass_bh,vdisp,bins=nbins)

fig2 = plt.figure()
plt.pcolormesh(xedges,yedges,Hmasked)
cbar = plt.colorbar()
cbar.ax.set_ylabel('g-r')

plt.ylabel(' $\log(\sigma)\quad$ [km s$^{-1}$] ')
plt.xlabel('$\log(M_{BH})\quad$ [M$_{\odot}$]')
plt.title('$M_{BH}-\sigma$ relation')

This instead, gives me this

My previous plot has both its x and y values converted to logarithmic scaling. But for this histogram conversion, it's not working out so great.

How can I work around this?

Thank you!

It's hard to know what you mean by "not properly" without knowing what you want your plot to look like or what kind of data you have. — lanery, Apr 07 '16 at 17:37
@ Lanery - You're right. Let me add a picture real quick and elaborate. — iron2man, Apr 07 '16 at 17:39
The problem seems to be with the data. Notice the limits on the axis are completely different. — armatita, Apr 07 '16 at 17:57
@armatita - Yeah you're right. The previous plot used log scaling for my xy plotting, creating the first plot you see. Implementing a logarithmic scaling for the histogram causes the plot to not appear. The second one is with no logarithmic scaling. — iron2man, Apr 07 '16 at 18:04

lanery · Accepted Answer · 2016-04-08T02:49:48.067

3

@armatita is right about the problem being the data. I think it all comes down to how you do your binning inside histogram2d. See if this example with a random lognormal distribution helps.

import numpy as np
import matplotlib.pyplot as plt

n = 1000

x = np.logspace(2, 10, n)
y = x**1.5
y = y * np.random.lognormal(10, 3, n)

x_bins = np.logspace(np.log10(x.min()), np.log10(x.max()), np.sqrt(n))
y_bins = np.logspace(np.log10(y.min()), np.log10(y.max()), np.sqrt(n))
H, xedges, yedges = np.histogram2d(x, y, bins=[x_bins, y_bins])

fig = plt.figure()
ax1 = fig.add_subplot(211)
ax1.plot(x, y, 'o')
ax1.set_xscale('log')
ax1.set_yscale('log')

ax2 = fig.add_subplot(212)
ax2.pcolormesh(xedges, yedges, H.T)
ax2.set_xscale('log')
ax2.set_yscale('log')

I get the below image, which is what I believe you are looking for. Also note the transpose on H.

edited Apr 08 '16 at 02:49

answered Apr 07 '16 at 18:22

lanery

5,222
3
29
43

1

Cool! Thank you. One thing, I get this error `cannot convert float NaN to integer` from `np.log10(vdisp.max()))])`. Also, some things are divided by zero in log10 – iron2man Apr 07 '16 at 18:34
Those are both issues with your data, so you'll probably want to investigate why you have `nan`s in your dataset, but in the meantime you can use `np.nanmin` to ignore `nan`s. See [this post](http://stackoverflow.com/questions/2821072/is-there-a-better-way-of-making-numpy-argmin-ignore-nan-values?rq=1). – lanery Apr 07 '16 at 18:43
So would i apply np.min in the 2dhistogram bin portion? – iron2man Apr 07 '16 at 19:40
Let me elaborate, how would I apply np.min from the link your provided? I have both arrays the shape of ~120000. – iron2man Apr 07 '16 at 21:05
I updated my answer to hopefully make it more clear what I'm doing with my bins. At any rate, I think what you can do is have `x_bins=np.logspace(np.log10(np.nanmin(mass_bh)), np.log10(np.nanmax(mass_bh)), nbins)` and likewise for `y_bins` (using `vdisp` in lieu of `mass_bh`) – lanery Apr 08 '16 at 16:30
I appreciate that. Than I should just mask the `np.log10(mass_bh)` and `np.log10(vdisp)` in the `x` and `y` portion of `np.2dhistogram`? – iron2man Apr 08 '16 at 19:25
1

Do you mean to fix your problems with `nan`s? I don't see why you wouldn't want to get rid of your `nan`s altogether through some sort of filtering like shown [here](http://stackoverflow.com/questions/11620914/removing-nan-values-from-an-array). I would do the filtering before calling histogram2d. – lanery Apr 08 '16 at 20:01

Alejandro · Answer 2 · 2016-04-07T21:53:46.490

Just a suggestion to pick up your curiosity. Although @lanery clearly answers the question, I would like to share a different method of getting a nice 2d histogram in python. Instead of using np.histogram2d, which in general produces quite ugly histograms, I would like to recycle py-sphviewer, a python package for rendering particle simulations using an adaptive smoothing kernel. Consider the following code, which is based on the example of lanery:

import numpy as np import matplotlib.pyplot as plt import sphviewer as sph

def myplot(x, y, extent=None, nb=8, xsize=500, ysize=500):   
    if(extent == None):
        xmin = np.min(x)
        xmax = np.max(x)
        ymin = np.min(y)
        ymax = np.max(y)
    else:
        xmin = extent[0]
        xmax = extent[1]
        ymin = extent[2]
        ymax = extent[3]

    k, = np.where( (x <= xmax) & (x >= xmin) & 
                   (y <= ymax) & (y >= ymin) )

    pos = np.zeros([3, len(k)])
    pos[0,:] = (x[k]-xmin)/(xmax-xmin)
    pos[1,:] = (y[k]-ymin)/(ymax-ymin)
    w = np.ones(len(k))

    P = sph.Particles(pos, w, nb=nb)
    S = sph.Scene(P)
    S.update_camera(r='infinity', x=0.5, y=0.5, z=0, 
                    extent=[-0.5,0.5,-0.5,0.5],
                    xsize=xsize, ysize=ysize)
    R = sph.Render(S)
    R.set_logscale()
    img = R.get_image()

    return img, [xmin,xmax,ymin,ymax]    


n = 1000

x = np.logspace(2, 10, n)
y = x**1.5
y = y * np.random.lognormal(10, 3, n)

H, xedges, yedges = np.histogram2d(x, y, bins=[np.logspace(np.log10(x.min()), np.log10(x.max())),
                                               np.logspace(np.log10(y.min()), np.log10(y.max()))])


img, extent = myplot(np.log10(x), np.log10(y))   #Call the function to make the 2d-histogram

fig = plt.figure()
ax1 = fig.add_subplot(311)
ax1.plot(x, y, 'o')
ax1.set_xscale('log')
ax1.set_yscale('log')

ax2 = fig.add_subplot(312)
ax2.pcolormesh(xedges, yedges, H.T)
ax2.set_xscale('log')
ax2.set_yscale('log')

ax3 = fig.add_subplot(313)
ax3.imshow(img, origin='lower', extent=extent, aspect='auto')

plt.show()

which produces the following output:

The function myplot() is just a very simple function that I've written in order to normalize the data and give it as input of py-sphviewer. The length of the smoothing kernel is typically given by the parameter nb, which specify the number of neighbours over which the smoothing is performed. Although is seems complicated at first sight, the ideas and the implementation are very easy, and the result is by far superior compared to np.histogram2d. But of course, it depends whether you are able to spread out your data or not, and what the meaning and consequence of doing that for your research.

Python - 2D Histogram plot in log scale -- Error: `cannot convert float NaN to integer`

2 Answers2