
I need to generate an image similar to the one shown in this example:

enter image description here

The difference is that, instead of scattered points in two dimensions, I have a two-dimensional histogram generated with numpy's histogram2d and plotted with imshow and gridspec:

enter image description here

How can I project this 2D histogram into a horizontal and a vertical histogram (or curves) so that it looks aligned, like the first image?


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib.ticker import NullFormatter

data = ...  # placeholder: 3D array uploaded to http://pastebin.com/tjLqM9gQ

# Create a meshgrid of coordinates (0,1,...,N) times (0,1,...,N)
y, x = np.mgrid[:len(data[0, :, 0]), :len(data[0, 0, :])]
# duplicating the grids
xcoord, ycoord = np.array([x] * len(data)), np.array([y] * len(data))
# compute histogram with coordinates as x,y
h, xe, ye = np.histogram2d(
    xcoord.ravel(), ycoord.ravel(),
    bins=[len(data[0, 0, :]), len(data[0, :, 0])],
    weights=data.ravel())

# Projected histograms in x and y
hx, hy = h.sum(axis=0), h.sum(axis=1)

# Define size of figure
fig = plt.figure(figsize=(20, 15))
gs = gridspec.GridSpec(10, 12)

# Define the positions of the subplots.
ax0 = plt.subplot(gs[6:10, 5:9])
axx = plt.subplot(gs[5:6, 5:9])
axy = plt.subplot(gs[6:10, 9:10])

ax0.imshow(h, cmap=plt.cm.viridis, interpolation='nearest',
           origin='lower', vmin=0.)

# Remove tick labels
nullfmt = NullFormatter()
axx.xaxis.set_major_formatter(nullfmt)
axx.yaxis.set_major_formatter(nullfmt)
axy.xaxis.set_major_formatter(nullfmt)
axy.yaxis.set_major_formatter(nullfmt)

# Top plot
axx.plot(hx)
axx.set_xlim(ax0.get_xlim())
# Right plot
axy.plot(hy, range(len(hy)))
axy.set_ylim(ax0.get_ylim())

fig.tight_layout()
plt.savefig('del.png')
Gabriel
  • Couldn't there be multiple pairs of horizontal and vertical histograms that could result in a given 2d histogram matrix? – Nick Becker Nov 16 '16 at 20:45
  • Not sure I follow, Nick. I need to *project* the 2D histogram onto the x and y axes. That is, sum the bin values in each column (one sum per x value) to produce the horizontal histogram (x axis), and sum the bin values in each row to produce the vertical histogram (y axis). – Gabriel Nov 16 '16 at 21:06
  • It seems you already have the solution ( *"sum all bin values for a given column or x value, and repeat for all columns"* ), so where exactly is the problem? – ImportanceOfBeingErnest Nov 16 '16 at 21:16
  • Are you asking how to project the data, i.e. `hx, hy = h.sum(axis=0), h.sum(axis=1)` or how to plot it? – user545424 Nov 16 '16 at 21:39
  • Yes, I'm asking how to perform that sum so that the result is a 1D histogram for each dimension. `hx, hy = h.sum(axis=0), h.sum(axis=1)` does not give the expected results. – Gabriel Nov 16 '16 at 23:34
  • Sorry, `hx, hy = h.sum(axis=0), h.sum(axis=1)` does work, only it does not produce a histogram, rather a series of points which is still good. Could you post your comment as an answer 545424? – Gabriel Nov 16 '16 at 23:38
  • You really need to distinguish between data and visualization. Since `h` already **is** a histogram, `hx,hy` will be marginal histograms as well. When it comes to visualization you can plot any histogram either as dots or bars or whatever you like using the method of your choice from the matplotlib gallery. – ImportanceOfBeingErnest Nov 17 '16 at 09:11
  • Obtaining the data is more or less solved with `hx, hy = h.sum(axis=0), h.sum(axis=1)`, but the visualization is not. Using `imshow` (instead of `scatter` as in the example) messes up the alignment of the plots. – Gabriel Nov 17 '16 at 13:43
  • @ImportanceOfBeingErnest I've updated the question using 545424's way of obtaining the projected histograms. As you can see, the issue that remains is the visualization. – Gabriel Nov 17 '16 at 13:53
  • Indeed, so now that you know which data to plot, you can use any of the examples you find to plot the marginals, like [the original one](http://matplotlib.org/examples/pylab_examples/scatter_hist.html) or [this nice one](http://stackoverflow.com/questions/20525983/matplotlib-imshow-a-2d-array-with-plots-of-its-marginal-densities/20527817). – ImportanceOfBeingErnest Nov 17 '16 at 14:22
  • The original plot doesn't use `gridspec`, so it's no good. The example you provide uses `gridspec` but it affects the entire image with the width and height ratios, so I can't use that or it would affect the rest of the plots in the figure. This last example did give me the idea to force the aspect in `imshow` to `auto`, and that seems to work. If you want, you can use that to post an answer and I'll accept it. Thank you! – Gabriel Nov 17 '16 at 14:44
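
The fix the comments converge on (summing `h` along each axis for the marginals, then passing `aspect='auto'` to `imshow`) can be sketched as below. This is a minimal illustration, not the asker's final code: the synthetic Gaussian samples, the bin counts, and the output filename `marginals.png` are all assumptions standing in for the pastebin data. Unlike the question's code, the sketch transposes `h` before drawing, because `np.histogram2d` puts x along the first array axis while `imshow` draws the first axis vertically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import NullFormatter

# Synthetic stand-in for the real data (assumption): correlated Gaussian samples.
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, 5000)
ys = 0.5 * xs + rng.normal(0.0, 0.5, 5000)

# histogram2d returns h with shape (nx, ny): axis 0 follows x, axis 1 follows y.
h, xe, ye = np.histogram2d(xs, ys, bins=[40, 30])

# Marginal histograms: sum over the *other* axis.
hx = h.sum(axis=1)  # one value per x bin
hy = h.sum(axis=0)  # one value per y bin

# Bin centers, so the curves share the image's data coordinates.
xc = 0.5 * (xe[:-1] + xe[1:])
yc = 0.5 * (ye[:-1] + ye[1:])

# Same GridSpec cells as in the question.
fig = plt.figure(figsize=(20, 15))
gs = gridspec.GridSpec(10, 12)
ax0 = fig.add_subplot(gs[6:10, 5:9])
axx = fig.add_subplot(gs[5:6, 5:9])
axy = fig.add_subplot(gs[6:10, 9:10])

# Transpose so x runs horizontally. aspect='auto' lets the image fill its
# GridSpec cell instead of shrinking the axes to keep the pixels square.
ax0.imshow(h.T, origin='lower', aspect='auto', interpolation='nearest',
           extent=[xe[0], xe[-1], ye[0], ye[-1]], cmap='viridis', vmin=0.0)

# Top panel: projection onto x, drawn as a stepped curve.
axx.plot(xc, hx, drawstyle='steps-mid')
axx.set_xlim(ax0.get_xlim())

# Right panel: projection onto y, with the counts on the horizontal axis.
axy.plot(hy, yc, drawstyle='steps-mid')
axy.set_ylim(ax0.get_ylim())

# Remove tick labels from the marginal panels.
for ax in (axx, axy):
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())

fig.tight_layout()
fig.savefig('marginals.png')
```

With the default `aspect='equal'`, `imshow` resizes the axes box to keep square pixels, so the image no longer spans the same width and height as the neighbouring panels. With `aspect='auto'` all three axes keep the extents of their grid cells, and copying the limits from `ax0` is enough to line up the bins.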

1 Answer


If you are OK with the marginal distributions all being upright, you could use the corner package.

E.g.:

import corner
import numpy as np
import pandas as pd

N = 1000

CORNER_KWARGS = dict(
    smooth=0.9,
    label_kwargs=dict(fontsize=30),
    title_kwargs=dict(fontsize=16),
    truth_color="tab:orange",
    quantiles=[0.16, 0.84],
    levels=(1 - np.exp(-0.5), 1 - np.exp(-2), 1 - np.exp(-9 / 2.0)),
    plot_density=False,
    plot_datapoints=False,
    fill_contours=True,
    max_n_ticks=3,
    verbose=False,
    use_math_text=True,
)


def generate_data():
    return pd.DataFrame(dict(
        x=np.random.normal(0, 1, N),
        y=np.random.normal(0, 1, N)
    ))


def main():
    data = generate_data()
    fig = corner.corner(data, **CORNER_KWARGS)
    fig.show()


if __name__ == "__main__":
    main()

enter image description here

Avi Vajpeyi