Python scatter plot of 4D data

Question

I have a 4D array of data that I would like to show as a scatter plot. The data can be seen as x- and y-coordinates for each pair of values of two additional parameters.

I would like to "flatten" the plot to a 2D scatter plot where the two extra parameters are represented by different colors instead, e.g. one color for each pair of the two parameters. Alternatively, I would like points that are plotted for only a few of the parameter pairs to look light, while points plotted for many of the parameter pairs look heavier/darker. Maybe this could be achieved by "stacking" somewhat translucent dots on top of each other?

Is there some standard approach for doing this in Python, for example using matplotlib?

Maybe a scatterplot matrix is a better solution. Look [here](http://pandas.pydata.org/pandas-docs/stable/visualization.html#scatter-matrix-plot) for an example. — Andrej, Jul 08 '14 at 09:18
That does look interesting. Unfortunately, I have no experience with `pandas` but maybe I should check it out. — Thomas Arildsen, Jul 08 '14 at 10:04
There are related pure `matplotlib` examples in [this question](http://stackoverflow.com/questions/7941207/is-there-a-function-to-make-scatterplot-matrices-in-matplotlib). [@tisimst's answer](http://stackoverflow.com/a/16489216/3751373), which is a refactoring of [@Joe Kington's](http://stackoverflow.com/a/7941594/3751373) appears to be the most complete. — Laurence Billingham, Jul 08 '14 at 11:07
@lbn-plus-1 that question is close, but not quite a duplicate. The short answer is yes. Can you show us what you have tried with scatter? That would be a better starting point than 'please write code for me'. — tacaswell, Jul 09 '14 at 05:34
I experimented with a couple of solutions yesterday. I will add them here later today if I get time for it. — Thomas Arildsen, Jul 09 '14 at 10:57
@lbn-plus-1 I have added code of my attempts as answers now and I have found two of these useful and am using those for now. — Thomas Arildsen, Jul 09 '14 at 12:22
What are the additional parameters? You could transform them into a velocity-like vector and use a quiver type of scatter plot. In that case I could provide code. — meduz, Jul 10 '14 at 21:06

score 0 · Answer 1 · edited May 23 '17 at 10:30

I tried my suggested approach of "stacking" translucent scatter plots on top of each other:

import numpy as np
import matplotlib.pyplot as plt

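# param1, param2, data1 and data2 are assumed to be defined already;
# data1 and data2 are indexed as [delta, rho, param1, param2].
for ii in range(len(param1)):
    for jj in range(len(param2)):
        # Grid indices where data1 < data2 for this parameter pair
        delta_idx, rho_idx = np.where(data1[:, :, ii, jj] < data2[:, :, ii, jj])
        plt.scatter(delta_idx, rho_idx, marker='o', c='k', alpha=0.01)
# Raw strings, so that '\r' in '\rho' is not read as a carriage return
plt.xlabel(r'$\delta$')
plt.ylabel(r'$\rho$')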
plt.show()

The two-dimensional points I described in my question are actually an identification of where the values in data1 are smaller than the corresponding values in data2. This produced the following plot:

[Figure: stacked scatter plot]

A lot more could be done to nice-ify the plot, but I was not really satisfied with the way it looks so I tried another approach. I post this here anyway in case someone finds it useful.
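The question also asks for one color per pair of parameters. A minimal sketch of that variant, done with a single `scatter` call instead of a double loop, could look like the following. The arrays here are synthetic stand-ins (an assumption, since the real data1 and data2 are not shown), and the `viridis` colormap and `alpha=0.1` are arbitrary illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the real arrays (assumption): the axes are
# (delta, rho, param1, param2).
rng = np.random.default_rng(0)
data1 = rng.random((20, 20, 4, 5))
data2 = rng.random((20, 20, 4, 5))

# Indices of every element where data1 < data2, over all four axes at once.
delta_idx, rho_idx, p1_idx, p2_idx = np.nonzero(data1 < data2)

# One integer label per (param1, param2) pair, used to pick the color.
pair_id = p1_idx * data1.shape[3] + p2_idx

fig, ax = plt.subplots()
# Translucent markers: points hit by many parameter pairs look denser.
points = ax.scatter(delta_idx, rho_idx, c=pair_id, cmap='viridis',
                    alpha=0.1, marker='o')
ax.set_xlabel(r'$\delta$')
ax.set_ylabel(r'$\rho$')
fig.colorbar(points, ax=ax, label='parameter pair index')
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. With many parameter pairs the individual colors become hard to tell apart, so this is probably only readable for a small number of pairs.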

score 0 · Answer 2 · edited May 23 '17 at 11:55

As an alternative to the "stacked" scatter plot, I tried accumulating the occurrences of data1 < data2 in a 2D "occurrence map" first. I then plotted this map using pcolormesh (imported from prettyplotlib to make it look better):

import prettyplotlib as ppl
import numpy as np

occurrence_map = np.sum(data1 < data2, axis=(2,3), dtype=float) / np.prod(data1.shape[2:])
ppl.pcolormesh(occurrence_map, vmin=0, vmax=1)

The normalisation produces a relative measure of occurrence, i.e., for how large a fraction of the parameter pairs (the two last dimensions of data1 and data2) is data1 < data2? The plot is then configured to colour values in the range from 0 to 1. This produces the following plot, which I am far more pleased with:

[Figure: pcolormesh plot of relative occurrences]
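For readers without prettyplotlib, a minimal sketch of the same occurrence map using plain matplotlib might look like this. The arrays are synthetic stand-ins (an assumption), and `np.mean` over the boolean comparison is used as an equivalent shortcut for the sum divided by the number of parameter pairs.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the real arrays (assumption): the last two axes
# are the two extra parameters.
rng = np.random.default_rng(0)
data1 = rng.random((20, 20, 4, 5))
data2 = rng.random((20, 20, 4, 5))

# Fraction of parameter pairs for which data1 < data2 at each grid point.
occurrence_map = np.mean(data1 < data2, axis=(2, 3))

fig, ax = plt.subplots()
# pcolormesh draws the rows of the array along the y-axis.
mesh = ax.pcolormesh(occurrence_map, vmin=0, vmax=1)
fig.colorbar(mesh, ax=ax, label='fraction of parameter pairs')
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.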

score 0 · Answer 3 · edited May 23 '17 at 12:34

The comments about scatterplot matrices inspired me to try something like that as well. Scatterplot matrices were not exactly what I was looking for, but I took the code from @tisimst's answer suggested by @lbn-plus-1 and adapted it a bit, as follows:

import itertools
import numpy as np
import matplotlib.pyplot as plt

def scatterplot_matrix(data, names=None, **kwargs):
    """Plots a pcolormesh matrix of subplots.  The two first dimensions of
    data are plotted as a mesh of values, one for each of the two last
    dimensions of data. Data must thus be four-dimensional and results
    in a matrix of pcolormesh plots with the number of rows equal to
    the size of the third dimension of data and number of columns
    equal to the size of the fourth dimension of data. Additional
    keyword arguments are passed on to matplotlib's "pcolormesh"
    command. Returns the matplotlib figure object containing the subplot
    grid.
    """
    assert data.ndim == 4, 'data must be 4-dimensional.'
    datashape = data.shape
    fig, axes = plt.subplots(nrows=datashape[2], ncols=datashape[3], figsize=(8,8))
    fig.subplots_adjust(hspace=0.0, wspace=0.0)

    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)

        # Set up ticks only on one side for the "edge" subplots...
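        # (Axes.is_first_col() etc. were removed in matplotlib 3.6;
        # the SubplotSpec methods are the replacement.)
        spec = ax.get_subplotspec()
        if spec.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if spec.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if spec.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if spec.is_last_row():
            ax.xaxis.set_ticks_position('bottom')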

    # Plot the data.
    for ii in range(datashape[2]):
        for jj in range(datashape[3]):
            axes[ii,jj].pcolormesh(data[:,:,ii,jj], **kwargs)

    # Label the diagonal subplots...
    #if not names:
    #    names = ['x'+str(i) for i in range(numvars)]
    # 
    #for i, label in enumerate(names):
    #    axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
    #            ha='center', va='center')

    # Turn on the proper x or y axes ticks.
    #for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
    #    axes[j,i].xaxis.set_visible(True)
    #    axes[i,j].yaxis.set_visible(True)

    # FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
    # correct axes limits, so we pull them from other axes
    #if numvars%2:
    #    xlimits = axes[0,-1].get_xlim()
    #    ylimits = axes[-1,0].get_ylim()
    #    axes[-1,-1].set_xlim(xlimits)
    #    axes[-1,-1].set_ylim(ylimits)

    return fig

if __name__=='__main__':
    np.random.seed(1977)
    data = np.random.random([10] * 4)
    # Line/marker keywords such as marker= or mfc= are not accepted by
    # pcolormesh, so only colour-scale limits are passed here.
    fig = scatterplot_matrix(data, vmin=0, vmax=1)
    fig.suptitle('Simple pcolormesh matrix')
    plt.show()

I saved the above module as datamatrix.py and use it as follows:

import numpy as np
import matplotlib.pyplot as plt
import brewer2mpl
import datamatrix

colors = brewer2mpl.get_map('RdBu', 'Diverging', 11).mpl_colormap
# Negated because the 'RdBu' colormap is the wrong way around
indicator = np.ma.masked_invalid(-np.sign(data1 - data2))
fig = datamatrix.scatterplot_matrix(indicator, cmap=colors)
plt.show()

The brewer2mpl and color map stuff can be left out - that was just some coloring I was toying around with. It results in the following plot:

[Figure: matrix of pcolormesh plots of occurrences for individual parameter values]

The "outer" dimensions of the matrix are the two parameters (the last two dimensions of data1 and data2). Each of the pcolormesh plots inside the matrix is then an "occurrence map" somewhat similar to that in my occurrence-map answer, but a binary one for each of the pairs of parameters. The white lines at the bottom of some of the plots are regions of equality. The white dot in each of the upper right corners is a NaN value in the data.
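To see why equality and NaN both end up white, it may help to look at the indicator values on a tiny example. This is only a sketch with made-up 2x2 arrays (an assumption, not the real data): equal entries give 0, which sits at the white midpoint of a diverging colormap, while NaN entries are masked and therefore not drawn at all.

```python
import numpy as np

# Made-up 2x2 stand-ins for one (param1, param2) slice (assumption).
data1 = np.array([[1.0, 2.0],
                  [3.0, np.nan]])
data2 = np.array([[2.0, 2.0],
                  [1.0, 0.0]])

# +1 where data1 < data2, 0 where they are equal, -1 where data1 > data2.
# NaN entries stay NaN and are then masked out.
indicator = np.ma.masked_invalid(-np.sign(data1 - data2))

print(indicator)
print(indicator.mask)
```

Only the bottom-right entry is masked, because it is the only one where the difference is NaN.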
