
As explained by Joe Kington in his answer to the question "How can I make a scatter plot colored by density in matplotlib", I made a scatter plot colored by density. However, due to the complex distribution of my data, I would like to change the parameters used to calculate the density. Here is the result with some fake data similar to mine:

[Image: density plot, unwanted result]

I would like to calibrate the density calculation of gaussian_kde so that the left part of the plot looks like this:

[Image: second density plot]

I don't like the first plot because each group of points influences the density of the adjacent groups, and that prevents me from analyzing the distribution within a group. In other words, even if each of the 8 groups has exactly the same distribution, that won't be visible on the graph.

I tried to modify the covariance_factor (as I once did for a 2D plot of density over x), but when gaussian_kde is used with multi-dimensional arrays, what I get back is a numpy.ndarray, not a "scipy.stats.kde.gaussian_kde" object. Besides, I don't even know whether changing the covariance_factor would do it.

Here's my dummy code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)

# Data for the first image
x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40,a+80))
y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10,b*4))

# Data for the second image
#x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40))
#y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10))

# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)

# My unsuccessful attempt to modify the covariance, which would work in 1D with "z = gaussian_kde(x)"
#z.covariance_factor = lambda : 0.01
#z._compute_covariance()

# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]

fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50, edgecolor='none')
plt.show()

The solution could use another density calculator, I don't mind. The goal is to make a density plot like the ones shown above, where I can play with the density parameters.

I'm using Python 3.4.3.

Gab
  • Could you explain more clearly what's wrong with the first plot? My guess is that you want a less "smooth" density estimate. This can be controlled to some extent by varying the bandwidth of the Gaussian kernel, for example by passing different scalar values for the `bw_method=` parameter. However, bear in mind that KDE generally works best for unimodal distributions - oversmoothing is more or less unavoidable for complicated multimodal distributions such as yours. – ali_m Jul 17 '16 at 22:41
  • 1) I edited the question. 2) I would be interested in an example of how to change the bw_method parameter. 3) Thank you for the distribution advice. – Gab Jul 18 '16 at 13:48
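A minimal sketch of what the `bw_method=` suggestion from the comment above could look like, using the fake data from the question. The value 0.05 is purely illustrative and would need tuning, and the sample size is reduced here only to keep the run short. The key point is that the estimator object is kept separate from its evaluation on the points, so its bandwidth can be inspected or changed:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Fake data as in the question (smaller sample, fixed seed for repeatability)
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))
xy = np.vstack([x, y])

# Keep the gaussian_kde objects; calling them on xy is what returns an ndarray
kde_default = gaussian_kde(xy)                 # default bandwidth (Scott's rule)
kde_narrow = gaussian_kde(xy, bw_method=0.05)  # a scalar is used directly as kde.factor

z_default = kde_default(xy)  # density at each point, default smoothing
z_narrow = kde_narrow(xy)    # density at each point, much less smoothing

print(kde_default.factor, kde_narrow.factor)
```

Either `z_default` or `z_narrow` can then be sorted and passed as `c=` to `ax.scatter` exactly as in the question. An existing estimator can also be adjusted afterwards with `kde.set_bandwidth(bw_method=...)`. Note that the scalar scales the covariance of the whole data set, so a single value still applies to all groups at once.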

1 Answer

Did you have a look at Seaborn? It's not exactly what you're asking for, but it already has functions for generating density plots:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kendalltau
import seaborn as sns

# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)

# Data for the first image
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))

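# Note: older seaborn releases also accepted stat_func=kendalltau here;
# that argument has since been removed, and x/y are passed by keyword.
sns.jointplot(x=x, y=y, kind="hex")
sns.jointplot(x=x, y=y, kind="kde")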
plt.show()

It gives a hexbin plot and a KDE plot:

[Images: Hexplot and KDEplot]

Dietrich
  • Thank you, it's not exactly what I'm looking for (because I lose individual dots), but I'll try it and keep it as an alternative solution! – Gab Jul 18 '16 at 13:58
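Since the question allows another density calculator and the last comment asks to keep individual dots, here is one possible sketch that is not from the answer above: color each point by the number of neighbours within a fixed radius, found with `scipy.spatial.cKDTree`. The helper name `local_counts` and the radius value are hypothetical choices for illustration, and this is a swapped-in technique, not a kernel density estimate. Because the count is purely local, distant groups cannot influence each other.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_counts(x, y, radius):
    """Count, for each point, the points within `radius` of it (itself included).

    Hypothetical helper: a purely local density proxy, not a KDE.
    """
    pts = np.column_stack([x, y])
    tree = cKDTree(pts)
    neighbours = tree.query_ball_point(pts, r=radius)
    return np.array([len(idx) for idx in neighbours])

# Fake data as in the question (fixed seed for repeatability)
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))

z = local_counts(x, y, radius=0.5)  # radius is an assumed tuning parameter

# Sort so that the densest points are plotted last, as in the question
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
```

The sorted `x`, `y` and `z` can be passed to `ax.scatter(x, y, c=z, s=50, edgecolor='none')` as in the question. A larger radius gives a smoother coloring and a smaller one shows finer structure within each group.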