I would like to plot some data (an x-y line plot with matplotlib) whose uncertainty seems to be asymmetrically distributed (maybe log-normal, but I do not know). I would like to plot the data in an aggregated way, but a band of mean ± standard deviation seems to overestimate the error below the mean.
I think one way to achieve this is to plot the uncertainty as a shaded area, similar to ax.fill_between(), but with the color intensity varying with the probability. This would extend the idea of violin plots to a line plot. Here is a picture using modified sine functions to illustrate:
Description: modified sine functions (see code below). The red line is the mean; the shaded area is the standard deviation (left) or a color-coded shade representing a probability distribution (right). The right part was modified with Inkscape.
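A minimal sketch of the shading idea, assuming nested percentile bands are an acceptable stand-in for a probability density. The sample size of 200, the seed, the percentile levels and the alpha value are all illustrative assumptions, not part of the original data. Each band is drawn with the same low alpha, so the overlapping bands near the median appear darker than the tails.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = np.linspace(0, 6 * np.pi, 50)

# 200 hypothetical curves: a sine plus one log-normal offset per curve
offsets = rng.lognormal(sigma=1.5, size=200)
y = np.sin(x)[:, None] + offsets[None, :] + 1  # shape (50, 200), always positive

fig, ax = plt.subplots()

# Nested central intervals: 5-95 %, 10-90 %, ..., 45-55 %
lower_percentiles = np.arange(5, 50, 5)
for p in lower_percentiles:
    lo, hi = np.percentile(y, [p, 100 - p], axis=1)
    # Same low alpha for every band; stacked bands darken toward the median
    ax.fill_between(x, lo, hi, color="tab:blue", alpha=0.1, linewidth=0)

median = np.median(y, axis=1)
ax.plot(x, median, "r-", label="median")
ax.set_yscale("log")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because the bands come from percentiles rather than from mean ± standard deviation, the lower edge stays positive and the band can be asymmetric around the center line.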
Here is my approach using a Gaussian kernel density estimate (since I do not know the distribution), similar to the example in [scipy.stats.gaussian_kde](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html). I am stuck at the kernel calculation, which fails with a LinAlgError: singular matrix error:
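# %% generate random line plots
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0, 6*np.pi, 50)
def sin_normal(n, x):
    "Measurement model: return n sine curves, each shifted by a log-normal offset."
    m1 = np.abs(np.random.lognormal(size=n, sigma=1.5))
    return np.array([np.sin(x)]*n).T + m1 + 1 # +1 -> always positive
y = sin_normal(10, x)  # shape (50, 10): one row per x value, one column per curve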
y_mean = y.mean(axis=1)
dy = y.std(axis=1)
xmin = x.min()
xmax = x.max()
ymin = y.min()
ymax = y.max()
# %% Perform a kernel density estimate on the data:
values = y
kernel = stats.gaussian_kde(values) # fails with LinAlgError; due to the shape? -> apply to each row?
#Z = ...
# %% plot (depending on the random output the shaded area changes)
fig, ax = plt.subplots(2,1)
ax[0].plot(x, y, '-', markersize=2) # remove in the final plot
ax[0].plot(x, y.mean(axis=1), 'r-')
ax[0].fill_between(x, y_mean - dy, y_mean + dy, alpha=0.5)
ax[1].loglog(x, y, '-', markersize=2) # remove in the final plot
ax[1].loglog(x, y.mean(axis=1), 'r-')
# Add plot of the shaded area here
ax[1].fill_between(x, y_mean - dy, y_mean + dy, alpha=0.5)
plt.show()
#plt.savefig("uncertainty_visualisation_lin.svg")
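The error is consistent with how gaussian_kde reads its input: a 2-D array is interpreted as (number of dimensions, number of data points), so the (50, 10) array y is treated as 10 points in 50 dimensions, and a covariance matrix estimated from so few points cannot be inverted. Below is a minimal sketch of the "apply for each row" idea from the code comment, fitting one 1-D KDE per x position and drawing the densities as a color mesh. Working in log space, the grid size of 200, the padding of 1, the fixed seed and the use of pcolormesh are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = np.linspace(0, 6 * np.pi, 50)
offsets = rng.lognormal(sigma=1.5, size=10)
y = np.sin(x)[:, None] + offsets[None, :] + 1  # shape (50, 10), always positive

# Assumption: estimate the density of log(y), since the data are positive and skewed
log_y = np.log(y)
grid = np.linspace(log_y.min() - 1, log_y.max() + 1, 200)

# One 1-D KDE per x position: each row of log_y is a sample of 10 values
density = np.empty((grid.size, x.size))
for i, row in enumerate(log_y):
    density[:, i] = stats.gaussian_kde(row)(grid)

fig, ax = plt.subplots()
# Color intensity encodes the estimated density at each (x, y) cell
mesh = ax.pcolormesh(x, np.exp(grid), density, shading="nearest", cmap="Blues")
ax.plot(x, y.mean(axis=1), "r-", label="mean")
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="estimated density of log(y)")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. With only 10 curves per x position the estimate is coarse, and the bandwidth chosen by gaussian_kde by default strongly affects how smooth the shading looks.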
Relevant links:
- Explanation Density Estimation: see scikit-learn density
- Similar approach but with known distribution
- matplotlib cheatsheets