2

Let's say the function is simply

def linear(x,n):
    return n+x

And that the universe of parameters to try are

import numpy as np
parameters = np.random.normal(0, 1, 1000)

Which is the best way to plot all the universe of functions?

The direct one is

import matplotlib.pyplot as plt
for n in parameters: plt.plot(x,linear(x,n))

However, when the parameres are many (like 1e5), this is not practicable. I think the proper way to do this may be along the lines of a 2D density plot or a 2D histogram but I am not sure how to apply those to my problem.

A further complication: Each of the models tried above carries its own "goodness-fit-estimator value", a $\chi^2$, for example. After obtaining a plot showing the density of models as a color code, I would like to color code it by $\chi^2$ value. Again, I am asking for ideas or experience.

Any suggestion is very welcome!

Stefano
  • 359
  • 1
  • 5
  • 16
  • This [question](https://stackoverflow.com/questions/42262563/please-explain-in-detail-2d-histogram-in-python) is helpful as well. – Stefano Oct 02 '18 at 11:36

1 Answers1

1

For your example you can bin the data with respect to the parameters n and rebin the function output accordingly. I provide a quick example below. If the function dependence on n is more complex then the (re-)binning schema might become more complex as well. But this depends on the specific case. In case the distribution of parameters values n is non-uniform (as in your example) one might consider fixed width bins rather than a running average. But again, this is a matter of taste.

from timeit import timeit
import matplotlib
from matplotlib.colors import Normalize
import matplotlib.pyplot as plt
import numpy as np

def linear(x,n):
    return n+x

N = 1000
x = np.linspace(0, 1, N)
n = np.random.normal(0, 1, N)

cmap = matplotlib.cm.get_cmap('plasma')
norm = Normalize(vmin=n.min(), vmax=n.max())

def p3():
    n2, x2 = np.meshgrid(n, x)
    f = linear(x2, n2)
    # Now comes the "histogram" step. It involves taking the average over neighboring samples.
    # Since data can be distributed non-linearly this might not be the desired behavior but
    # similarly it is possible to specify an array of bins and use this for the binning.
    sort = np.argsort(n)
    n2 = n2[:, sort]
    f = f[:, sort]
    rebin = 10
    n2 = n2.reshape(n2.shape[0], -1, rebin).mean(axis=-1)  # Average over 10 samples.
    f = f.reshape(f.shape[0], -1, rebin).mean(axis=-1)
    x2 = x2[:, ::rebin]  # No average required here, values are the same along second axis.
    plt.figure()
    plt.title('Using histogram method and pcolor')
    plt.pcolor(x2, f, n2, cmap=cmap, norm=Normalize(vmin=n2.min(), vmax=n2.max()))

p3()
plt.show()

Histogram pcolor

I also checked individual line plots and scatter plot. Scatter plot reduces the number of calls to the API however though the actual call has a similar timing, I found the canvas update to take significantly longer (in the background thread; similarly for saving to png for example).

def p1():
    plt.figure()
    plt.title('Plotting individual functions')
    for nn in n:
        plt.plot(x, linear(x, nn), '-', color=cmap(norm(nn)))

def p2():
    n2, x2 = np.meshgrid(n, x)
    f = linear(x2, n2)
    plt.figure()
    plt.title('Using scatter')
    plt.scatter(x2.ravel(), f.ravel(), color=cmap(norm(n2.ravel())))

print('p1: ', timeit('p1()', setup='from __main__ import p1', number=1))
print('p2: ', timeit('p2()', setup='from __main__ import p2', number=1))
plt.show()

Lines

Scatter

a_guest
  • 34,165
  • 12
  • 64
  • 118