Creating Probability/Frequency Axis Grid (Irregularly Spaced) with Matplotlib

Question

I'm trying to create a frequency curve plot, and I'm having trouble manipulating the axis to get the plot I want.

Here is an example of the desired grid/plot I am trying to create:

Frequency Curve Example

Here is what I have managed to create with matplotlib:

Current Matplotlib Result

To create the grid in this plot, I used the following code:

m1 = pd.np.arange(.2, 1, .1)
m2 = pd.np.arange(1, 2, .2)
m3 = pd.np.arange(2, 10, 2)
m4 = pd.np.arange(2, 20, 1)
m5 = pd.np.arange(20, 80, 2)
m6 = pd.np.arange(80, 98, 1)
xTick_minor = pd.np.concatenate((m1,m2,m3,m4, m5, m6))
xTick_major = pd.np.array([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])

m1 = range(0, 250, 50)
m2 = range(250, 500, 10)
m3 = range(500, 1000, 20)
m4 = range(1000, 5000, 100)
m5 = range(5000, 10000, 200)
m6 = range(10000, 50000, 1000)

yTick_minor = pd.np.concatenate((m1,m2,m3,m4,m5,m6))
yTick_major = pd.np.array([250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000])

axes.invert_xaxis()
axes.set_ylabel('Discharge in CFS')
axes.set_xlabel('Exceedance Probability')
axes.xaxis.set_major_formatter(FormatStrFormatter('%3.1f'))
axes.set_xticks(xTick_major)
axes.set_xticks(xTick_minor, minor=True)
axes.grid(which='major', alpha=0.7)
axes.grid(which='minor', alpha=0.4)

axes.set_yticks(yTick_major)
axes.set_yticks(yTick_minor, minor=True)

The grid is correct, but what I now want to do is make sure the in the display, the low probability ranges get spaced out more, and the same for the low discharge values (y-axis). Essentially I want to control the spacing between ticks, not the tick interval itself, so that the range from .2 to .5 displays similarly to the range between 40 and 50 on the x-axis, as the desired grid shows.

Can this be done in matplotlib? I have read through the documentation on tick_params and locators, but none of these seem to address this kind of axis formatting.

The y axis is log-scaled. You can achieve this easily by setting `axes.yaxis.set_scale('log')`. The `x-axis' is weird. , I would suggest to either scale them linearly or logarithmic. Everything else is confusing for the reader. — hitzg, Jul 01 '15 at 18:32
@hitzg I have considered using a logy scale, and that is reasonable, though harder to manage the grid and labels. I agree the x-axis is weird. I'm essentially trying to emulate probability paper. I can do that with the code I provided but it doesn't display the way I want it to. — Nelz11, Jul 01 '15 at 18:49
Also, using axes.yaxis.set_scale('log'), still doesn't solve the problem of making the lower ranges visible at the same "scale". You end up with the same result with the grid bunched up at the bottom and spread out at the top. Not quite the desired result. — Nelz11, Jul 01 '15 at 18:51

Amy Teegarden · Answer 1 · 2015-07-03T20:55:16.047

You can define a custom scale for the x-axis, which you can use instead of 'log'. Unfortunately, it's complicated and you'll need to figure out a function that lets you transform the numbers you give for the x-axis into something more linear. See http://matplotlib.org/examples/api/custom_scale_example.html.

Edit to add:

The problem was so interesting I decided to figure out if I could make the custom axis myself. I altered the code from the link to work with your example. I'd be interested to see whether it works the way you want.

Edit: New and improved(?) code! The spacing isn't quite as even as before, but it's now done automatically when you pass a list of points to plt.gca().set_xscale (see near the end of the code for example). It does a curve fit to fit those points to a logistic function and uses the resulting parameters as the basis for the transformation. I get a warning when I run this code (Warning: converting a masked element to nan). I still haven't figured out what's going on there, but it doesn't seem to be causing problems. Here's the figure that I generated:

matplotlib figure with custom axis

import numpy as np
from numpy import ma
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import Formatter, FixedLocator
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    """Logistic function (s-curve)."""
    return L / (1 + np.exp(-k * (x - x0)))

class ProbabilityScale(mscale.ScaleBase):
    """
    Scales data so that points along a logistic curve become evenly spaced.
    """

    # The scale class must have a member ``name`` that defines the
    # string used to select the scale.  For example,
    # ``gca().set_yscale("probability")`` would be used to select this
    # scale.
    name = 'probability'


    def __init__(self, axis, **kwargs):
        """
        Any keyword arguments passed to ``set_xscale`` and
        ``set_yscale`` will be passed along to the scale's
        constructor.

        lower_bound: Minimum value of x. Defaults to .01.
        upper_bound_dist: L - upper_bound_dist is the maximum value
        of x. Defaults to lower_bound.

        """
        mscale.ScaleBase.__init__(self)
        lower_bound = kwargs.pop("lower_bound", .01)
        if lower_bound <= 0:
            raise ValueError("lower_bound must be greater than 0")
        self.lower_bound = lower_bound
        upper_bound_dist = kwargs.pop("upper_bound_dist", lower_bound)
        self.points = kwargs['points']
        #determine parameters of logistic function with curve fitting
        x = np.linspace(0, 1, len(self.points))
        #initial guess for parameters
        p0 = [max(self.points), 1, .5]
        popt, pcov = curve_fit(logistic, x, self.points, p0 = p0)
        [self.L, self.k, self.x0] = popt
        self.upper_bound = self.L - upper_bound_dist

    def get_transform(self):
        """
        Override this method to return a new instance that does the
        actual transformation of the data.

        The ProbabilityTransform class is defined below as a
        nested class of this one.
        """
        return self.ProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

    def set_default_locators_and_formatters(self, axis):
        """
        Override to set up the locators and formatters to use with the
        scale.  This is only required if the scale requires custom
        locators and formatters.  Writing custom locators and
        formatters is rather outside the scope of this example, but
        there are many helpful examples in ``ticker.py``.
        """


        axis.set_major_locator(FixedLocator(self.points))

    def limit_range_for_scale(self, vmin, vmax, minpos):
        """
        Override to limit the bounds of the axis to the domain of the
        transform.  In this case, the bounds should be
        limited to the threshold that was passed in.  Unlike the
        autoscaling provided by the tick locators, this range limiting
        will always be adhered to, whether the axis range is set
        manually, determined automatically or changed through panning
        and zooming.
        """
        return max(vmin, self.lower_bound), min(vmax, self.upper_bound)

    class ProbabilityTransform(mtransforms.Transform):
        # There are two value members that must be defined.
        # ``input_dims`` and ``output_dims`` specify number of input
        # dimensions and output dimensions to the transformation.
        # These are used by the transformation framework to do some
        # error checking and prevent incompatible transformations from
        # being connected together.  When defining transforms for a
        # scale, which are, by definition, separable and have only one
        # dimension, these members should always be set to 1.
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower_bound, upper_bound, L, k, x0):
            mtransforms.Transform.__init__(self)
            self.lower_bound = lower_bound
            self.L = L
            self.k = k
            self.x0 = x0
            self.upper_bound = upper_bound
        def transform_non_affine(self, a):
            """
            This transform takes an Nx1 ``numpy`` array and returns a
            transformed copy.  Since the range of the scale
            is limited by the user-specified threshold, the input
            array must be masked to contain only valid values.
            ``matplotlib`` will handle masked arrays and remove the
            out-of-range data from the plot.  Importantly, the
            ``transform`` method *must* return an array that is the
            same shape as the input array, since these values need to
            remain synchronized with values in the other dimension.
            """
            masked = ma.masked_where((a < self.lower_bound) | (a > self.upper_bound), a)
            return ma.log((self.L - masked) / masked) / -self.k + self.x0

        def inverted(self):
            """
            Override this method so matplotlib knows how to get the
            inverse transform for this transform.
            """
            return ProbabilityScale.InvertedProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

    class InvertedProbabilityTransform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower_bound, upper_bound, L, k, x0):
            mtransforms.Transform.__init__(self)
            self.lower_bound = lower_bound
            self.L = L
            self.k = k
            self.x0 = x0
            self.upper_bound = upper_bound

        def transform_non_affine(self, a):
            return self.L / (1 + np.exp(-self.k * (a - self.x0)))
        def inverted(self):
            return ProbabilityScale.ProbabilityTransform(self.lower_bound, self.upper_bound, self.L, self.k, self.x0)

# Now that the Scale class has been defined, it must be registered so
# that ``matplotlib`` can find it.
mscale.register_scale(ProbabilityScale)


if __name__ == '__main__':
    import matplotlib.pyplot as plt
    x = np.linspace(.1, 100, 1000)
    points = np.array([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])

    plt.plot(x, x)
    plt.gca().set_xscale('probability', points = points, vmin = .01)
    plt.grid(True)

    plt.show()

Very nice answer. Clear, clean and the proper way of doing it. Only two suggestions to improve it: You could rename the scale classes to something more related and also all the documentation strings (or at least remove them, as they are still about the mercator projection). And you could add an image of the resulting, awesome figure ;) — hitzg, Jul 02 '15 at 06:45
@Amy Teegarden Thanks for taking on that challenge! I've spent two days working on this problem! I came up with a solution, but I think I'll try this custom scale to make it a bit better since it looks like it provides a reasonable probability scale! However, can you clarify a bit on the transformation? I've been toying around with your example, and it doesn't seem to always work well. I'll post my work-around solution in a bit as well, but I would like to understand yours a bit better. — Nelz11, Jul 02 '15 at 15:34
Thanks for the feedback! I'm new here, so it's nice to hear that my answer was helpful. — Amy Teegarden, Jul 02 '15 at 16:28
As for the transform, it was honestly pretty hacky. I plotted the points you used for the x-axis, and the lower end looks exponential and the upper end looks logarithmic, so I applied a log function to the bottom and an exponential function to the top in a piecewise manner, then adjusted to match the slope and y-intercept of the middle, linear part. Perhaps some sort of inverse S-curve would be a better general solution, even if it doesn't match your points exactly. — Amy Teegarden, Jul 02 '15 at 16:37
Thanks for the edit! Looks like we came to about the same conclusion! I was going for the curve fit route till I realized I was simply trying to scale to a cumulative density function. I like your solution though since it seems to give the tails more "weight" so they can be seen better. I posted the "technically" correct solution as well. — Nelz11, Jul 07 '15 at 16:13

score 7 · Accepted Answer · answered Jul 07 '15 at 16:09

I finally came up with the correct solution, thanks to @Amy Teegarden for getting me going in the right direction. I thought I would share the final solution here for others to reference! Here is the final result:

Flood Frequency Curve Probability Plot

Following is a true probability axis scale, using the normal CDF and its inverse, PPF, functions paramterized by mu and sigma.

import numpy as np
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import FormatStrFormatter, FixedLocator
from scipy.stats import norm

class ProbScale(mscale.ScaleBase):
    """
    Scales data in range 0 to 100 using a non-standard log transform
    This scale attempts to replicate "probability paper" scaling

    The scale function:
        A piecewise combination of exponential, linear, and logarithmic scales

    The inverse scale function:
      piecewise combination of exponential, linear, and logarithmic scales

    Since probabilities at 0 and 100 are not represented,
    there is user-defined upper and lower limit, above and below which nothing
    will be plotted.  This defaults to .1 and 99 for lower and upper, respectively.

    """

    # The scale class must have a member ``name`` that defines the
    # string used to select the scale.  For example,
    # ``gca().set_yscale("mercator")`` would be used to select this
    # scale.
    name = 'prob_scale'


    def __init__(self, axis, **kwargs):
        """
        Any keyword arguments passed to ``set_xscale`` and
        ``set_yscale`` will be passed along to the scale's
        constructor.

        upper: The probability above which to crop the data.
        lower: The probability below which to crop the data.
        """
        mscale.ScaleBase.__init__(self)
        upper = kwargs.pop("upper", 98) #Default to an upper bound of 98%
        if upper <= 0 or upper >= 100:
            raise ValueError("upper must be between 0 and 100.")
        lower = kwargs.pop("lower", 0.2) #Default to a lower bound of .2%
        if lower <= 0 or lower >= 100:
            raise ValueError("lower must be between 0 and 100.")
        if lower >= upper:
            raise ValueError("lower must be strictly less than upper!.")
        self.lower = lower
        self.upper = upper

        #This scale is best described by the CDF of the normal distribution
        #This distribution is paramaterized by mu and sigma, these default vaules
        #are provided to work generally well, but can be adjusted by the user if desired
        mu = kwargs.pop("mu", 15)
        sigma = kwargs.pop("sigma", 40)
        self.mu = mu
        self.sigma = sigma
        #Need to enfore the upper and lower limits on the axes initially
        axis.axes.set_xlim(lower,upper)

    def get_transform(self):
        """
        Override this method to return a new instance that does the
        actual transformation of the data.

        The ProbTransform class is defined below as a
        nested class of this one.
        """
        return self.ProbTransform(self.lower, self.upper, self.mu, self.sigma)

    def set_default_locators_and_formatters(self, axis):
        """
        Override to set up the locators and formatters to use with the
        scale.  This is only required if the scale requires custom
        locators and formatters.  Writing custom locators and
        formatters: many helpful examples in ``ticker.py``.

        In this case, the prob_scale uses a fixed locator from
        0.1 to 99 % and a custom no formatter class

        This builds both the major and minor locators, and cuts off any values
        above or below the user defined thresholds: upper, lower
        """
        #major_ticks = np.asarray([.2,.5,1,2,5,10,20,30,40,50,60,70,80,90,95,98])
        major_ticks = np.asarray([.2,1,2,5,10,20,30,40,50,60,70,80,90,98]) #removed a couple ticks to make it look nicer
        major_ticks = major_ticks[np.where( (major_ticks >= self.lower) & (major_ticks <= self.upper) )]

        minor_ticks = np.concatenate( [np.arange(.2, 1, .1), np.arange(1, 2, .2), np.arange(2,20,1), np.arange(20, 80, 2), np.arange(80, 98, 1)] )
        minor_ticks = minor_ticks[np.where( (minor_ticks >= self.lower) & (minor_ticks <= self.upper) )]
        axis.set_major_locator(FixedLocator(major_ticks))
        axis.set_minor_locator(FixedLocator(minor_ticks))


    def limit_range_for_scale(self, vmin, vmax, minpos):
        """
        Override to limit the bounds of the axis to the domain of the
        transform.  In the case of Probability, the bounds should be
        limited to the user bounds that were passed in.  Unlike the
        autoscaling provided by the tick locators, this range limiting
        will always be adhered to, whether the axis range is set
        manually, determined automatically or changed through panning
        and zooming.
        """
        return max(self.lower, vmin), min(self.upper, vmax)

    class ProbTransform(mtransforms.Transform):
        # There are two value members that must be defined.
        # ``input_dims`` and ``output_dims`` specify number of input
        # dimensions and output dimensions to the transformation.
        # These are used by the transformation framework to do some
        # error checking and prevent incompatible transformations from
        # being connected together.  When defining transforms for a
        # scale, which are, by definition, separable and have only one
        # dimension, these members should always be set to 1.
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, upper, lower, mu, sigma):
            mtransforms.Transform.__init__(self)
            self.upper = upper
            self.lower = lower
            self.mu = mu
            self.sigma = sigma

        def transform_non_affine(self, a):
            """
            This transform takes an Nx1 ``numpy`` array and returns a
            transformed copy.  Since the range of the Probability scale
            is limited by the user-specified threshold, the input
            array must be masked to contain only valid values.
            ``matplotlib`` will handle masked arrays and remove the
            out-of-range data from the plot.  Importantly, the
            ``transform`` method *must* return an array that is the
            same shape as the input array, since these values need to
            remain synchronized with values in the other dimension.
            """

            masked = np.ma.masked_where( (a < self.upper) & (a > self.lower) , a)
            #Get the CDF of the normal distribution located at mu and scaled by sigma
            #Multiply these by 100 to put it into a percent scale
            cdf = norm.cdf(masked, self.mu, self.sigma)*100
            return cdf

        def inverted(self):
            """
            Override this method so matplotlib knows how to get the
            inverse transform for this transform.
            """
            return ProbScale.InvertedProbTransform(self.lower, self.upper, self.mu, self.sigma)

    class InvertedProbTransform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, lower, upper, mu, sigma):
            mtransforms.Transform.__init__(self)
            self.lower = lower
            self.upper = upper
            self.mu = mu
            self.sigma = sigma

        def transform_non_affine(self, a):
            #Need to get the PPF value for a, which is in a percent scale [0,100], so move back to probability range [0,1]
            inverse = norm.ppf(a/100, self.mu, self.sigma)
            return inverse

        def inverted(self):
            return ProbScale.ProbTransform(self.lower, self.upper)

# Now that the Scale class has been defined, it must be registered so
# that ``matplotlib`` can find it.
mscale.register_scale(ProbScale)

Also, to get the desired y-axis results, I discovered that the log-scale could indeed be used with a few extra tweaks to get a reasonable graph. This is the code used to force the log scale to have appropriate minor ticks:

axes.set_yscale('log', basey=10, subsy=[2,3,4,5,6,7,8,9])

Then you can modify the labeling with the locator and formatter:

#Adjust the yaxis labels and format
axes.yaxis.set_minor_locator(FixedLocator([200, 500, 1500, 2500, 3500, 4500, 5000, 6000, 7000, 8000, 9000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]))
axes.yaxis.set_minor_formatter(FormatStrFormatter('%d'))
axes.yaxis.set_major_formatter(FormatStrFormatter('%d'))

So the complete axes maninputlation looks like this:

axes.set_ylabel('Discharge in CFS')
axes.set_xlabel('Exceedance Probability')
plt.setp(plt.xticks()[1], rotation=45)
#Adjust the scales of the x and y axis
axes.set_yscale('log', basey=10, subsy=[2,3,4,5,6,7,8,9])
axes.set_xscale('prob_scale', upper=98, lower=.2)
#Adjust the yaxis labels and format
axes.yaxis.set_minor_locator(FixedLocator([200, 500, 1500, 2500, 3500, 4500, 5000, 6000, 7000, 8000, 9000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]))
axes.yaxis.set_minor_formatter(FormatStrFormatter('%d'))
axes.yaxis.set_major_formatter(FormatStrFormatter('%d'))

#Finally set the y-limit of the plot to be reasonable
axes.set_ylim((0, 2*pp['Q'].max()))
#Invert the x-axis
axes.invert_xaxis()
#Turn on major and minor grid lines
axes.grid(which='both', alpha=.9)

This provides a semi-log scale probability paper plot! With the nice property that anything plotted on these axes that plots in a straight line indicates that it comes from a normal distribution!

In `class ProbTransform`, method `transform_non_affine`, rather than a CDF, if should be the PPF (and use 0-1 scale). Just the opposite for the Inverted, where PPF should actually be CDF. This then creates the correct normal scaling as shown in the original question sample image. Nice work. — Q-man, Apr 13 '16 at 19:11
The quickest explanation is to compare the plot you show, with the original scanned one you are trying to mimic. Consider the width of the bin from 40-30% in both figures, compared to the width of 90-80%. In yours, the 40-30 is larger, whereas in the original, 40-30% is smaller compared to 90-80%. We are interested in 'compressing' the middle of the distribution and stretching the shoulders, not the opposite as you've shown. — Q-man, May 05 '16 at 16:33

Creating Probability/Frequency Axis Grid (Irregularly Spaced) with Matplotlib

2 Answers2

Linked