
I want to plot the CDF of my data on a lognormal probability graph, like the one shown below:

[Image: example lognormal probability graph]

I want the axis scales on my plot to look like that, only swapped (with probability on the x-axis). Note that the y-axis above is NOT simply a logarithmic scale. Also, I'm not sure why the x-axis above repeats 1-9 instead of going on to 10-99 etc., but ignore that part.

Here is what I have so far. I am using the method for making a CDF outlined here:

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 3., 1. # mean and standard deviation of the underlying normal distribution
data = np.random.lognormal(mu, sigma, 1000)

#Make CDF
dataSorted = np.sort(data)
dataCdf = np.linspace(0,1,len(dataSorted))

plt.plot(dataCdf, dataSorted)
plt.gca().set_yscale('log')
plt.xlabel('probability')
plt.ylabel('value')
plt.show()

[Image: resulting plot, value on a logarithmic y-axis against probability on a linear x-axis]

Now I just need a way to scale my x-axis like the y-axis in the picture above.

hm8
  • This is what you need: [Plot logarithmic axes with matplotlib in python](http://stackoverflow.com/questions/773814/plot-logarithmic-axes-with-matplotlib-in-python)? – Lucas Jan 12 '17 at 20:58
  • Isn't it obvious how you can make the x axis logarithmic from your current code? `plt.gca().set_yscale('log')` -> `plt.gca().set_xscale('log')` – Chris Mueller Jan 12 '17 at 21:00
  • The x-scale (or y-scale in the example axes) isn't logarithmic. I changed the example axes image for clarity. "Middle" probability values are close to each other; large/small ones are further apart. It's like it's logarithmic up to 0.5 and "inversely" logarithmic from 0.5 to 1. – hm8 Jan 12 '17 at 21:03
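The spacing described in this comment is that of a normal probability axis. On such an axis each probability p is placed at the standard normal quantile Φ⁻¹(p), also called the probit. This is the same function the second answer applies with scipy.stats.norm.ppf. The minimal sketch below uses the standard library's statistics.NormalDist instead of SciPy, a substitution made only to keep it dependency-free. It prints where a few probabilities land on such an axis.

```python
from statistics import NormalDist

# Probit transform: map a probability p in (0, 1) to the standard normal quantile.
probit = NormalDist().inv_cdf

probs = [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99]
positions = {p: probit(p) for p in probs}

for p, z in positions.items():
    print(f"p = {p:<5} -> axis position {z:+.3f}")

# Steps near the middle are short, steps toward 0 or 1 are long,
# and the axis is symmetric about p = 0.5.
gap_middle = positions[0.7] - positions[0.5]  # about 0.52
gap_tail = positions[0.99] - positions[0.9]   # about 1.04
```

Probabilities are unevenly spaced near 0.5 versus near 0 or 1. This matches the "logarithmic up to 0.5 and inversely logarithmic from 0.5 to 1" impression, but the curve is not actually logarithmic on either side.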

2 Answers


A way to tackle this problem is to use a symmetric log scale, called symlog.

Symlog is a logarithmic scale that behaves linearly within some range around 0 (where a normal log scale would need infinitely many decades), so that a logarithmic graph crossing 0 becomes possible.

Symlog can be set in matplotlib using ax.set_xscale('symlog', linthresh=0.1), where linthresh (called linthreshx in Matplotlib versions before 3.3) sets the extent of the linear range around zero.
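The minimal sketch below shows the linear-near-zero behaviour. It assumes Matplotlib 3.3 or later, where the keyword is linthresh. It reads the scale's forward transform back from the axis. Inside ±linthresh the mapping is proportional. Outside that band every decade gets the same width. The whole mapping is odd-symmetric about 0.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(-100, 100, 2001)
ax.plot(x, x)
# Linear between -0.1 and 0.1, logarithmic (base 10) outside that band.
ax.set_xscale('symlog', linthresh=0.1)

# The transform that maps data x-values to (unscaled) axis positions.
forward = ax.xaxis.get_transform().transform

def pos(v):
    """Axis position of a single x value under the symlog scale."""
    return forward(np.array([v]))[0]

linear_ratio = pos(0.05) / pos(0.01)    # 5: proportional inside the band
decade_1 = pos(100.0) - pos(10.0)       # width of one decade
decade_2 = pos(1000.0) - pos(100.0)     # same width for the next decade
symmetry_error = pos(-5.0) + pos(5.0)   # 0: odd-symmetric about zero

print(linear_ratio, decade_1, decade_2, symmetry_error)
```

Add plt.show() to display the figure. The shape is symmetric about 0, which is why the answer shifts the upper half of the CDF by -1 before plotting it.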

Since in this case we want the center of the graph to be at 0.5 instead of 0, we can plot two graphs and stick them together. The lower half of the CDF goes on one axes. The upper half, shifted by -1 so that it mirrors the lower half, goes on a second axes. To get the desired result, one can then play with the tick marks to be shown, as well as with the linthresh parameter. Below is an example.

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker
mu, sigma = 3., 1. # mean and standard deviation of the underlying normal distribution
data = np.random.lognormal(mu, sigma, 1000)

#Make CDF
dataSorted = np.sort(data)
dataCdf = np.linspace(0,1,len(dataSorted))

fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True)
plt.subplots_adjust(wspace=0.00005)
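half = len(dataCdf) // 2  # integer index; len(...)/2 is a float in Python 3
ax1.plot(dataCdf[:half], dataSorted[:half])
# upper half shifted by -1 so that it mirrors the lower half around 0
ax2.plot(dataCdf[half:]-1, dataSorted[half:])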

ax1.set_yscale('log')
ax2.set_yscale('log')

# linthresh: width of the linear region around 0 (named linthreshx before Matplotlib 3.3)
ax1.set_xscale('symlog', linthresh=0.001)
ax2.set_xscale('symlog', linthresh=0.001)

ax1.set_xlim([0.01, 0.5])
ax2.set_xlim([-0.5, -0.01])

ticks = np.array([0.01,0.1,  0.3])
ticks2 = ((1-ticks)[::-1])-1
ax1.set_xticks(ticks)
ax1.xaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax2.set_xticks(ticks2)
ax2.xaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax2.set_xticklabels(ticks2+1)

ax1.spines["right"].set_visible(False)
ax2.spines["left"].set_visible(False)
ax1.yaxis.set_ticks_position('left')
ax2.yaxis.set_ticks_position('right')

ax1.set_xlabel('probability')
ax1.set_ylabel('value')

plt.savefig(__file__+".png")
plt.show()

[Image: resulting plot, probability on a mirrored symlog x-axis split at 0.5, value on a logarithmic y-axis]

ImportanceOfBeingErnest
  • As it turns out this is not actually what I am looking for. A lognormal distribution should appear as a perfectly straight line on the graph. My interpretation of 'logarithmic up to 0.5 and "inversely" logarithmic from 0.5 to 1' must be incorrect. – hm8 Jan 24 '17 at 20:59
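The straight-line expectation follows from the definition of the lognormal distribution. If log X ~ N(μ, σ²), the p-quantile of X is exp(μ + σ·Φ⁻¹(p)). On an axis that places p at Φ⁻¹(p), with a logarithmic value axis, the CDF is therefore a line with slope σ and intercept μ in natural-log units (mu and sigma in the code). The minimal sketch below checks this numerically. It assumes the plotting positions (i - 0.5)/n, one common choice that avoids the 0 and 1 endpoints produced by np.linspace(0, 1, n), where Φ⁻¹ is infinite. It uses statistics.NormalDist for Φ⁻¹.

```python
from statistics import NormalDist

import numpy as np

mu, sigma = 3.0, 1.0  # mean and std. dev. of the underlying normal, i.e. of log(data)
rng = np.random.default_rng(0)
data = np.sort(rng.lognormal(mu, sigma, 10_000))

# Plotting positions strictly inside (0, 1), so the probit stays finite.
n = len(data)
probs = (np.arange(1, n + 1) - 0.5) / n

# Probit of each plotting position: where it sits on a normal probability axis.
z = np.array([NormalDist().inv_cdf(p) for p in probs])

# On probit-x / log-y axes a lognormal sample should lie close to a straight line:
# log(value) ~ intercept + slope * z, with slope ~ sigma and intercept ~ mu.
slope, intercept = np.polyfit(z, np.log(data), 1)
print(f"slope = {slope:.3f} (sigma = {sigma}), intercept = {intercept:.3f} (mu = {mu})")
```

The symlog axis in this answer does not apply Φ⁻¹, so a lognormal sample does not come out straight on it.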

I know it's a bit late, but I had a similar problem and solved it, so I thought I'd share my solution. It follows the custom scale example in the matplotlib docs:

import numpy as np
import scipy.stats as stats
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import Formatter, FixedLocator

class PPFScale(mscale.ScaleBase):
    name = 'ppf'

    def __init__(self, axis, **kwargs):
        # current Matplotlib versions expect the axis to be passed to ScaleBase
        super().__init__(axis)

    def get_transform(self):
        return self.PPFTransform()

    def set_default_locators_and_formatters(self, axis):
        class VarFormatter(Formatter):
            def __call__(self, x, pos=None):
                return f'{x}'[1:]  # e.g. 0.01 -> '.01': drop the leading zero

        axis.set_major_locator(FixedLocator(np.array([.001,.01,.1,.2,.3,.4,.5,.6,.7,.8,.9,.99,.999])))
        axis.set_major_formatter(VarFormatter())


    def limit_range_for_scale(self, vmin, vmax, minpos):
        return max(vmin, 1e-6), min(vmax, 1-1e-6)

    class PPFTransform(mtransforms.Transform):
        input_dims = output_dims = 1

        def __init__(self):
            super().__init__()

        def transform_non_affine(self, a):
            return stats.norm.ppf(a)

        def inverted(self):
            return PPFScale.IPPFTransform()

    class IPPFTransform(mtransforms.Transform):
        input_dims = output_dims = 1

        def transform_non_affine(self, a):
            return stats.norm.cdf(a)

        def inverted(self):
            return PPFScale.PPFTransform()

mscale.register_scale(PPFScale)


if __name__ == '__main__':
    import matplotlib.pyplot as plt
    mu, sigma = 3., 1. # mean and standard deviation of the underlying normal distribution
    data = np.random.lognormal(mu, sigma, 10000)

    #Make CDF
    dataSorted = np.sort(data)
    dataCdf = np.linspace(0,1,len(dataSorted))

    plt.plot(dataCdf, dataSorted)
    plt.gca().set_xscale('ppf')
    plt.gca().set_yscale('log')
    plt.xlabel('probability')
    plt.ylabel('value')
    plt.xlim(0.001,0.999)
    plt.grid()
    plt.show()

[Image: output of the example above]
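On Matplotlib 3.1 or later, roughly the same probability axis can also be built without subclassing ScaleBase. You pass a forward/inverse pair to the built-in 'function' scale. This is a sketch of an alternative, not part of the answer above. The clipping bound 1e-6 mirrors the answer's limit_range_for_scale. statistics.NormalDist stands in for scipy.stats.norm, an assumption made to keep the example dependency-light.

```python
from statistics import NormalDist

import matplotlib.pyplot as plt
import numpy as np

_norm = NormalDist()
_ppf = np.vectorize(_norm.inv_cdf, otypes=[float])  # elementwise inverse normal CDF
_cdf = np.vectorize(_norm.cdf, otypes=[float])      # elementwise normal CDF

def probit(p):
    """Forward transform: probability -> standard normal quantile."""
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)  # keep away from 0 and 1
    return _ppf(p)

def inverse_probit(z):
    """Inverse transform: standard normal quantile -> probability."""
    return _cdf(np.asarray(z, dtype=float))

mu, sigma = 3.0, 1.0  # parameters of the underlying normal distribution
data = np.sort(np.random.lognormal(mu, sigma, 10_000))
probs = (np.arange(1, len(data) + 1) - 0.5) / len(data)  # plotting positions in (0, 1)

fig, ax = plt.subplots()
ax.plot(probs, data)
ax.set_xscale('function', functions=(probit, inverse_probit))
ax.set_yscale('log')
ax.set_xticks([0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99, 0.999])
ax.set_xlim(0.001, 0.999)
ax.set_xlabel('probability')
ax.set_ylabel('value')
ax.grid()
plt.show()
```

The trade-off is convenience versus reuse. The 'function' scale needs no registration and no Transform classes, while a registered custom scale like 'ppf' can be selected by name and carries its own default ticks and formatter.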

You may also like to have a look at my lognorm demo.

Stef