
I'd like to create a bar chart whose X axis includes hundreds of thousands of data points.

So I need a logarithmic scale on the X axis. However, X == 0 is a valid data point.
The Y axis should use a linear scale (the y values are normalized distributions, 0 < Y <= 1).

The following is a minimal demonstration:

$ cat stack_example.py 
#!/usr/bin/env python

def test_plot3():
    import pylab as pl

    _graph = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    epsilon = 0.00000000001
    x = [ pl.log(k) if k > 0 else pl.log(epsilon) for k in _graph ]
    y = [ _graph[k] for k in _graph ]
    lx = pl.xlabel("in degree (logarithmic scale)")
    ly = pl.ylabel("normalized distribution (0 to 1)")
    tl = pl.title("graph in-degree normalized distribution")
    _width = 1.0 / (len(x) * 5.0)
    pl.bar(x, y, width=_width, log=True)
    pl.xscale('log')
    pl.yscale('linear')
    pl.show()

if __name__ == "__main__":
    test_plot3()

This produces the following invalid graph (the large blue rectangle on the left seems to be a bug):

[image: semi-log bar chart]

Can you suggest a way to produce a correct bar chart from Python that uses a logarithmic scale on the X axis and a linear scale on the Y axis, and accepts 0 as a valid x point?

EDIT 1

Based on @Ed's comment, I amended my code to:

#!/usr/bin/env python

def test_plot3():
    import pylab as pl

    _graph = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    epsilon = 0.1
    x = [ pl.log(k) if k > 0 else pl.log(epsilon) for k in _graph ]
    y = [ _graph[k] for k in _graph ]
    lx = pl.xlabel("in degree (logarithmic scale)")
    ly = pl.ylabel("normalized distribution (0 to 1)")
    tl = pl.title("graph in-degree normalized distribution")
    _width = 1.0 / (len(x) * 5.0)
    pl.bar(x, y, width=_width, color="blue", log=True)
    pl.xscale('symlog', linthreshx=2)
    pl.yscale('linear')
    pl.show()

if __name__ == "__main__":
    test_plot3()



but the resulting graph still doesn't seem right:

[image: amended graph]

boardrider
  • You can choose one: have 0 on a log scaled axis, or get a *correct* graph. You can't have both. – hitzg Jun 06 '15 at 20:15

1 Answer


You can use the symlog scale instead of log. It handles negative numbers and is linear in a small region around zero, so x == 0 can be placed on the axis. For your example:

#!/usr/bin/env python

def test_plot3():
    import pylab as pl

    _graph = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    epsilon = 0.00000000001
    x = [ pl.log(k) if k > 0 else pl.log(epsilon) for k in _graph ]
    y = [ _graph[k] for k in _graph ]
    lx = pl.xlabel("in degree (logarithmic scale)")
    ly = pl.ylabel("normalized distribution (0 to 1)")
    tl = pl.title("graph in-degree normalized distribution")
    _width = 1.0 / (len(x) * 5.0)
    pl.bar(x, y, width=_width, log=True)
    pl.xscale('symlog')
    pl.yscale('linear')
    pl.show()

if __name__ == "__main__":
    test_plot3()

You can tune the size of the linear region with the linthreshx argument to xscale (renamed linthresh in Matplotlib 3.3 and later). Check out this question for details on how to use it.
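To make the role of the linear threshold more concrete, here is a rough sketch of a symlog-style mapping in plain Python. It is a simplified illustration, not Matplotlib's actual transform: the function name `symlog_position` is hypothetical, and base 10 with a linear slope of 1 is an assumption made for readability.

```python
import math

def symlog_position(x, linthresh=1.0):
    """Simplified symlog-style mapping (illustrative only, not Matplotlib's code).

    Values with |x| <= linthresh are placed linearly, so 0 maps to 0.
    Values beyond the threshold are placed logarithmically (base 10),
    and the two pieces meet at |x| == linthresh.
    """
    if abs(x) <= linthresh:
        return x  # linear region around zero
    # logarithmic region, mirrored for negative values
    return math.copysign(linthresh * (1.0 + math.log10(abs(x) / linthresh)), x)

# Each tenfold step beyond the threshold moves the position by one threshold width.
for k in (0, 1, 10, 100, 1000):
    print(k, symlog_position(k))
```

A larger `linthresh` widens the part of the axis that behaves linearly, which is the knob the answer refers to.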

Ed Smith
  • Thanks for the suggestions. They did improve the situation somewhat, but the output graph still seems wrong. See `EDIT 1` in my OP. – boardrider Jun 07 '15 at 14:46
  • @boardrider When you use a log-scaled axis (or symlog), you don't have to apply the log function yourself. Just pass the raw values and matplotlib will apply the log automatically. So using `x = [k if k > 0 else epsilon for k in _graph]` should solve your issues. And BTW: `log` is the natural log (base *e*); for base 10 you would have to use `log10`. – hitzg Jun 08 '15 at 10:36
  • In fact you might want to use: `x = [k if abs(k) > 0 else epsilon for k in _graph]` (in case you have negative numbers as well) – hitzg Jun 08 '15 at 10:47
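
Putting the answer and the comments together, a minimal sketch might look like the following. It passes the raw x values (no manual `log`) and lets the symlog scale place 0 on the axis, so no epsilon substitution is needed. The helper names `prepare_points` and `plot_in_degree`, the bar width of 0.2 and the left axis limit are illustrative assumptions, and the `linthresh` keyword assumes Matplotlib 3.3 or later (older releases spell it `linthreshx`, as in the answer above).

```python
#!/usr/bin/env python

def prepare_points(graph):
    """Return x values in sorted order with their matching y values, untransformed."""
    xs = sorted(graph)
    ys = [graph[k] for k in xs]
    return xs, ys

def plot_in_degree(graph, linthresh=1.0):
    """Bar chart with a symlog X axis and a linear Y axis."""
    import matplotlib.pyplot as plt  # imported here, as in the original example

    xs, ys = prepare_points(graph)
    fig, ax = plt.subplots()
    # Raw x values: the axis scale does the logarithmic placement.
    ax.bar(xs, ys, width=0.2, color="blue")
    # symlog is linear for |x| <= linthresh, so x == 0 is a valid position.
    ax.set_xscale("symlog", linthresh=linthresh)
    ax.set_yscale("linear")
    ax.set_xlim(left=-0.5)  # assumption: leave room so the bar at 0 is visible
    ax.set_xlabel("in degree (symlog scale)")
    ax.set_ylabel("normalized distribution (0 to 1)")
    ax.set_title("graph in-degree normalized distribution")
    plt.show()
```

Calling `plot_in_degree({0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25})` should draw four bars of equal height. With hundreds of thousands of x values, a constant bar width will look very thin at the high end of the axis, so the width may need adjusting.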