Matplotlib logarithmic scale with zero value

Question

I have a very large and sparse dataset of spam Twitter accounts, and I need to scale the x axis to be able to visualize the distribution (histogram, KDE, etc.) and the CDF of the various variables (tweets_count, number of followers/following, etc.).

    > describe(spammers_class1$tweets_count)
      var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
    1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

In this dataset, the value 0 has a huge importance (actually 0 should have the highest density). However, with a logarithmic scale these values are ignored. I thought of changing the value to 0.1 for example, but it will not make sense that there are spam accounts that have 10^-1 followers.

So, what would be a workaround in Python and matplotlib?

it would be nice if you put your axes/plot code so as to be corrected. — Stephane Rolland, May 05 '13 at 09:29
use `symlog` http://stackoverflow.com/questions/3305865/what-is-the-difference-between-log-and-symlog — tacaswell, Aug 04 '13 at 06:20

unutbu · Accepted Answer · 2013-05-05T10:25:27.167

2

Add 1 to each x value, then take the log:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
x = [0, 10, 100, 1000]
y = [100, 20, 10, 50]
x = np.asarray(x) + 1  # shift by 1 so that x=0 lands at log(1) = 0
y = np.asarray(y)
ax.plot(x, y)
ax.set_xscale('log')
ax.set_xlim(x.min(), x.max())
# label each tick with the original (unshifted) value
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
# place the ticks exactly at the shifted data positions
ax.xaxis.set_major_locator(ticker.FixedLocator(x))
plt.show()


Use

ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
ax.xaxis.set_major_locator(ticker.FixedLocator(x))

to relabel the tick marks according to the non-log values of x.
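
The same shift-and-relabel idea should carry over to a histogram, which is what the question asks about. The following is a minimal sketch and not part of the original answer: the `counts` array is synthetic stand-in data (not the asker's dataset), the number of bins is arbitrary, and the tick positions are picked by hand. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

# Synthetic stand-in for a count column such as tweets_count: many zeros plus a long tail.
rng = np.random.default_rng(0)
counts = np.concatenate([np.zeros(500, dtype=int), rng.geometric(0.01, size=500)])

shifted = counts + 1  # 0 -> 1, so every value has a finite logarithm
# Log-spaced bin edges from 1 to just above the largest shifted value.
edges = np.logspace(0, np.log10(shifted.max() + 1), 20)

fig, ax = plt.subplots()
heights, _, _ = ax.hist(shifted, bins=edges)
ax.set_xscale('log')
# Put ticks where the original values 0, 10, 100, 1000 ended up after the shift ...
ax.xaxis.set_major_locator(ticker.FixedLocator([1, 11, 101, 1001]))
# ... and label them with the original value (tick position minus 1).
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda v, pos: '{0:g}'.format(v - 1)))
fig.savefig("shifted_hist.png")  # hypothetical output file name
```

With these edges the first bin holds only the zeros, so the spike at 0 stays visible and is labelled "0" rather than "10^0".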

(My original suggestion was to use plt.xticks(x, x-1), but plt.xticks acts on whichever axes happens to be current. To isolate the changes to one particular axes, I changed all the plt calls to method calls on ax.)

matplotlib removes points which contain a NaN, inf or -inf value. Since log(0) is -inf, the point corresponding to x=0 would be removed from a log plot.

If you increase all the x-values by 1, then since log(1) = 0, the point corresponding to x=0 will now be plotted at log(1)=0 on the log plot.

The remaining x-values will also be shifted by one, but it will not matter to the eye since log(x+1) is very close to log(x) for large values of x.
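
The reasoning above can be checked numerically. This is a small sketch using the same four x-values as the answer; it uses base-10 logs (an assumption made to match the decades on a log axis) and is only meant to illustrate the argument.

```python
import numpy as np

x = np.array([0, 10, 100, 1000], dtype=float)

with np.errstate(divide='ignore'):   # silence the divide-by-zero warning from log10(0)
    raw = np.log10(x)                # first entry is -inf, so that point cannot be drawn
shifted = np.log10(x + 1)            # first entry is log10(1) = 0, a finite position

# How far each nonzero point moves along the log axis, in decades.
offset = shifted[1:] - raw[1:]
print(raw[0], shifted[0], offset)
```

The offset is about 0.04 decades at x=10 and falls below 0.001 decades at x=1000, which is why the shift is hard to see for large values.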

edited May 05 '13 at 10:25

answered May 05 '13 at 09:35

unutbu


yes, but I will not be able to say in my paper that 50% of spammers have 0 followers. because it will be shown as 10^0 and this will mean that they have one follower (which is different). – amaatouq May 05 '13 at 09:43
You could relabel the tick marks with `plt.xticks`. I've edited the post to show how. – unutbu May 05 '13 at 09:50
In order not to shift all of the data. How can I efficiently add 0.1 to 0 values, so they will come up at the 10^-1 and then relabel the ticks ? I know this is another question. but It might be a better way of doing it without contaminating all of the data -shifting only 0 values- (and looping over large numpy arrays is very slow) – amaatouq May 05 '13 at 10:04
If you have an array with many 0 values, you can change them to 0.1 with `x[x<=0] = 0.1`. Note that if the array is of dtype `int`, then you must first convert the array to dtype `float`: `x = x.astype('float')`. – unutbu May 05 '13 at 10:17
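
The comment above can be sketched as follows. The `counts` values are made up for illustration, and `plot_x` is a hypothetical name for the copy that gets plotted; the original data stays untouched.

```python
import numpy as np

counts = np.array([0, 3, 0, 12, 150])   # integer-valued data, as in the question
plot_x = counts.astype(float)           # convert first: an int array cannot store 0.1
plot_x[plot_x <= 0] = 0.1               # boolean-mask assignment, no Python loop needed
print(plot_x)
```

Because `astype` returns a copy by default, `counts` still holds the real zeros for any later statistics.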
I protest in the strongest terms to modifying data before plotting it. – tacaswell Aug 04 '13 at 06:19
@tcaswell: Please reread my answer. Exactly what are you objecting against? – unutbu Aug 04 '13 at 10:04
@tcaswell. I'm very confused that you can object to modifying data, and yet recommend using `symlog`. `symlog` is plotting part of the data on a linear scale, and part on a logarithmic scale. If that isn't a modification of data before plotting, what is? – unutbu Aug 04 '13 at 10:22
Using creative scales bothers me less than modifying the data, it is much harder to hide. I am also reacting to your comments explaining how to add 0.1 to _just_ the zero entries. – tacaswell Aug 04 '13 at 15:20
Ordinarily, I would agree with you that *selectively* modifying data is a terrible idea. With continuous data, mapping 0 to 0.1 would be disastrous if there is already data at 0.1. However, if you read his question carefully he is dealing with integer-valued data. So mapping 0 to 0.1 and then plotting it on a log scale simply maps the data at 0 to -1, to be adjacent to the data that was at 1, which is mapped to 0. In this context -- for integer data -- I don't think this solution is terrible. – unutbu Aug 04 '13 at 18:48
In fact, it is **equivalent** to the symlog solution you suggest -- it produces a graph with two scales. Personally, I believe `log(x+1)` is a cognitively simpler answer since `log(x+1)` is smooth, and does not require one to adjust for two scales on one axis. `symlog` also does not highlight where the change of scale occurs. – unutbu Aug 04 '13 at 18:49
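
For comparison, the `symlog` alternative discussed in these comments might look like the sketch below. It reuses the x and y values from the accepted answer without shifting them. The threshold of 1 is an assumption chosen for integer counts, and the keyword `linthresh` is the spelling used by Matplotlib 3.3 and later (older releases used `linthreshx`). The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 10, 100, 1000])   # unmodified data, zero included
y = np.array([100, 20, 10, 50])

fig, ax = plt.subplots()
line, = ax.plot(x, y, marker='o')
# Linear between -linthresh and +linthresh, logarithmic beyond that.
ax.set_xscale('symlog', linthresh=1)
ax.set_xlim(0, x.max())
fig.savefig("symlog_example.png")  # hypothetical output file name
```

Here the data is left as-is and the tick at zero is labelled 0 without a custom formatter, at the cost of an axis that switches from linear to logarithmic at the threshold.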

score 0 · Answer 2 · answered May 05 '13 at 09:25

ax1.set_xlim(0, 1e3)

Here is the example from matplotlib documentation.

And there it sets the limit values of the axes this way:

ax1.set_xlim(1e1, 1e3)
ax1.set_ylim(1e2, 1e3)

answered May 05 '13 at 09:25

Stephane Rolland


This doesn't show how to deal with zero values on a logarithmic scale: log(0) is undefined, so matplotlib will ignore these values. Setting the xlim to 1e1 will make the x axis start from 10 and would still ignore 0 (I believe). I'll try it out anyway – amaatouq May 05 '13 at 09:41
At least as of July 2015, matplotlib is not ignoring zeros: it draws a straight line on the log plot all the way to the edge of the plot, which looks terrible and doesn't match MATLAB. hayer's comment doesn't seem true to me. – poleguy Jul 17 '15 at 14:36
