I have a very large and sparse dataset of spam twitter accounts and it requires me to scale the x axis in order to be able to visualize the distribution (histogram, kde etc) and cdf of the various variables (tweets_count, number of followers/following etc).
> describe(spammers_class1$tweets_count)
var n mean sd median trimmed mad min max range skew kurtosis se
1 1 1076817 443.47 3729.05 35 57.29 43 0 669873 669873 53.23 5974.73 3.59
In this dataset, the value 0 has a huge importance (actually 0 should have the highest density). However, with a logarithmic scale these values are ignored. I thought of changing the value to 0.1 for example, but it will not make sense that there are spam accounts that have 10^-1 followers.
So, what would be a workaround in python and matplotlib ?