Plotting log-binned network degree distributions

Question

I have often encountered and made long-tailed degree distributions/histograms from complex networks like the figure below. They make the heavy end of these tails, well, very heavy and crowded with many observations:

[Figure: Classic long-tailed degree distribution]

However, many publications I read show much cleaner degree distributions, without this clumpiness at the end of the tail and with more evenly spaced observations.

[Figure: Classic long-tailed degree distribution]

How do you make a chart like this using NetworkX and matplotlib?

What exactly is the question here? It looks like you've already achieved the result you are looking for. You'll need to be more specific than "make it better". — Hooked, May 10 '13 at 20:30
There's no question, just sharing how I solved a problem and opening it up to others' feedback if I've missed something in my approach. — Brian Keegan, May 10 '13 at 20:32
The better way to do this (otherwise it will get closed) is to break this up into a question and then answer it yourself. See http://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/ — Hooked, May 10 '13 at 20:34
That way you'll get feedback in the comments on the answer, where it belongs. As it stands now this question should be closed, but fix it, since you've posted a lot of good information! — Hooked, May 10 '13 at 20:35

Brian Keegan · Accepted Answer · 2013-05-14T22:01:57.033

Use log binning. Here is code that takes a Counter object representing a histogram of degree values and log-bins the distribution to produce a sparser, smoother distribution.

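import numpy as np

def drop_zeros(a_list):
    # log10 is undefined at 0, so keep only positive values
    return [i for i in a_list if i > 0]

def log_binning(counter_dict, bin_count=35):
    # Python 3 dict views are not lists, so convert them before handing them to NumPy
    keys = np.array(list(counter_dict.keys()), dtype=float)
    values = np.array(list(counter_dict.values()), dtype=float)

    # Bin edges evenly spaced in log10 across the observed range of x (degree) values
    min_x = np.log10(min(drop_zeros(keys)))
    max_x = np.log10(keys.max())
    bins = np.logspace(min_x, max_x, num=bin_count)

    # Based off of: http://stackoverflow.com/questions/6163334/binning-data-in-python-with-scipy-numpy
    # Number of distinct degree values that fall in each bin
    n_per_bin = np.histogram(keys, bins)[0]
    with np.errstate(divide='ignore', invalid='ignore'):
        # Mean frequency (y) and mean degree (x) of the values in each bin
        bin_means_y = np.histogram(keys, bins, weights=values)[0] / n_per_bin
        bin_means_x = np.histogram(keys, bins, weights=keys)[0] / n_per_bin

    # Empty bins give 0/0 = nan; drop them so they are not plotted
    nonempty = n_per_bin > 0
    return bin_means_x[nonempty], bin_means_y[nonempty]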
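As a sketch of the underlying idea (a variant, not the code above): `np.logspace` produces edges that grow geometrically, so each bin covers a constant ratio of degree values and the sparse high-degree bins are wide enough to collect several observations. Many papers additionally divide each bin's count by its width to estimate the probability density p(k). The hypothetical helper `log_binned_density` below does that with `np.histogram(..., density=True)`, assuming one degree value per node rather than a Counter, and dropping zero degrees.

```python
import numpy as np

def log_binned_density(degrees, bin_count=20):
    """Hypothetical helper: width-normalized histogram on log-spaced bins.

    `degrees` holds one degree value per node (not a Counter); zeros are
    dropped because they cannot be placed on a log axis.
    """
    degrees = np.asarray(degrees, dtype=float)
    degrees = degrees[degrees > 0]
    # bin_count + 1 edges, evenly spaced in log10, so widths grow geometrically
    edges = np.logspace(np.log10(degrees.min()), np.log10(degrees.max()),
                        num=bin_count + 1)
    # density=True divides each count by (total count * bin width)
    density, edges = np.histogram(degrees, bins=edges, density=True)
    # The geometric midpoint of each bin is a natural x position on a log axis
    centers = np.sqrt(edges[:-1] * edges[1:])
    # Empty bins cannot be drawn on a log y axis, so leave them out
    keep = density > 0
    return centers[keep], density[keep]

# Edges 1, 10, 100, 1000: each bin is ten times wider than the previous one
print(np.logspace(0, 3, num=4))
```

For the Barabási-Albert example, `log_binned_density([d for _, d in ba_g.degree()])` would give points that can be drawn on the same log-log axes. Dividing by bin width keeps wide high-degree bins from looking artificially tall.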

Generating a classic scale-free network in NetworkX, log-binning its degree distribution, and plotting the result:

from collections import Counter

import matplotlib.pyplot as plt
import networkx as nx

ba_g = nx.barabasi_albert_graph(10000, 2)
ba_c = nx.degree_centrality(ba_g)
# To use raw degrees instead of normalized ones (then adjust xlim below):
# ba_c = {k: round(v * (len(ba_g) - 1)) for k, v in ba_c.items()}
# Histogram: degree value -> number of nodes with that degree
ba_c2 = dict(Counter(ba_c.values()))

ba_x, ba_y = log_binning(ba_c2, 50)

plt.xscale('log')
plt.yscale('log')
# Log-binned distribution as red squares, raw distribution as blue crosses
plt.scatter(ba_x, ba_y, c='r', marker='s', s=50)
plt.scatter(list(ba_c2.keys()), list(ba_c2.values()), c='b', marker='x')
plt.xlim((1e-4, 1e-1))
plt.ylim((.9, 1e4))
plt.xlabel('Connections (normalized)')
plt.ylabel('Frequency')
plt.show()

This produces the following plot, showing the overlap between the "raw" distribution in blue and the "binned" distribution in red.

[Figure: Comparison between raw and log-binned distributions]
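If raw integer degrees are preferred over normalized degree centrality, NetworkX's `nx.degree_histogram` returns a list whose index is the degree and whose value is the number of nodes with that degree. The sketch below turns that list into the `{degree: count}` dict shape that `log_binning` takes, skipping degrees no node has. The helper name `raw_degree_counts` and the fixed `seed` are illustrative choices, not part of the original answer.

```python
import networkx as nx

def raw_degree_counts(graph):
    """Hypothetical helper: {degree: number of nodes} for degrees that occur."""
    # degree_histogram(G)[k] is the number of nodes with degree k
    hist = nx.degree_histogram(graph)
    return {k: count for k, count in enumerate(hist) if count > 0}

# Fixed seed only so repeated runs give the same graph
ba_g = nx.barabasi_albert_graph(10000, 2, seed=42)
ba_counts = raw_degree_counts(ba_g)
```

Passing `ba_counts` to `log_binning` then bins integer degrees; the `xlim` in the plotting code would need to change, since raw degrees start at 1 rather than around 1e-4.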

Thoughts on how to improve this approach or feedback if I've missed something obvious are welcome.

The x-y labels are: x axis -> the log of the degrees encountered in the network; y axis -> the log of the frequency of those degrees. — FaCoffee, Nov 28 '16 at 15:38
Note: in many places `counter_dict.keys()` should be replaced by `list(counter_dict.keys())` for newer versions of Python (in which `dict.keys()` does not return a list) — Joel, Sep 11 '18 at 11:10
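A quick check of why the `list(...)` wrapping matters in Python 3: NumPy does not treat a dict view as a sequence, so `np.asarray` wraps the whole view in a single 0-dimensional object array instead of producing a numeric array. The small `counts` dict here is made-up illustration data.

```python
import numpy as np

counts = {1: 5, 2: 3, 4: 1}

# A dict view becomes one opaque object, not three numbers
view_array = np.asarray(counts.keys())
print(view_array.shape, view_array.dtype)

# Materializing the view first gives the numeric array NumPy functions expect
key_array = np.asarray(list(counts.keys()))
print(key_array.shape, key_array.dtype)
```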
Does it ever make sense to use a degree-distribution-style plot like the one you've shown on data that isn't network data, i.e. to use this plot instead of a histogram? Say, on skewed blood pressure data or counts of something? Thanks. — user63230, Oct 31 '19 at 09:21
