16

I have a set of randomly generated formal graphs, and I would like to calculate the entropy of each one. The same question in different words: I have several networks, and want to calculate the information content of each one.

Here are two sources containing formal definitions of graph entropy:
http://www.cs.washington.edu/homes/anuprao/pubs/CSE533Autumn2010/lecture4.pdf (PDF) http://arxiv.org/abs/0711.4175v1

The code I am looking for takes a graph as input (as either an edge list or an adjacency matrix) and outputs a number of bits or some other measure of information content.

Because I can't find an implementation of this anywhere, I am setting out to code this from scratch based on the formal definitions. If anyone has already solved this problem and is willing to share the code, it would be wildly appreciated.

3 Answers3

7

I ended up using different papers for definitions of graph entropy:
Information Theory of Complex Networks: On Evolution and Architectural Constraints
R.V. Sole and S. Valverde (2004)
and
Network Entropy Based on Topology Configuration and Its Computation to Random Networks
B.H. Wang, W.X. Wang and T. Zhou

The code to calculate each is below. The code assumes you have an undirected, unweighted graph with no self-loops. It takes an adjacency matrix as input and returns the amount of entropy in bits. It is implemented in R and makes use of the sna package.

graphEntropy <- function(adj, type="SoleValverde") {
  if (type == "SoleValverde") {
    return(graphEntropySoleValverde(adj))
  }
  else {
    return(graphEntropyWang(adj))
  }
}

graphEntropySoleValverde <- function(adj) {
  # Calculate Sole & Valverde, 2004 graph entropy
  # Uses Equations 1 and 4
  # First we need the denominator of q(k)
  # To get it we need the probability of each degree
  # First get the number of nodes with each degree
  existingDegrees = degree(adj)/2
  maxDegree = nrow(adj) - 1
  allDegrees = 0:maxDegree
  degreeDist = matrix(0, 3, length(allDegrees)+1) # Need an extra zero prob degree for later calculations
  degreeDist[1,] = 0:(maxDegree+1)
  for(aDegree in allDegrees) {
    degreeDist[2,aDegree+1] = sum(existingDegrees == aDegree)
  }
  # Calculate probability of each degree
  for(aDegree in allDegrees) {
    degreeDist[3,aDegree+1] = degreeDist[2,aDegree+1]/sum(degreeDist[2,])
  }
  # Sum of all degrees mult by their probability
  sumkPk = 0
  for(aDegree in allDegrees) {
    sumkPk = sumkPk + degreeDist[2,aDegree+1] * degreeDist[3,aDegree+1]
  }
  # Equivalent is sum(degreeDist[2,] * degreeDist[3,])
  # Now we have all the pieces we need to calculate graph entropy
  graphEntropy = 0
  for(aDegree in 1:maxDegree) {
    q.of.k = ((aDegree + 1)*degreeDist[3,aDegree+2])/sumkPk
    # 0 log2(0) is defined as zero
    if (q.of.k != 0) {
      graphEntropy = graphEntropy + -1 * q.of.k * log2(q.of.k)
    }
  }
  return(graphEntropy)
}

graphEntropyWang <- function(adj) {
  # Calculate Wang, 2008 graph entropy
  # Uses Equation 14
  # bigN is simply the number of nodes
  # littleP is the link probability.  That is the same as graph density calculated by sna with gden().
  bigN = nrow(adj)
  littleP = gden(adj)
  graphEntropy = 0
  if (littleP != 1 && littleP != 0) {
    graphEntropy = -1 * .5 * bigN * (bigN - 1) * (littleP * log2(littleP) + (1-littleP) * log2(1-littleP))
  }
  return(graphEntropy)
}
  • 6
    By the way, once I implemented these functions and calculated entropies for real graphs, I was disappointed with these measures. The Wang measure depends only on the size of the graph and the density, and doesn't take into account the structure of the graph at all. It is mostly a measure of density. The Sole measure reflects the diversity of remaining degree counts among nodes. It is more a measure of assortativity than anything else. I am still at a loss for quantifying how complex or not a graph is. – shotgun_approach Aug 10 '11 at 02:46
1

You can use Koerner's entropy (= Shannon entropy applied to a graph). A good reference for the literature is here. Note however that the computation is in general NP-hard (for the stupid reason that you need to search of the all subsets of vertices).

dohmatob
  • 289
  • 4
  • 14
1

If you have a weighted graph a good start would be to sort and count all the weights. Then you can use the formula -log(p)+log(2) (http://en.wikipedia.org/wiki/Binary_entropy_function) to determine the amount of bits to be needed for the code. Maybe this doesn't work because it's the binary entropy function?

Micromega
  • 12,486
  • 7
  • 35
  • 72