0

I would like to know the best way to identify the nodes(j,k) that influence the betweenness centrality of a focal node i. Since betweenness is based on the geodesics between j and k that pass through i, one way to do this is to get all of the shortest paths in the network, select only those paths that include i, and obtain the ids of nodes on those paths. However

library(igraph)    
test <- shortest_paths(sample_gnp(200,.001),from=1)

takes almost nine seconds to run, and that's just for the paths from the first node; as I understand it, shortest_paths will only calculate paths from one node at a time. This involves a lot of irrelevant information, since I really only want the geodesics that go through i. I assume that the betweenness function itself is doing this in some more efficient way using breadth first search or some other algorithm, but I don't see anything in the documentation for igraph or sna that would allow one to do this. I would like to avoid re-inventing the wheel here.

In short, given a network and a focal node i, what is the best way to find all the geodesics between j and k that include i?

UPDATE: here is a function I wrote that seems to do the trick (I have NOT carefully checked this yet though). It is slow but not unworkably slow on a random graph of 100 nodes with tie probability .2. I am sure there are better ways to do this.

find.shortest.paths <- function(net,focus){
  nodes.on.path <- vector(mode="numeric")
  n <- length(V(net))
  for(i in 1:n){
    for(j in 1:n){
      if(i !=  focus & j != focus){
        p <- all_shortest_paths(net,i,j)
        if(focus %in% (unlist(p$res))==T){
          nodes.on.path <- unique(c(nodes.on.path,unlist(p$res)))
        }
      }
    }
  }  
  return(sort(nodes.on.path))
}

ex <- sample_gnp(100,.2)
#try e.g. node 14 as focal node
system.time(test <- find.shortest.paths(ex,focus=14))
tvg
  • 388
  • 2
  • 13
  • I ran your code (more than once). It took less than a second on my notebook. Any clue, why it took so long at yours? In addition, the network you're creating has (almost) no edges. Why did you choose such a low probability? The chance of having any connected component is small, too. – Robert Kirsten Nov 25 '15 at 15:22
  • FWIW, the betweenness function in igraph is based on [this algorithm](http://algo.uni-konstanz.de/publications/b-fabc-01.pdf), and indeed it avoids finding all the shortest paths. But I'm not sure whether it's usable for your goal or not – Tamás Nov 25 '15 at 15:54
  • I'm not sure about the time difference; I am running a reasonably fast computer. My application involves a dozen networks of size varying between ~150 and ~300 nodes, and I noticed that just calculating shortest paths from one node seemed slow; the random network may not capture the problem. Given my data, looping over all nodes in multiple networks seems doable, but inefficient. I know this is a well-studied problem, so I am trying to figure out if there is an "off the shelf" solution before, as I say, I try to re-invent the wheel. – tvg Nov 25 '15 at 16:43
  • Have you tried [this](http://stackoverflow.com/questions/21718078/betweenness-centrality-for-relatively-large-scale-data) – Robert Kirsten Nov 26 '15 at 17:58
  • thanks for the reference. the function above, though clunky, got me enough of an answer so in my case I don't need to resort to other software or estimating betweenness – tvg Nov 27 '15 at 17:40

0 Answers0