9

I'm working on graph analysis. I want to compute an N by N similarity matrix that contains the Adamic Adar similarity between every two vertices. To give an overview of Adamic Adar let me start with this introduction:

Given the adjacency matrix A of an undirected graph G. CN is the set of all common neighbors of two vertices x, y. A common neighbor of two vertices is one where both vertices have an edge/link to, i.e. both vertices will have a 1 for the corresponding common neighbor node in A. k_n is the degree of node n.

Adamic-Adar is defined as the following: enter image description here

My attempt to compute it is to fetch both rows of the x and y nodes from A and then sum them. Then look for the elements that has 2 as the value and then gets their degrees and apply the equation. However computing that takes really really a long of time. I tried with a graph that contains 1032 vertices and it took a lot of time to compute. It started with 7 minutes and then I cancelled the computations. So my question: is there a better algorithm to compute it?

Here's my code in python:

def aa(graph):

"""
    Calculates the Adamic-Adar index.

"""
N = graph.num_vertices()
A = gts.adjacency(graph)
S = np.zeros((N,N))
degrees = get_degrees_dic(graph)
for i in xrange(N):
    A_i = A[i]
    for j in xrange(N):
        if j != i:
            A_j = A[j]
            intersection = A_i + A_j
            common_ns_degs = list()
            for index in xrange(N):
                if intersection[index] == 2:
                    cn_deg = degrees[index]
                    common_ns_degs.append(1.0/np.log10(cn_deg))
            S[i,j] = np.sum(common_ns_degs)
return S 
Jack Twain
  • 6,273
  • 15
  • 67
  • 107
  • You could save some computation by not building common_ns_degs, instead adding -log10(cn_deg) to S[i,j] which you initialise to zero where you call list() now. BTW it should be log10(1.0/cn_deg), not 1.0/log10(cn_deg). – dmuir Mar 21 '14 at 17:56
  • The formula for the Adamic-Adar index is slightly different to the one provided. It is, sum(1/log(k_n)) for common neighbors k_n. The code appears to be correct – Papples Jul 07 '16 at 14:28

4 Answers4

3

Since you're using numpy, you can really cut down on your need to iterate for every operation in the algorithm. my numpy- and vectorized-fu aren't the greatest, but the below runs in around 2.5s on a graph with ~13,000 nodes:

def adar_adamic(adj_mat):    
    """Computes Adar-Adamic similarity matrix for an adjacency matrix"""

    Adar_Adamic = np.zeros(adj_mat.shape)
    for i in adj_mat:
        AdjList = i.nonzero()[0] #column indices with nonzero values
        k_deg = len(AdjList)
        d = np.log(1.0/k_deg) # row i's AA score

        #add i's score to the neighbor's entry
        for i in xrange(len(AdjList)):
            for j in xrange(len(AdjList)):
                if AdjList[i] != AdjList[j]:
                    cell = (AdjList[i],AdjList[j])
                    Adar_Adamic[cell] = Adar_Adamic[cell] + d

    return Adar_Adamic

unlike MBo's answer, this does build the full, symmetric matrix, but the inefficiency (for me) was tolerable, given the execution time.

Mike
  • 397
  • 5
  • 19
2

I believe you are using rather slow approach. It would better to revert it -
- initialize AA (Adamic-Adar) matrix by zeros
- for every node k get it's degree k_deg
- calc d = log(1.0/k_deg) (why log10 - is it important or not?)
- add d to all AAij, where i,j - all pairs of 1s in kth row of adjacency matrix
Edit:
- for sparse graphs it is useful to extract positions of all 1s in kth row to the list to reach O(V*(V+E)) complexity instead of O(V^3)

AA = np.zeros((N,N))
for k = 0 to N - 1 do
    AdjList = []
    for j = 0 to N - 1 do
        if A[k, j] = 1 then
            AdjList.Add(j)
    k_deg = AdjList.Length
    d = log(1/k_deg)
    for j = 0 to AdjList.Length - 2 do
      for i = j+1 to AdjList.Length - 1 do
         AA[AdjList[i],AdjList[j]] = AA[AdjList[i],AdjList[j]] + d  
         //half of matrix filled, it is symmetric for undirected graph
MBo
  • 77,366
  • 5
  • 53
  • 86
1

I don't see a way of reducing the time complexity, but it can be vectorized:

degrees = A.sum(axis=0)
weights = np.log10(1.0/degrees)
adamic_adar = (A*weights).dot(A.T)

With A a regular Numpy array. It seems you're using graph_tool.spectral.adjacency and thus A would be a sparse matrix. In that case the code would be:

from scipy.sparse import csr_matrix

degrees = A.sum(axis=0)
weights = csr_matrix(np.log10(1.0/degrees))
adamic_adar = A.multiply(weights) * A.T

This is much faster than using Python loops. A small warning though: with this approach you really need to make sure that the values on the main diagonal (of A and adamic_adar) are what you expect them to be. Also, A must not contain weights, but only zeros and ones.

  • I really like this approach. If you wanted to apply it to the correct formula, as mentioned [above](http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar#comment63917506_22565620). You could change it to: `from scipy.sparse import csr_matrix` `A = adjacency(graph)` `degrees = A.sum(axis=0)` `degrees[degrees == 1] = 0 #nodes with degree one cause errors and are not intermediate nodes` `weights = csr_matrix(1./np.log(degrees))` `adamic_adar = A.multiply(weights) * A.T` – Papples Jul 07 '16 at 14:55
1

I believe there most be a function like the one defined in R igraph in its python_igraph as well for the node similarity (Adamic_Adar as well)

academic.user
  • 639
  • 2
  • 9
  • 28