-3

I'm so confused by graphs and adjacency matrix.I have a graph with large numbers of nodes and edges (for example 5000 vertexes and 6000 edges ).I have to give score (with jaccard algorithm) to each pair-nodes that aren't adjacent.I work with gephi java doc.I score to each pair-nodes with jaccard. How do I find top n edge score from adjacency matrix in fastest time?

EDIT

  ArrayList<ArrayList<Double>> score = new ArrayList<ArrayList<Double>>();
    Node[] nodes = graph.getNodes().toArray();
    Jaccard jaccard= new Jaccard();

    for(Node f:nodes){
        for(Node g:nodes){
            if(!graph.isAdjacent(g, f) && g!=f ){
                score.get(f.getId()).set(g.getId(), jaccard.getScore(f, g));
            }else {
                score.get(f.getId()).set(g.getId(), 0.0);
            }
        }
    }
user4157124
  • 2,809
  • 13
  • 27
  • 42
sam
  • 7
  • 2

2 Answers2

0

You need a PriorityQueue.

Algorithm: Set the capacity of the PQ to n. Keep adding objects to it (add your edges that is, which should also tell about the nodes) based on the Comparator you define (i.e., where you compare edges based on the score you provide). Keep inserting edges blindly until you have inserted n of them. When you hit the capacity n, make a comparison (peek()) before you insert, you will find the comparable object always on top of the PQ so the cost of comparing is a good O(1). If the score of the new object is greater, perform consecutive poll() and add() operations; otherwise continue until all edges are compared.

When done, your PQ will have the top n weighted edges. Learn about PQs here.

mohsenmadi
  • 2,277
  • 1
  • 23
  • 34
0

When it comes to sorting, it's always a question of how often do you intend to sort vs how often do you intend to access the sorted values. Adjaceny matrices can be very slow to perform lookups, so sorting and accessing your values maybe faster if you represent the pair-nodes and scores in a different data structure that's more suitable for sorting and fast access of values. I would try to think of a way to store the edge scores with a reference to an entry(ies) in your AdjacencyMatrix in some sort of Collection and look for an efficient sorting algorithm for that collection. Since you are working with a fairly sizable data set, PriorityQueue comes to mind for me, but there may be other algorithms that may better suit your needs, an example of using priority queues to sort large sets of data in constant time can be found here. Once you sort your collection, you can grab the top 'n' values from the collection and retrieve the references to those entries in your adjacency matrix which you can use to do your graphing or what have you.

Note: Adjaceny Matrices have a high memory cost for data storage in addition to slow look up time, so this possible solution may have other performance implications for you, ultimately it will depend on how your are going to be using your data.

Edit:

Ok, to address your comment, say your matrix is named A, and the pair-node you are inserting is A[ i ][ j ], then you can use Entry as the object you set as your key value. When you look at

PriorityQueue < Entry < K, V>>

What you will be inserting as 'K' (your key value) is another Entry object which can be thought of like

PriorityQueue < Entry < Entry < Integer,Integer>, V>>

so when you call add, you insert (new Entry(i,j), edgeScore) into the queue. Does this make sense?

Edit:

To address your second comment, as I stated below, an adjacency matrix in theory is a 2D (nxn) Boolean array. They aren't the most memory efficient, and can be very slow to access, but have some usefulness under the right circumstances. For the more detailed implementation details, you can check out this example of implementing an adjaceny matrix that shows a very basic implementation approach that should get you started. You can also try looking at this post to see what other people have recommended as alternatives and ways to improve performance in the implementation, but essentially i would think the best way to utilize the priority queue is to build the queue AS you are inserting the values into your nxn boolean matrix, which would avoid iterating over the matrix itself and give you the advantage of the sorting power of the priority queue, and if you add more nodes after, you can also keep adding to the PriorityQueue and it will take care of ordering them.

Community
  • 1
  • 1
adam5990
  • 346
  • 2
  • 13
  • Thanks Adam. I confused . I don't know how to add pair node and its score to map .I say that give score to only nonadjacent pair-node. score of adjacent pair node = 0 . – sam May 25 '15 at 18:13
  • should i implement matrix using hash map? then use PriorityQueue . I'm new to java .please help me more – sam May 26 '15 at 13:27
  • Well now your getting into a whole different question. In a nutshell, an adjaceny matrix is an nXn matrix of boolean values, the constructor looks like this, boolean[][] adjacenyMatrix = new boolean[n], where n is the number of nodes, i'll make a final edit with a little more information, but for implementing adjaceny matrices, there are existing questions with detailed answers already on this forum. – adam5990 May 26 '15 at 13:41
  • how can i build queue as i inserting the values into my nxn double matrix – sam May 26 '15 at 14:13
  • Your getting too off topic in this thread, your original question has been answered no? If you need further help with an implementation problem, search this site for existing questions that apply to your problem and if none exist, post another question and ask a more specific technical question. Also please post more details in further questions, it is hard for people to give good answers without knowing what your doing and why. – adam5990 May 26 '15 at 14:41