1

The graph is unweighed, an element of the array of HashSets neighbours[] is a node neighbours[1] is node 1 (they start from 0 mind you) with its unique neighbouring nodes say 2 3 4 5. (so neighbours[5] will contain 1). And I have the following method I did with great deal of help as I dont get the algo much beyond theory. The number it returns should be the average distance between 2 nodes in the graph.

Imagine I have the following graph (node: in_links | out_links; neighbours[] does not contain the 0 loops at node 0, and no duplicates as I said.)

0: 0 0 0 | 0 0 0 1 1 1 2 3 5 6 7 7 8 8 9 9 11 
1: 0 0 0 | 2 2 3 4 4 5 6 8 
2: 0 1 1 | 3 
3: 0 1 2 | 4 9 
4: 1 1 3 | 5 12 
5: 0 1 4 | 6 7 10 
6: 0 1 5 | 10 11 12 
7: 0 0 5 | 
8: 0 0 1 | 10 
9: 0 0 3 | 12 
10: 5 6 8 | 11 
11: 0 6 10 | 
12: 4 6 9 | 

And for this trivial graph the distance that's returned is 5.781686749230769E8 ?!?! the code:

    public double getAvgDistance() {
    double total = 0;
    int[] dist = new int[n];
    ArrayList<Integer> Q = new ArrayList<Integer>();
    int tmp, index = 0, w = 0;

    for (int u=0; u<n; u++) {
        System.out.print("Avg Dist at "+u+"\r");
        // Initialise Q and dist for this iteration
        for (int v=u+1; v<n; v++) {
            Q.add(v);

            if (neighbours[u].contains(v)) {
                dist[v] = 1;
            } else {
                dist[v] = Integer.MAX_VALUE;
            }
        }

        while (!Q.isEmpty()) {

            tmp = dist[0];
            for (int e=1; e<Q.size(); e++) {
                if (dist[e] < tmp) {
                    w = Q.get(e);
                    tmp = dist[w]; // smallest dist is for this element w so far
                    index = e;
                }
            }
            Q.remove(index);

            for (int z : neighbours[w]) {
                if ( Q.contains(z)
                        && (dist[w]+1 < dist[z]) ) {

                    dist[z] = dist[w]+1;
                }
            }

        } // while end

        for (int v = u+1; v < n; v++ ) {
            total += dist[v];
        }

    } // for 0-n end

    return total /= (double)(n*(n-1)/2);
}

I don't have much experience with casting or printing real numbers so I hope its something to do with those! All comments welcome

Recct
  • 913
  • 1
  • 16
  • 40
  • So what is the problem, is 5.781686749230769E8 wrong? – JasonStoltz Jul 09 '10 at 16:18
  • how can possibly 0.00000005758... (if thats what it means anyway) be a valid average distance, look at the thing almost all nodes are connected to 0 so they wont need more than 2-3 hops. – Recct Jul 09 '10 at 16:26
  • @Recz Gotcha ... i attempted to answer thinking you were just concerned about the formatting of your returned value, but it sounds like the value itself is just way off from what it should be. – JasonStoltz Jul 09 '10 at 16:34
  • 1
    Is this supposed to be a directed or undirected graph? Also, you say that there are no loops and no duplicates. If so, why does your graph definition show such loops and duplicates? Are you filtering those out somewhere? – Karmastan Jul 09 '10 at 16:48
  • it is directed but yes unclear, by no dupes and loops i meant there are no loops from 0 to 0 (its just how the graph is instantiated) nor from node t to t (for this model of generation anyway). No duplicates means that there are no duplicate values in any of neighbours[0..n], as they are HashSets. When direction is discarded like in here yes there will be loops i think. – Recct Jul 09 '10 at 17:05
  • Haven't read the whole thing yet, but 5.781686749230769E8 does not equal 0.00000005758.... E8 is shorthand for *10^8, so it would actually be equal to 578168674.923.... (And yes, I realize that doesn't solve the OP's problem, but he did say "if thats what it means anyway.") – Pops Jul 09 '10 at 17:12
  • Ah didnt see your comment before, this explains a LOT. See how im using MAX_VALUE instead of Infinity as in the algo specification? Could it be that then, problem with initial distance setting or something. Duh! and ..E-8 would mean what I said, right? – Recct Jul 09 '10 at 17:35
  • @Recz: Please update your question instead of adding information in comments. – MAK Jul 11 '10 at 15:36

3 Answers3

1

If I understand your question correctly, nodes 7, 11 and 12 have no out links and therefore no valid paths to the other nodes.

Does your algorithm force a path by inserting a link with a cost of Integer.MAX_VALUE in these cases? If so, that would explain why you have such a very high average cost.

I also wondered whether it would be better to evaluate both forward and reverse paths. In a directed graph the cost of path AB is not necessarily the same as the cost of path BA. With your current algorithm, the cost of every path ending at node 12 is calculated, but no paths starting at node 12 are evaluated.

richj
  • 7,499
  • 3
  • 32
  • 50
0

I'm not sure if I totally understand your question, but it sounds like you MAY be having trouble with the value you are printing not being exactly what you expect. I suspect the problem may be when you are printing the value of the double. Whenever you convert a double directly to a String, I know you can get unexpected results.

This post suggests using a BigDecimal instead of a double to maintain precision: Retain precision with double in Java

So perhaps try doing something like the following and see if you have better results.

BigDecimal.valueOf(<your double value here>).toPlainString();
Community
  • 1
  • 1
JasonStoltz
  • 3,040
  • 4
  • 27
  • 37
  • Ah excellent, this makes it at least easier to read, result for a graph of same parameters (d=3 n=13) is 412977625.2307692. Eventually I may have to write these results to file, is it better to write them as plainStrings like this thing? (later they will be read in from file and added together and an average calculated) – Recct Jul 09 '10 at 17:11
0

I'm only guessing, but I see distances occasionally being set to Integer.MAX_VALUE. If those numbers are actually entering the result and then being divided down by one or two factors of 10, that would explain very nicely why the average is much, much larger than expected and roughly in the same ballpark as MAX_VALUE.

It's OK to have this large value in your graph when it is used to determine the shortest path among alternatives, but once you get to the point where you are determining actual distances, that number has to go!

Either you have a path length in the ballpark of MAX_VALUE, that says there is no path. That path length therefore doesn't go into your average. Or your path length is a small integer in the same magnitude as your in-graph distances, then it's valid and you can include it in your calculations.

The lesson to be taken home from this: Just because a number came out of a computer program that doesn't mean it's trustworthy or correct!

Carl Smotricz
  • 66,391
  • 18
  • 125
  • 167
  • yes the implementation is quite wrong, when it gets fixed with help irl i'll post the question but yes those go to the total, because im not properly updating distances (in the end i have either a distance of 1 or max_value). – Recct Jul 12 '10 at 15:08
  • Sounds like the kind of situation where I would proceed by fixing all the known bugs and then looking at what's left that's not working right. – Carl Smotricz Jul 12 '10 at 20:10