3

Problem background

I am currently developing a framework of Ant Colony System algorithms. I thought I'd start out by trying them on the first problem they were applied to: Travelling Salesman Problem (TSP). I will be using C# for the task.

All TSP instances will consist of a complete undirected graph with 2 different weights associated with each edge.

Question

Until now I've only used adjacency-list representations but I've read that they are recommended only for sparse graphs. As I am not the most knowledgeable of persons when it comes to data structures I was wondering what would be the most efficient way to implement an undirected complete graph?

I can provide additional details if required.

Thank you for your time.

UPDATE

Weight clarification. Each edge will have the two values associated with them:

  1. distance between two cities ( d(i,j) = d(j,i) same distance in both directions)
  2. amount of pheromone deposited by ants on that particular edge

Operations. Small summary of the operations I will be doing on the graph:

  • for each node, the ant on that particular node will have to iterate through the values associated with all incident edges

Problem clarification

Ant Colony Optimization algorithms can "solve" TSP as this is where they were first applied to . I say "solve" because they are part of a family of algorithms called metaheuristics optimizations, thus they never guarantee to return the optimal solution.

Regarding the problem at hand:

  • ants will know how to complete a tour because each ant will have a memory.
  • each time an ant visits a city it will store that city in its memory.
  • each time an ant considers visiting a new city it will search in its memory and pick an outgoing edge only if that edge will not lead it to an already visited city.
  • when there are no more edges the ant can choose it has complete a tour; at this point we can retrace the tour created by the ant by backtracking through its memory.

Research article details: Ant Colony System article

Efficiency considerations

I am more worried about run time (speed) than memory.

Morat
  • 503
  • 1
  • 5
  • 13
  • 4
    There is no single "most efficient" representation. The efficiency depends a lot of the list of operations you are going to provide and how often are they going to be called. – Vlad Jun 08 '12 at 15:19
  • If you have two weights associated with the edges, then you have a directed graph, not undirected (assuming the weights are for the different directions; otherwise it is really just one (albeit complex) weight) – Attila Jun 08 '12 at 15:21
  • Apologize if the terminology was incorrect: one value represents the distances between two cities, second value represents the quantity of pheromones deposited by an ant on the particular edge. This is still undirected. – Morat Jun 08 '12 at 15:25
  • 1
    @Morat: better to edit your question with your clarification than to make a comment. A lot of people think that most comments on SO are a waste of time so don't read them. They're mostly right, though this comment is a rare pearl. – High Performance Mark Jun 08 '12 at 15:26
  • Your question seems to contradict itself to some extent. Ant Colony algorithms usually refer to a situation where the actor doesn't really know where it's going until it gets there. There's not really a way to make this algorithmically efficient because you have no guarantees of your destination. The problem presented is covering the most area in the lowest amount of effort. Traveling Salesman problems usually refer to a situation where you have a system of destinations and problem is finding the route. Could you explain EXACTLY what problem you're trying to solve? – tmesser Jun 08 '12 at 15:28
  • When you say you care about efficiency, are you worried more about runtime speed or memory? Is the time spend building/updating the graph important, or just the time spent searching it? Is your graph sparsely populated always/sometimes/never? – Servy Jun 08 '12 at 15:38
  • I have mentioned that I am always dealing with complete graphs. – Morat Jun 08 '12 at 15:42
  • Depending on the number of ant hills being visited you may be in a realm that is very sensitive to memory caching. The size of the various processor caches and the precise layout of the data may have a significant effect on performance. The effects may be magnified by executing multiple threads. Benchmarking a few different models is the only way to tell and the results only apply to that configuration of hardware and software. – HABO Jun 08 '12 at 18:53

2 Answers2

2

First, there's a general guide to adjacency lists vs matrices here. That's a pretty low-level, non-specific discussion, though, so it might not tell you anything you don't already know.

The takeaway, I think, is this: If you often find yourself needing to answer the question, "I need to know the distance or the pheromone level of the edge between exactly node i and node j," then you probably want the matrix form, as that question can be answered in O(1) time.

You do mention needing to iterate over the edges adjacent to a node-- here is where some cleverness and subtlety may come in. If you don't care about the order of the iteration, then you don't care about the data structure. If you care deeply about the order, and you know the order up front, and it never changes, you can probably code this directly into an adjacency list. If you find yourself always wanting, e.g., the largest or smallest concentration of pheromones, you may want to try something even more structured, like a priority queue. It really depends on what kind of operations you're doing.

Finally, I know you mention that you're more interested in speed than memory, but it's not clear to me how many graph representations you'll be using. If only one, then you truly don't care. But, if each ant is building up its own representation of the graph as it goes along, you might care more than you think, and the adjacency list will let you carry around incomplete graph representations; the flip side of that is that it will take time to build the representations up when the ant is exploring on its frontier.

Finally finally, I know you say you're dealing with complete graphs and TSP, but it is worth thinking about whether you will ever need to adapt these routines to some other problem on possibly graphs, and if so, what then.

I lean toward adjacency lists and/or even more structure, but I don't think you will find a clean, crisp answer.

Community
  • 1
  • 1
Novak
  • 4,687
  • 2
  • 26
  • 64
1

Since you have a complete graph I would think that the best representation would be a 2D array.

public class Edge
{
//change types as appropriate
  public int Distance {get;set;}
  public int Pheromone {get;set;}
}


int numNodes;
Edge[,] graph = new Edge[numNodes,numNodes];
for(int i = 0; i < numNodes; i++)
{
  for(int j = 0; j < numNodes; j++)
  {
    graph[i][j] = new Edge();
    //initialize Edge
  }
}

If you have a LOT of nodes, and don't "remember" nodes by index in this graph, then it may be beneficial to have a Dictionary that maps a Node to the index in the graph. It may also be helpful to have the reverse lookup (a List would be the appropriate data structure here. This would give you the ability to get a Node object (if you have a lot of information to store about each node) based on the index of that node in the graph.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • I've thought about this as well but creating an undirected graph this way would make everything under the main diagonal redundant, wouldn't it (though pretty easy and fast to retrieve information)? Also 2 questions: 1. could you expand a bit more what you mean by reverse lookup list? 2. For what size would you consider a matrix like this to be too big (my matrix will always be n X n sized)? – Morat Jun 08 '12 at 17:54
  • @Morat Yes, it would. You have several options. You could make the array jagged, so you don't store that info. You could just deal with the redundant data, or you could use a 2D array but leave the redundant array values null. While redundantly storing the info uses 2x memory, it is a bit more convenient as you don't need to ensure the smallest/largest (depending on which half of the array you fill) index is searched first. – Servy Jun 08 '12 at 17:57