1

I'm working with a very large undirected graph (email networks for a company).

I’m bit confused about selecting the best and suitable technique of undirected graph of email networks. In that network, the vertices represent email addresses and an edge represents that there was at least one email in one direction between the two addresses.

Is there someone who knows the best technique to represent the algorithm?

I am using a large undirected graph of mailing, then which representation is good? Adjacency list or Adjacency Matrix?

Mika Sundland
  • 18,120
  • 16
  • 38
  • 50
Jennie
  • 11
  • 1
  • 1
    The answer depends on the quantity of nodes (number of employees) and the queries that you want to send to the data-structure, also the distribution of connections count would be pretty useful (i.e. 50% < 50, 90% < 100). Unless you provide those you'll receive only generic answers. – user3707125 Jan 22 '18 at 01:51

2 Answers2

1

It depends on the number of people emailing each other and on the operations done on the graph.

If there is a high chance that 2 people have emailed each other then you should go with adjacency matrix.

On the other hand if the number of edges (2 people who emailed each other at least one) is small compared to the number of email addresses you should go with adjacency list.

Another thing to look at is what types of operations are you doing on the graph.

So, if the majority of the operations consist of querying if two nodes have an edge between them, then adjacency matrix would be the best choice.

On the other hand if the majority of the operations are traversing the graph or querying the list of nodes connected to a given node, then adjacency list would be better.

If you are doing a mix of both types of queries, you could represent the graph as an array of hash tables. So, it would be an adjacency list representation using hash tables instead of lists.

Update

Please check the answers to this question. They explain in detail the differences between the adjacency list and adjacency matrix.

In order to find out the number of edges

I would run a program to calculate the number of edges. It would look like the following:

mp = hash_table
for email in emails
   if !mp[email.sender][email.receiver]
       mp.insert({email.sender, email.receiver})
   end
end
return mp.size

If the program crashed, then you might have exceeded the memory and the number of edges is big compared to the number of email addresses (since the number of email addresses is in the millions [as mentioned in the comments]) and you might wanna go with adjacency list.

If you really wanna find the exact number of edges you could segment the emails where each segment consists of emails with the same sender and run the above program on each segment, then the final answer would be around the summation of results

MrGreen
  • 479
  • 4
  • 24
  • Thank you :) Then I should use Adjacency list because there is a large number of email network? – Jennie Jan 22 '18 at 11:22
  • If the number of edges is small compared to them, yes. I liked a question that might explain more. Please check the update – MrGreen Jan 22 '18 at 13:37
  • There is a large network and in every 2 nodes there is at least one edge between them So , How can we guess that how many edges are there in nodes? I dont know the exact figure of email network. – Jennie Jan 22 '18 at 14:26
  • @Jennie what is the number of email addresses. Is it in the millions?, billions? – MrGreen Jan 22 '18 at 15:28
  • Approximately Millions – Jennie Jan 22 '18 at 15:49
  • In a large graph the vertices are email addresses and an edge represents (that there was at least one email in at least one direction between the two addresses.) – Jennie Jan 22 '18 at 15:51
0

Whether you should use adjacency matrix or adjacency list depends on the density of the graph. A graph representing a email network of a company is usually sparse because only a small portion of the employees need to mail each other. For example, you won't email someone not in the same department with you. Thus, you may use adjacency list.