0

I am trying to develop a network resource manager component in C which keeps track of various network elements over TCP/UDP sockets. For this, I use three values :

  1. Hardware Location Number
  2. Service Group Number
  3. Node Number

The rule is that no two elements on a network may have the same set of these three numbers. Thus, each location's identity will be unique on the network. This information needs to be saved in the program (non-persistently) in a way so that given any of the parameters (could be just a single number, or a combination of any two, or all three) the program returns the eligible candidates by performing a quick search.

The addition and deletion should also be efficient, but given that there will be few insertions or deletions after the initial transient phase if they are a bit slower than search, it should be OK. Using trees is one option, but the answer of 'Which one to use?' still eludes me (Not that I know of many, but I look forward to learning newer ones if they serve my purpose).

To do this, I could have three different trees maintained separately with similar nodes pointing to a same structure in memory, but I feel that is inefficient and not compact. I am looking for a unified data set which can handle these variations like multiple keys.

Or I could have a single AVL tree with multiple keys (if that is allowed).

The number of elements in the network is dynamic, so using a 3D array is out of option.

A friend also suggested hashing, but I am not too sure.

Please help.

Anshul
  • 746
  • 9
  • 22

1 Answers1

1

Hashing seems like a silly choice for this. Perhaps the most significant reason is that you seem interested in approximate lookups. Hashing your values will likely mean iterating through the entire collection to find a group of nodes that have a common prefix, or a similar prefix.

PATRICIA is commonly used in routing tables, and makes itself quite amenable to searching for items that have similar keys. Note that I have found much misleading information about PATRICIA tries, which I've written about here. I found this resource to be particularly helpful.

Similarly to an AVL tree, you'll need to combine the three keys to form one (without hashing, preferably).

unsigned int key[3] = { hardware_location_number, service_group_number, node_number };
           /* ^------- Use something like this as your key */
Community
  • 1
  • 1
autistic
  • 1
  • 3
  • 35
  • 80
  • If we were to look at it from a distributed systems point of view, like a p2p network, DHTs are extensively used there, so what makes hashing more attractive there but not in this case? I am not much versed in Databases and their design, hence the curiosity. :-) – Anshul May 01 '13 at 17:19
  • 1
    @Anshul How would you look up approximate matches in a PATRICIA trie? The same way you retrieve values, except that you'd also test edit distance on the way down the tree. DHTs only support exact-match lookups! This is because they're hashtables. How would you look up approximate matches in a DHT? You'd have to iterate through every hash in the table, testing the edit distance against the stored key along the way, or construct hashes of the possible approximate matches and look for one of those in the DHT. Both of these are rather *brute-forcish*, and don't have very nice worst case scenarios. – autistic May 02 '13 at 02:41
  • Patricia trees can handle approximate lookups when the ordering of the key is hierarchical. Lets say, I fix my hierarchy as HardwareNumber.ServiceGroup.NodeNumber and Patricia would work just fine if I asked it to find all services on Hardware Number, say, 4. But would it work if I said search for all the nodes running on hardware number 4 irrespective of what service number they belong to? I mean, it can be made to work, but will that be efficient? – Anshul May 03 '13 at 06:52