2

I am writing a program in C++ that requires IP addresses( all IPv4) to be looked up and stored in a fast way. Every IP address has a data associated with it. In case it already exists in the trie, I intend to merge the data of the IP address in the trie with the new addresses data. If it is not present, I intend to add it as a new entry to the trie. Deletion of IP address is not necessary.

In order to implement this, I need to design a Patricia Trie. However, I am unable to visualize the design beyond this. It seems quite naive of me, but the only idea that came to my mind was to change the IP address to their binary form and then use the trie. I am however clueless about HOW exactly to implement this.

I would be really thankful to you if you could help me with this one. Please note that I did find a similar question here . The question or more specifically the answer was beyond my understanding as the code in the CPAN web site was not clear enough for me.

Also note, my data is the following format

10.10.100.1: "Tom","Jack","Smith"

192.168.12.12: "Jones","Liz"

12.124.2.1: "Jimmy","George"

10.10.100.1: "Mike","Harry","Jennifer"

Community
  • 1
  • 1
hytriutucx
  • 1,614
  • 6
  • 26
  • 37

2 Answers2

5

I think you are referring to a RadixTree. I have an implementation of a RadixTrie in Java, if you want to use that as a starting point, which does the actual key to value mapping. It uses a PatriciaTrie as it's backing structure.

Example using the following strings.

  1. 10.10.101.2
  2. 10.10.100.1
  3. 10.10.110.3

Trie example (uncompressed)

└── 1
    └── 0
        └── .
            └── 1
                └── 0
                    └── .
                        └── 1
                            ├── 0
                            │   ├── 1
                            │   │   └── .
                            │   │       └── (2) 10.10.101.2
                            │   └── 0
                            │       └── .
                            │           └── (1) 10.10.100.1
                            └── 1
                                └── 0
                                    └── .
                                        └── (3) 10.10.110.3

Patricia Trie (compressed)

└── [black] 10.10.1
    ├── [black] 0
    │   ├── [white] (0.1) 00.1
    │   └── [white] (1.2) 01.2
    └── [white] (10.3) 10.10.110.3
Justin
  • 4,196
  • 4
  • 24
  • 48
  • I'm having trouble understanding your explanation of PATRICIA tries. The references I have read (Knuth, Sedgewick and [this](https://web.archive.org/web/20140129124956/http://www.mcdowella.demon.co.uk/Patricia.html)) all seem to suggest "Patricia creates a binary tree, with one node per string stored", but it seems to me as though your PATRICIA creates more than one node per *string* stored... – autistic Dec 08 '15 at 01:38
  • @Seb I'd assume they refer to the compressed version of the Patricia Trie (as seen in my second example above). That being said, Patricia Trie seems to be a bit overloaded in it's definition. The version I wrote follows along with the explanation here: https://youtu.be/jXAHLqQthKw?t=244 – Justin Dec 08 '15 at 03:40
  • Make no mistake, I have read Knuth and Sedgewick's books on this subject and I understand their explanation precisely. I also have a copy of "PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric", which is historically speaking **the authoritive definition** of PATRICIA, written by D. R. Morrison and submitted to the Journal of ACM. Your diagram is using five nodes to store three strings; according to Morrison, "Corollaries state that as the library grows, each new end brings into the library with it **exactly one new branch**"... The emphasis is mine. – autistic Dec 08 '15 at 05:38
  • If you would like to know what a PATRICIA tree looks like, it's [much more like a cyclic graph](http://flylib.com/books/3/55/1/html/2/images/15fig11.gif). Your diagram looks like some other kind of radix trie. – autistic Dec 08 '15 at 05:39
  • @Seb I think you are arguing implementation. It can be implement using a binary radix tree, DAG, or many other ways. If you look at the slides from Sedwick himself while at Princeton, his graph looks exactly like a my example above. http://algs4.cs.princeton.edu/lectures/52Tries.pdf Page 51 He also mentions that Patricia tree is 'Also known as: crit-bit tree, radix tree.' – Justin Dec 08 '15 at 16:20
  • That's most likely Kevin Wayne's influence. The image I linked you to earlier is a Sedgewick diagram which you can find in [page 653 of his book](https://books.google.com.au/books?id=hyvdUQUmf2UC&pg=PA653&lpg=PA653#v=onepage&q&f=false). Regardless, it is the inventor of the toaster who named the toaster; people can call an oven a toaster if they like, but at the end of the day, historically, they are wrong. PATRICIA has very specific algorithms defined by the conceptual author, and those algorithms produce a cyclic graph with only one type of node: "value" nodes, and no "internal" nodes. – autistic Dec 09 '15 at 02:40
  • p514-morrison.pdf is the name of the document outlining the concept known as PATRICIA. It contains the following quotes: "The amount of computation required to find one occurrence of a key, if there is one, or to find that there is none, has a bound which depends linearly on the length of the key and is independent of the size of the library"... This doesn't work if you require nullary nodes for the purpose of branching. – autistic Dec 09 '15 at 02:50
  • "The computational effort involved in adding a new item to the library and adjusting the index to reflect its presence consists of copying the new item into the text, determining how much of it is already present, as outlined above, adding five numbers, and changing one in the previously existing index." As you can see, it's very detailed. If your algorithm does more than this, then your algorithm is not PATRICIA. – autistic Dec 09 '15 at 02:59
1

Patricia tries solve the problem of finding the best covering prefix for a given IP address (they are used by routers to quickly determine that 192.168.0.0/16 is the best choice for 192.168.14.63, for example). If you are just trying to match IP addresses exactly, a hash table is a better choice.

chepner
  • 497,756
  • 71
  • 530
  • 681