
I currently have an implementation which performs Longest Prefix Match (LPM) lookups in a radix tree data structure. This data structure contains IP prefixes (a radix tree with a max depth of 128 bits), and it is currently fully maintained in memory. The data is read-only and is never modified.

Unfortunately the data has grown and I am no longer able to keep it all in memory. I am looking for an efficient data structure which can be held on disk and will provide efficient lookups. I am planning to use some caching mechanism on top of it.
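
For reference, a minimal sketch (in C) of the kind of in-memory lookup described above, assuming a plain binary trie over the 128 address bits; the actual implementation may differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative node layout: one child per bit value, plus an optional label. */
struct node {
    struct node *child[2];
    const char  *label;   /* non-NULL if a prefix ends at this node */
};

/* Walk the 128 address bits, remembering the last node that carried a label. */
const char *lpm_lookup(const struct node *root, const uint8_t addr[16])
{
    const struct node *n = root;
    const char *best = NULL;

    for (int bit = 0; bit < 128 && n != NULL; bit++) {
        if (n->label != NULL)
            best = n->label;                      /* match of length `bit` */
        int b = (addr[bit / 8] >> (7 - bit % 8)) & 1;
        n = n->child[b];                          /* descend one bit */
    }
    if (n != NULL && n->label != NULL)
        best = n->label;                          /* full /128 match */
    return best;  /* label of the longest matching prefix, or NULL */
}
```

The problem is that a naive on-disk version of this walk could touch up to 128 nodes, i.e. up to 128 reads per lookup.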

michael
  • What does this have to do with databases? The tag does not seem appropriate. – Gordon Linoff Feb 18 '18 at 12:28
  • `I have a large file which contains ip prefixes, each mapped to some label.` **How** large? A few million? `mapped to some label` What is the size of this *payload* ? Just an int? a string? – wildplasser Feb 21 '18 at 12:03
  • @wildplasser it's 20M read-only prefixes, lookup rate is 2M client IPs per hour. See OP comments under my answer below for more insights... – Andriy Berestovskyy Feb 21 '18 at 14:42
  • 20M * 16 bytes is only 320 MB. Double that when using an ASCII representation, double it again for pointer+payload overhead, and you stay well under 2 GB. For simplicity, I'd use a hash table. You can certainly afford 128*500 hash lookups per second. – wildplasser Feb 21 '18 at 15:37 (see the sketch after these comments)
  • @wildplasser sure, that was my first suggestion. But for some reason the OP insists on external search... After a few requests, the memory constraints are still a mystery ;) – Andriy Berestovskyy Feb 21 '18 at 15:45
  • @AndriyBerestovskyy Yes. FYI: I have an implementation of an mmapped textfile with an mmapped hash *index* on it. Could *easily* be adapted to variable-sized keys and variable-sized records. (Disk offsets will probably stay < 4GB in this application) – wildplasser Feb 21 '18 at 15:50
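
A minimal sketch (in C) of the hash-table idea from the comments above: LPM is done by probing each possible prefix length as an exact match, longest first. The `hash_get` callback is hypothetical and stands for whatever on-disk or mmapped hash table is actually used:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical exact-match lookup: returns true and fills *label if the
 * (prefix, length) pair is present in the underlying hash table. */
typedef bool (*hash_get_fn)(const uint8_t prefix[16], int plen, const char **label);

/* LPM by probing every possible prefix length, longest first:
 * at most 129 exact-match lookups per query address. */
const char *lpm_by_hash(const uint8_t addr[16], hash_get_fn hash_get)
{
    uint8_t key[16];
    const char *label;

    for (int plen = 128; plen >= 0; plen--) {
        /* Mask the address down to plen bits to form the lookup key. */
        memset(key, 0, sizeof(key));
        memcpy(key, addr, (size_t)(plen + 7) / 8);
        if (plen % 8)
            key[plen / 8] &= (uint8_t)(0xFF << (8 - plen % 8));
        if (hash_get(key, plen, &label))
            return label;
    }
    return NULL;  /* no prefix (not even a default route) matched */
}
```

In practice only the prefix lengths that actually occur in the data set need to be probed, which usually cuts the number of probes well below 128.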

1 Answer


efficient data structure which can be held on disk and will provide efficient lookups

Looks like an XY Problem to me:

  1. We are talking about IPv6 (128 bits).
  2. We are talking about IPv6 routing (Longest Prefix Match).
  3. For a 10 Gbit/s link there might be up to 14.8 million route lookups per second.
  4. According to public data, there are around 50K IPv6 prefixes at the moment.

If the assumptions above are correct, we need to store 50K × 128 bits = 800 KB in a radix tree. It should easily fit into memory.

We need to look this data up millions of times per second. Placing the data on disk means at least an order of magnitude slowdown.

My guess is that your question is not your root issue, so please:

  1. Include information about a broader picture along with any attempted solution.
  2. If there are other solutions you've already ruled out, share why you've ruled them out. This gives more information about your requirements.

UPDATE:

So it is 2M IPv6 lookups per hour over a read-only set of 20M IPv6 prefixes. We still don't know what those 20M prefixes are, since from a practical point of view there aren't that many prefixes in the whole Internet. But anyway.

  1. 2M lookups/hour is ~550 lookups/sec. According to Wikipedia, traditional (mechanical) hard drives cannot sustain such a load. So we need an SSD to store the database.

  2. According to Wikipedia, B-trees are good structures for external search.

  3. According to Wikipedia, Memory-mapped files use small amounts of RAM even for a very large file.

Putting all the bits together: those 20M IPv6 prefixes should be organized as a B-tree with node size PAGE_SIZE (usually 4 KB) and mapped into the virtual address space of the process. The cache will be managed by the operating system, so no other caching mechanism is needed.
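
A sketch (in C) of the memory-mapping side of this, assuming the B-tree has already been built offline as a read-only file of page-sized nodes; the on-disk layout here is illustrative, not a standard format, and the in-node search is omitted:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed layout: the file is an array of 4 KB nodes, node 0 is the root.
 * Each node holds sorted 16-byte keys; internal nodes store child node
 * numbers, leaves store label offsets. */
#define NODE_SIZE 4096

struct btree {
    const uint8_t *base;   /* start of the mapped file */
    size_t         size;   /* file size in bytes */
};

/* Map a prebuilt read-only B-tree file. The OS page cache keeps hot nodes
 * in RAM, so no separate application-level cache is needed. */
int btree_open(struct btree *bt, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    bt->base = p;
    bt->size = (size_t)st.st_size;
    return 0;
}

/* Accessing a node is plain pointer arithmetic into the mapping; the first
 * touch of a cold node costs one page fault, i.e. one read from the SSD. */
const uint8_t *btree_node(const struct btree *bt, uint64_t node_no)
{
    return bt->base + node_no * NODE_SIZE;
}
```

With 4 KB nodes and 16-byte keys, a tree over 20M entries is only a few levels deep, so even a completely cold lookup costs a handful of page faults at most.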

Andriy Berestovskyy
  • My problem is not a routing problem. I have a large file which contains IP prefixes, each mapped to some label. For each client IP I need to perform LPM in order to get the correct label. The current solution loads the entire file into memory, but this is no longer possible. My plan is to maintain some cache in memory for the last queries and read the data from disk. A naive implementation of a radix tree on disk may result in 128 read operations, which is a lot. I am looking for a solution which can reduce the number of reads. – michael Feb 19 '18 at 12:53
  • @michael overall, "efficient LPM lookup over a data structure on disk" sounds illogical. I guess you have some numbers, like the number of rules to match, the number of client IPs, memory restrictions etc. Unless you provide those numbers to back up your reasoning, the question doesn't make much sense. I would go with optimizing the in-memory structures so they fit into memory instead of doing an "efficient" external search... – Andriy Berestovskyy Feb 20 '18 at 09:37
  • Why not? I would like to store the data on disk, why do you need a reason? I am currently unable to store it in memory, and even if I could, the data keeps growing (with each SW release). We currently handle about 20M CIDR entries, and it is totally unnecessary for me to hold all of them in memory. – michael Feb 20 '18 at 14:51
  • @michael how many client IPs do you match over those 20M entries, and how often? – Andriy Berestovskyy Feb 20 '18 at 16:43
  • On very high load it can be 2M an hour. – michael Feb 21 '18 at 08:50
  • How do you perform LPM on a B-tree? – michael Feb 21 '18 at 21:26
  • @michael: treat it like a trie or Patricia tree. Not very different from your radix tree, I suppose. – wildplasser Feb 21 '18 at 21:40
  • @michael we can try to divide the prefix bits into a few groups and expand prefixes to those group boundaries, effectively avoiding LPM at lookup time (see the sketch below). Not sure about those 20M prefixes, but for Internet routes the most common prefix lengths are /24, /32 and /48, so those might be a good starting point for the boundaries. There might be ready-to-use solutions for your needs, but it is uncommon to have 20M prefixes in a routing table and do an external LPM, so it might take you a few tries to find an efficient solution... – Andriy Berestovskyy Feb 22 '18 at 01:01
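
A sketch (in C) of the prefix-expansion idea from the last comment, often called controlled prefix expansion: every stored prefix shorter than a chosen group boundary (e.g. 24, 32 or 48 bits) is expanded into all the exact-length prefixes it covers, so lookups become a few exact-match probes instead of a bit-by-bit LPM. The `insert_exact` callback is hypothetical and stands for whatever exact-match store is used; overlapping expansions must be resolved in favor of the longer original prefix:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical callback that stores one expanded, exact-length entry. */
typedef void (*insert_exact_fn)(const uint8_t key[16], int bits, const char *label);

/* Expand a /plen prefix (plen <= boundary) into 2^(boundary - plen) prefixes
 * of exactly `boundary` bits. Assumes boundary - plen < 64. */
void expand_prefix(const uint8_t prefix[16], int plen, int boundary,
                   const char *label, insert_exact_fn insert_exact)
{
    uint64_t count = 1ULL << (boundary - plen);

    for (uint64_t i = 0; i < count; i++) {
        uint8_t key[16];
        memcpy(key, prefix, sizeof(key));

        /* Write the expansion counter into bits [plen, boundary). */
        for (int bit = plen; bit < boundary; bit++) {
            if ((i >> (boundary - 1 - bit)) & 1)
                key[bit / 8] |= (uint8_t)(0x80 >> (bit % 8));
            else
                key[bit / 8] &= (uint8_t)~(0x80 >> (bit % 8));
        }
        insert_exact(key, boundary, label);
    }
}
```

The trade-off is disk/memory space for lookup speed: the closer the chosen boundaries are to the real prefix-length distribution, the smaller the expansion blow-up.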