0

I am thinking about how to organize/allocate memory and that led me to this question, which I have distilled to its essence, so it may seem out of the blue, but the memory question is too complicated and confusing and distracting, so I am not asking it here. I tried asking it earlier and got no response. So here it goes.

Say you have a system where you want to check if you have some integer exists in it as fast as possible, and you want to be able to add and remove integers as fast as possible too. That is, it's a key-store essentially, not even a key-value-store. It only stores integer keys, and I guess theoretically the value would conceptually be a true boolean, but I don't think it needs to be there necessarily.

One solution is to use sparse arrays. For example, this array would return true in O(1) time for 3, 5, and 9.

[-, -, 3, -, 5, -, -, -, 9, -, -, ...]

You just specify the integer - 1 and get back if it exists at that position in the array. But the problem with this is twofold. First, it takes up a lot of wasted memory with all these null values. And secondly, for my project, I have the added constraint that arrays can only exist at sizes 1, 2, 4, 8, 16, and 32 items in length. So I am stuck using short arrays. Which means I'm left with trees of some sort most likely.

So these are the constraints:

  1. Positive 32-bit integers.
  2. Compact data structure (minimal wasted space).
  3. Fast lookup times.
  4. Fast deletion/insertion times.
  5. Array sizes in the data structure can only be sized at 1, 2, 4, 8, 16, or 32 items.

One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. But I am not super familiar with these yet, still playing with them. I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).

Another approach I thought of is a binary trie. But this would take up to 32 steps per integer to add and check membership, which seems like we could potentially optimize somehow.

What is the main approach to storing sequential 32-bit integers in some data structure for rapidly checking membership, doing insertions, and doing removals? Perhaps this can learn from IP lookup tables or things of that nature.

I am using this for a custom programming language, though seeing an implementation in JavaScript would be helpful. If it's just the name of the data structure you're providing, please explain how it works or implement it in JavaScript without resorting to JavaScript built-ins which may already solve the problem.

Lance
  • 75,200
  • 93
  • 289
  • 503
  • Why not use a Set instance? It's basically exactly what you're asking for. – Pointy Jan 09 '21 at 17:20
  • JavaScript (well, modern JavaScript) has a native Set class. – Pointy Jan 09 '21 at 17:22
  • Updated the question. I would like to implement this or see how it's implemented without resorting to built-ins. Knowing it's the Set class doesn't get me closer to understanding implementation unfortunately. How is the Set class implemented in that case, and is it the most optimal? – Lance Jan 09 '21 at 17:22
  • 1
    Well I think you're correct that a B-Tree or some variant is probably the most space and time efficient way to do it. – Pointy Jan 09 '21 at 17:23
  • It's a b-tree, not a b+tree? – Lance Jan 09 '21 at 17:30
  • A B+ tree is a variant of a B tree. The "B" is short for "balanced" I think. It came from the database world but for this application it might be a good approach. *Might* be; it's not an application need that I've encountered. – Pointy Jan 09 '21 at 18:01
  • There's also "B' trees" ("B prime") – Pointy Jan 09 '21 at 18:35
  • Do you have a link to b prime trees? I am thinking these [radix trees](https://stackoverflow.com/a/65646217/169992) might be a better fit than a b-tree. – Lance Jan 09 '21 at 18:54
  • Honestly I haven't looked at that kind of stuff in a very large number of years. – Pointy Jan 09 '21 at 19:38
  • https://en.wikipedia.org/wiki/Succinct_data_structure – Matt Timmermans Jan 09 '21 at 19:57
  • Am I correct that you can use arrays of up to 32 items, and each member of an item can itself be an array or integer? Also what do you want to have happen if an integer is inserted twice then deleted once? (ie Is the actual requirement a set or multiset?) – btilly Jan 09 '21 at 20:20
  • @btilly each integer can only be inserted once, so I think it's just a set. – Lance Jan 09 '21 at 20:21

2 Answers2

1

Here is my suggested approach.

Every node should be a 32 entry array. If the first entry is null or an array, it is a 32-way split of the whole search space. Otherwise it is descending list of the entries in this block. Fill in the non-entries with -1.

This means that in no more than 5 lookups we get to the block that either has or doesn't have our value. We then do a linear scan. A binary search of a sorted list naively seems like it would be more efficient, but in fact it involves a series of hard to predict branches which CPUs hate. In a low level language (which I hope yours is), it is faster to avoid the pipeline stall and do a linear search.

Here is an implementation in JavaScript.

class IntegerSet {
    constructor() {
        this.make_node = function () {
            let node = new Array();
            for (let i = 0; i < 32; i++) {
                node[i] = -1;
            }
            return node;
        };
        this.root = this.make_node();

        this.lookup = function (target) {
            let node = this.root;
            let size = 2**32;
            while (Array.isArray(node[0])) {
                size /= 32;
                let idx = Math.floor(target / size);
                node = node[idx];
                target = target % size;
            }

            for (let i = 0; i < 32; i++) {
                if (target == node[i]) {
                    return true;
                }
                // On average this lets us break out 1/4 of the way through.
                else if (node[i] < target) {
                    return false;
                }
            }
            return false;
        };

        this.insert = function (target, node) {
            node = node || this.root;
            let size = 2**32;
            while (Array.isArray(node[0])) {
                size /= 32;
                let idx = Math.floor(target / size);
                node = node[idx];
                target = target % size;
            }

            for (let i = 0; i < 32; i++) {
                if (target == node[i]) {
                    return false;
                }
                else if (node[i] < target) {
                    // copy elements along, in order.
                    let value = node[i];
                    for (let j = i; j < 31; j++) {
                        node[j] = target;
                        target = value;
                        value = node[j+1];
                        if (target == -1) {
                            break;
                        }
                    }
                    node[31] = target; // just in case the block is full
                    if (-1 == target) {
                        return true;
                    }
                    else {
                        break; // we inserted and filled the node.
                    }
                }
            }

            // If we get here, we need to split the node.
            let node_copy = this.make_node();
            for (let i = 0; i < 32; i++) {
                node_copy[i] = node[i];
                node[i] = this.make_node();
            }
            size /= 32;
            let idx = Math.floor(target / size);
            node[idx][0] = target % size;
            for (let i = 0; i < 32; i++) {
                let value = node_copy[i];
                idx = Math.floor(value / size);
                this.insert(value % size, node[idx]);
            }
        };

        this.remove = function (target) {
            let node = this.root;
            let size = 2**32;

            while (Array.isArray(node[0])) {
                size /= 32;
                let idx = Math.floor(target / size);
                node = node[idx];
                target = target % size;
            }

            for (let i = 0; i < 32; i++) {
                if (target == node[i]) {
                    for (let j = i; j < 31; j++) {
                        node[j] = node[j+1];
                        if (node[j] == -1) {
                            break;
                        }
                    }
                    node[31] = -1;
                    return true;
                }
            }
            return false;
        };
    }
}

let foo = new IntegerSet();
for (let i = 0; i < 300; i++) {
    foo.insert(Math.floor(Math.random() * 2**32));
}
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));
console.log(foo.insert(5));
console.log(foo.root);
console.log(foo.lookup(5));
console.log(foo.remove(5));

Here are the characteristics of this data structure.

A read requires no more than 5 lookups to get to a block, then a scan with at most 64 if statements (which are well-predicted to always be false). If you insert a lot of random elements, the leaves are on average half-full, and on average we only do 1/4 of the scan.

The fact that there are 32 children of each internal node mean that the internal nodes only add a few percent overhead to the structure.

If you insert and then delete, performance remains good but we never will reclaim memory.


The following is an implementation that is similar to the previous but uses arrays of different size, and allows you to reclaim space. It aims to make 60-70% of the space actually data. But this comes, obviously, with penalties on insert/remove.

class IntegerSet {
    constructor() {
        this.make_node = function (size) {
            let node = new Array();
            for (let i = 0; i < size; i++) {
                node[i] = -1;
            }
            node.child_size = 0;
            return node;
        };
        this.root = this.make_node(2);

        this.lookup = function (target) {
            let node = this.root;
            let size = 2**32;
            while (Array.isArray(node[0])) {
                size /= 32;
                let idx = Math.floor(target / size);
                node = node[idx];
                target = target % size;
            }

            let i = 0;
            while (target < node[i]) {
                i++;
            }

            return target == node[i];
        };

        this.insert = function (target, node, parent_node, size) {
            node = node || this.root;
            if (size == null) {
                size = 2**32;
            }
            let idx = null;
            let update_parent = false;
            while (Array.isArray(node[0])) {
                size /= 32;
                parent_node = node;
                update_parent = true;
                idx = Math.floor(target / size);
                node = node[idx];
                target = target % size;
            }

            let i = 0;
            while (target < node[i]) {
                i++;
            }

            if (target == node[i]) {
                return true; // already had it.
            }
            else {
                // copy along, in order.
                let value = node[i];
                while (-1 < target) {
                    value = node[i]
                    node[i] = target;
                    i++;
                    target = value;
                }
                if (update_parent) {
                    parent_node.child_size++;
                }

                if (i == node.length) {
                    let new_node = null;
                    if (node.length < 32) {
                        // Double node size.
                        new_node = this.make_node(node.length * 2);
                        for (let j = 0; j < i; j++) {
                            new_node[j] = node[j];
                        }
                    }
                    else {
                        // Need to split.
                        new_node = this.make_node(node.length);
                        for (let j = 0; j < node.length; j++) {
                            new_node[j] = this.make_node(2);
                        }
                        new_node.child_size = node.length;
                        size /= 32;
                        for (let j = 0; j < 32; j++) {
                            target = node[j];
                            let new_idx = Math.floor(target / size);
                            target = target % size;
                            this.insert(target, new_node[new_idx], parent_node, size);
                        }
                    }
                    if (parent_node != null) {
                        parent_node[idx] = new_node;
                    }
                    else {
                        this.root = new_node;
                    }
                }
            }
        };

        this.remove = function (target) {
            let node = this.root;
            let grandparent_node = null;
            let parent_node = null;
            let size = 2**32;
            let idx = null;
            let parent_idx = null;

            while (Array.isArray(node[0])) {
                size /= 32;
                parent_idx = idx;
                idx = Math.floor(target / size);
                grandparent_node = parent_node;
                parent_node = node;
                node = node[idx];
                target = target % size;
            }

            let i = 0;
            while (target < node[i]) {
                i++;
            }

            if (node[i] == target) {
                // now delete along.
                while (-1 < node[i]) {
                    node[i] = node[i+1];
                    i++;
                }

                if (parent_node != null) {
                    parent_node.child_size--;
                    // Reconsolidate if we save about 1/4 of memory.
                    if (parent_node.child_size < 24) {
                        let new_node = this.make_node(32);
                        let new_id = 0;
                        for (let i = 31; -1 < i; i--) {
                            let old_node = parent_node[i];
                            let j = 0;
                            while (-1 < old_node[j]) {
                                new_node[new_id] = size * i + old_node[j];
                                new_id++;
                                j++;
                            }
                        }
                        if (grandparent_node == null) {
                            this.root = new_node;
                        }
                        else {
                            grandparent_node[parent_idx] = new_node;
                            grandparent_node.child_size += parent_node.child_size - 33;
                        }
                        return true;
                    }
                }

                // Reconsolidate if we save about 2/3 of memory.
                if (i < node.length / 3 && 2 < node.length) {
                    let new_node = this.make_node(node.length/2);
                    for (let j = 0; j < i; j++) {
                        new_node[j] = node[j];
                    }
                    if (parent_node != null) {
                        parent_node[idx] = new_node;
                    }
                    else {
                        this.root = new_node;
                    }
                }

                return true;
            }
            else {
                return false;
            }
        };
    }
}

let foo = new IntegerSet();
let keep = new Array;
for (let i = 0; i < 5; i++) {
    keep.push(Math.floor(Math.random() * 2**32));
}
let remove = new Array;
for (let i = 0; i < 300; i++) {
    remove.push(Math.floor(Math.random() * 2**32));
}
for (let i = 0; i < keep.length; i++) {
    foo.insert(keep[i]);
}
for (let i = 0; i < remove.length; i++) {
    foo.insert(remove[i]);
}
for (let i = 0; i < remove.length; i++) {
    foo.remove(remove[i]);
}
console.log(foo.root);
for (let i = 0; i < keep.length; i++) {
    console.log(keep[i]);
    console.log(foo.lookup(keep[i]));
    console.log(foo.remove(keep[i]));
    console.log(foo.lookup(keep[i]));
    console.log(foo.remove(keep[i]));
}
btilly
  • 43,296
  • 3
  • 59
  • 88
  • Awesome! Is it possible to implement remove so it can reclaim memory? Also, as you posted this I just finished a simple [binary trie based approach](https://gist.github.com/lancejpollard/eaffffcf06f5e5ce17fd1ecc31e9f443), but I assume this will run into the pipelining problem you mention, correct? – Lance Jan 09 '21 at 23:20
  • Thank you for not using recursion either, and making your code readable :) Also, can you explain how the digging down works in `while (Array.isArray(node[0]))`? – Lance Jan 09 '21 at 23:50
  • @LancePollard Unless the optimizer is really clever, it will. Also having a 2-way split versus a 32-way split will cause it to slow down. I have ideas on making it able to reclaim memory. But that will impose a several percent memory penalty. However I've got a way to improve memory usage enough that it is OK. – btilly Jan 10 '21 at 00:06
  • Nice! That is okay if you have a memory penalty, although it sounds like you figured something out for that even. – Lance Jan 10 '21 at 00:17
  • As for the digging down, every node either is all arrays or all integers. If the first entry is an array, then we are still on an interior node and need to continue traversing the tree until we wind up at a leaf. – btilly Jan 10 '21 at 00:24
  • 1
    @LancePollard Adding reclaiming and memory saving was a little tricky. The idea is to make sure that there is always a -1 sentinel at the end of the array, and to keep track of sizes carefully. This simplified searching, and allowed variable size arrays to be used. The fact that I could just add a property to an array in JavaScript for the node_size helped me a lot, but might be a complication in your eventual target language. – btilly Jan 10 '21 at 04:16
  • Can you explain the reasoning behind the 2/3 and 1/4 reclaiming on remove, I don't understand why you chose those values. Why not just always reconsolidate as much as possible? – Lance Jan 10 '21 at 10:34
  • Doing this I get call stack size exceeded error: `let foo = new IntegerSet(); for (let i = 0; i < 1000; i++) { foo.insert(i * 2) }`. – Lance Jan 10 '21 at 10:42
  • @LancePollard The problem was endless recursion because it didn't know the size on recursive inserts into the same partition. I also had a remove bug. – btilly Jan 10 '21 at 19:42
  • 1
    The reasoning between 2/3 and 1/4 is that reclaiming is expensive. If you repeatedly insert/delete one element, we don't want to constantly be copying and recopying data. So you want a balance between how much time you spend doing that and how much memory you are saving. I chose those as reasonable numbers that I think are likely to work out well in the real world. But you can try different parameters and benchmarks to change the tradeoff. – btilly Jan 10 '21 at 19:44
1

You wrote:

One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. [...] I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).

Indeed, a key-only B(+)-tree would be a possible solution. It solves the problem of the wasted space you get from a sparse array solution. It is also a solution that works well when the data type happens to be string or float instead of number, which would be a problem for the sparse array solution.

The code I post below is similar to another B+ tree related answer I posted earlier on another question of yours, but which was not a search tree.

The idea is to use a B+ tree, taking into account the extra requirement about array sizes that are powers of 2. The actual data keys are found in the bottom level of the tree. The internal nodes repeat the smallest key that its subtree contains. This can be used for deciding which child to choose when walking from the root down the tree, in search for a bottom-level location where a key should be found or should be inserted.

The tree is kept compact by distributing children/keys to neighbors when a node gets full, only splitting a node when there is no room in the neighbors either. When a key is deleted, the affected node will merge with a neighbor when possible. Each node (except the root) is guaranteed to have at least half the use of the maximum (32) node capacity -- either for storing children or keys (at the bottom level).

Keys are located starting the search at the root. A child is selected using binary search so that the key of that child is the greatest key that is less or equal to the key were are locating. If the key is not found at the bottom level, we have at least the location where it should be, and it could be inserted there.

Implementation

class Node {
    constructor(capacity) {
        // Mimic fixed-size array (avoid accidentally growing it)
        this.children = Object.seal(Array(capacity).fill(null));
        this.childCount = 0; // Number of used slots in children array
        this.key = null; // Property for supporting a search
        // Maintain back-link to parent.
        this.parent = null;
        // Per level in the tree, maintain a doubly linked list
        this.prev = this.next = null;
    }
    setCapacity(capacity) {
        if (capacity < 1) return;
        // Here we make a new array, and copy the data into it
        let children = Object.seal(Array(capacity).fill(null));
        for (let i = 0; i < this.childCount; i++) children[i] = this.children[i];
        this.children = children;
    }
    isLeaf() {
        return !(this.children[0] instanceof Node);
    }
    index() {
        return this.parent.children.indexOf(this);
    }
    updateKey() {
        this.key = this.isLeaf() ? this.children[0] : this.children[0].key;
        for (let node = this.parent; node; node = node.parent) {
            node.key = node.children[0].key;
        }
    }
    wipe(start, end) {
        this.children.copyWithin(start, end, this.childCount);
        for (let i = this.childCount - end + start; i < this.childCount; i++) {
            this.children[i] = null;
        }
        this.childCount -= end - start;
        // Reduce allocated size if possible
        if (this.childCount * 2 <= this.children.length) this.setCapacity(this.children.length / 2);
        // Update key if first item changed
        if (start === 0) this.updateKey();
    }
    moveFrom(neighbor, target, start, count=1) {
        // Note: `start` can have two meanings:
        //   if neighbor is null, it is the key/Node to move to the target
        //   if neighbor is a Node, it is the index from where keys(s)/Node(s) have to be moved to the target
        // Make room in target node
        if (this.childCount + count > this.children.length) this.setCapacity(this.children.length * 2);
        this.children.copyWithin(target + count, target, Math.max(target + count, this.childCount));
        this.childCount += count;
        if (neighbor !== null) {
            // Copy the children
            for (let i = 0; i < count; i++) {
                this.children[target + i] = neighbor.children[start + i];
            }
            // Remove the original references
            neighbor.wipe(start, start + count);
        } else {
            this.children[target] = start; // start is key to insert
        }
        // Set parent link(s)
        if (!this.isLeaf()) {
            for (let i = 0; i < count; i++) {
                this.children[target + i].parent = this;
            }
        }
        // Update key if first item changed
        if (target === 0) this.updateKey();
    }
    moveToNext(count) {
        this.next.moveFrom(this, 0, this.childCount - count, count);
    }
    moveFromNext(count) {
        this.moveFrom(this.next, this.childCount, 0, count);
    }
    basicRemove(index) {
        if (!this.isLeaf()) {
            // Take node out of the level's linked list
            let prev = this.children[index].prev;
            let next = this.children[index].next;
            if (prev) prev.next = next;
            if (next) next.prev = prev;
        }
        this.wipe(index, index + 1);
    }
    basicInsert(index, value) { // value can be a key or a Node
        this.moveFrom(null, index, value);
        if (value instanceof Node) {
            // Insert node in the level's linked list
            if (index > 0) {
                value.prev = this.children[index-1];
                value.next = value.prev.next;
            } else if (this.childCount > 1) {
                value.next = this.children[1];
                value.prev = value.next.prev;
            }
            if (value.prev) value.prev.next = value;
            if (value.next) value.next.prev = value;
        }
    }
    pairWithSmallest() {            
        return this.prev && (!this.next || this.next.childCount > this.prev.childCount)
            ? [this.prev, this] : [this, this.next];
    }
    toString() {
        return "[" + this.children.map(v => v??"-").join() + "]";
    }
}

class Tree {
    constructor(nodeCapacity=32) {
        this.nodeCapacity = nodeCapacity;
        this.root = new Node(1);
        this.first = this.root; // Head of doubly linked list at bottom level
    }
    locate(key) {
        let node = this.root;
        while (!node.isLeaf()) {
            // Binary search among keys
            let low = 0;
            let high = node.childCount;
            while (low < high) {
                let index = (low + high) >> 1;
                if (key >= node.children[index].key) {
                    low = index + 1;
                } else {
                    high = index;
                }
            }
            if (low) low--;
            node = node.children[low];
        }
        // Binary search in leaf
        let low = 0;
        let high = node.childCount;
        while (low < high) {
            let index = (low + high) >> 1;
            if (key > node.children[index]) {
                low = index + 1;
            } else {
                high = index;
            }
        }
        return [node, low];
    }
    has(key) {
        let [node, index] = this.locate(key);
        return index < node.childCount && node.children[index] === key;
    }
    remove(key) {
        let [node, index] = this.locate(key);
        if (index >= node.childCount || node.children[index] !== key) return; // not found
        while (true) {
            node.basicRemove(index);

            // Exit when node's fill ratio is fine
            if (!node.parent || node.childCount * 2 > this.nodeCapacity) return;
            // Node has potentially too few children, we should either merge or redistribute
            
            let [left, right] = node.pairWithSmallest();
            
            if (!left || !right) { // A node with no siblings? Must become the root!
                this.root = node;
                node.parent = null;
                return;
            }
            let sumCount = left.childCount + right.childCount;
            let childCount = sumCount >> 1;
            
            // Check whether to merge or to redistribute
            if (sumCount > this.nodeCapacity) { // redistribute
                // Move some data from the bigger to the smaller node
                let shift = childCount - node.childCount;
                if (!shift) { // Boundary case: when a redistribution would bring no improvement
                    console.assert(node.childCount * 2 === this.nodeCapacity && sumCount === this.nodeCapacity + 1);
                    return;
                }
                if (node === left) { // move some children from right to left
                    left.moveFromNext(shift);
                } else { // move some children from left to right
                    left.moveToNext(shift);
                }
                return;
            }
            
            // Merge:
            // Move all data from the right to the left
            left.moveFromNext(right.childCount);
            // Prepare to delete right node
            node = right.parent;
            index = right.index();
        }
    }
    insert(value) {  // value is a key
        let [node, index] = this.locate(value);
        if (index < node.childCount && node[index] === value) return; // already present
        while (node.childCount === this.nodeCapacity) { // No room here
            if (index === 0 && node.prev && node.prev.childCount < this.nodeCapacity) {
                return node.prev.basicInsert(node.prev.childCount, value);
            }
            // Check whether we can redistribute (to avoid a split)
            if (node !== this.root) {
                let [left, right] = node.pairWithSmallest();
                let joinedIndex = left === node ? index : left.childCount + index;
                let sumCount = left.childCount + right.childCount + 1;
                if (sumCount <= 2 * this.nodeCapacity) { // redistribute
                    let childCount = sumCount >> 1;
                    if (node === right) { // redistribute to the left
                        let insertInLeft = joinedIndex < childCount;
                        left.moveFromNext(childCount - left.childCount - +insertInLeft);
                    } else { // redistribute to the right
                        let insertInRight = index >= sumCount - childCount;
                        left.moveToNext(childCount - right.childCount - +insertInRight);
                    }
                    if (joinedIndex > left.childCount || 
                            joinedIndex === left.childCount && left.childCount > right.childCount) {
                        right.basicInsert(joinedIndex - left.childCount, value);
                    } else {
                        left.basicInsert(joinedIndex, value);
                    }
                    return;
                }
            }
            // Cannot redistribute: split node
            let childCount = node.childCount >> 1;
            // Create a new node that will later become the right sibling of this node
            let sibling = new Node(childCount);
            // Move half of node node's data to it
            sibling.moveFrom(node, 0, childCount, childCount);
            // Insert the value in either the current node or the new one
            if (index > node.childCount) {
                sibling.basicInsert(index - node.childCount, value);
            } else {
                node.basicInsert(index, value);
            }
            // Is this the root? 
            if (!node.parent) {
                // ...then first create a parent, which is the new root
                this.root = new Node(2);
                this.root.basicInsert(0, node);
            }
            // Prepare for inserting the sibling node into the tree
            index = node.index() + 1;
            node = node.parent;
            value = sibling; // value is now a Node
        }
        node.basicInsert(index, value);
    }
    /* Below this point: these methods are optional */
    * [Symbol.iterator]() { // Make tree iterable
        let i = 0;
        for (let node = this.first; node; node = node.next) {
            for (let i = 0; i < node.childCount; i++) yield node.children[i];
        }
    }
    print() {
        console.log(this.root && this.root.toString());
    }
    verify() {
        // Raise an error when the tree violates one of the required properties
        if (!this.root) return; // An empty tree is fine.
        if (this.root.parent) throw "root should not have a parent";
        // Perform a breadth first traversal
        let q = [this.root];
        while (q.length) {
            if (q[0].isLeaf() && this.first !== q[0]) throw "this.first is not pointing to first leaf";
            let level = [];
            let last = null;
            for (let parent of q) {
                if (!(parent instanceof Node)) throw "parent is not instance of Node";
                if (parent.children.length > this.nodeCapacity) throw "node's children array is too large";
                if (parent.childCount > 0 && parent.childCount * 2 <= parent.children.length) throw "node's fill ratio is too low";
                for (let i = parent.childCount; i < parent.children.length; i++) {
                    if (parent.children[i] !== null) throw "child beyond childCount should be null but is not";
                }
                if (parent.isLeaf()) {
                    if (parent.children[0] !== parent.key) throw "key does not match with first child value";
                    for (let value of parent.children.slice(0, parent.childCount)) {
                        if (value === null) throw "leaf has a null as value";
                        if (value instanceof Node) throw "leaf has a Node as value";
                    }
                } else {
                    if (parent.children[0].key !== parent.key) throw "key does not match with first child's key";
                    for (let node of parent.children.slice(0, parent.childCount)) {
                        if (node === null) throw "internal node has a null as value";
                        if (!(node instanceof Node)) throw "internal node has a non-Node as value";
                        if (node.parent !== parent) throw "wrong parent";
                        if (node.prev !== last) throw "prev link incorrect";
                        if (last && last.next !== node) throw "next link incorrect";
                        if (last && last.children.length + node.children.length <= this.nodeCapacity) {
                            throw "two consecutive siblings have a total number of children that is too small";
                        }
                        if (node.childCount * 2 < this.nodeCapacity) {
                            throw "internal node is too small: " + node;
                        }
                        level.push(node);
                        last = node;
                    }
                }
            }
            if (last && last.next) throw "last node in level has a next reference";
            q = level;
        }
    }
    test(count=100, option=3) {
        // option:
        //     0 = always insert & delete at left side (offset 0)
        //     1 = always insert & delete at right side
        //     2 = always insert & delete at middle
        //     3 = insert & delete at random offsets
        // Create array to perform the same operations on it as on the tree
        let max = count*2;
        let arr = [];
        // Perform a series of insertions
        for (let i = 0; i < count; i++) {
            // Choose random value
            let key = Math.floor(Math.random() * max);
            // Perform same insertion in array and tree
            arr.push(key); 
            this.insert(key);
            // Verify tree consistency and properties
            this.verify();
            // Verify the order of keys in the array is the same as in the tree
            if (arr.sort((a,b) => a-b)+"" !== [...this]+"") throw i + ": tree not same as array";
        }
        // Perform a series of has-calls
        for (let i = 0; i < count; i++) {
            // Choose random update index
            let key = Math.floor(Math.random() * max);
            // Perform same insertion in array and tree
            let has = arr.includes(key);
            if (has !== this.has(key)) {
                throw "has() returns inconsistent result";
            }
            if (!has) {
                this.remove(key); // should not alter the tree
                // Verify the order of keys in the array is the same as in the tree
                if (arr+"" !== [...this]+"") throw i + ": tree not same as array";
            }
        }
        // Perform a series of deletions
        for (let i = arr.length - 1; i >= 0; i--) {
            // Choose random deletion key 
            let key = arr[Math.floor(Math.random() * i)];
            // Perform same deletion in array and tree
            arr.splice(arr.indexOf(key), 1);
            this.remove(key);
            // Verify tree consistency and properties
            this.verify();
            // Verify the order of keys in the array is the same as in the tree
            if (arr+"" !== [...this]+"") throw "tree not same as array";
        }
    }
}

// Perform 1000 insertions, 1000 updates, and 1000 deletions on a tree with node capacity of 8
new Tree(8).test(1000);
console.log("all tests completed");

Supported data types

The data type of the keys can be anything for which comparison operators work. So for strings or numbers it will work. In JavaScript it will also work for objects that have implemented either a valueOf or toString method. For instance, this is already the case for native Date objects. JavaScript will first try the valueOf method and in absence the toString method, which is also defined on the Object prototype.

Considerations

I stored one key in each node, as this is an in-memory solution. Historically, B(+)-trees are used for disk storage, where it is important to keep the number of block reads low. A standard B(+)-tree implementation stores the keys one level higher, in the parent, as an array. That way the search does not have to load each child record, while searching for the right one to choose.

Also, it then does not need to store the key of its first child, as we really only need the keys that separate two consecutive children.

trincot
  • 317,000
  • 35
  • 244
  • 286