I am currently trying to understand and recreate a database. One common way of storing data in a database is using a B+Tree. Data is only stored on so called 'leaf nodes', which are indexed by 'index nodes'. If a full leaf node is inserted into it is split with its content being split equally to the two new nodes. I understand this happens to keep the tree balanced and easily searchable. But I can't quite understand this procedure.
The example is taken from this thread 'splitting a node in b+ tree'.
Let's say we have a B+Tree like this:
[21 30 50]
/ \ \ \
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
And then we insert the key '23'. We would need to split the [21|22|25] node. In order to balance the tree we distribute the keys evenly and arrive at this tree:
[21 30 50]
/ \ \ \
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
||
||
\/
[ 30 ]
/ \
[21 23] [50]
/ \ \ / \
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
In my understanding of B+Trees this leaves the [21|22|-] leaf unable the be filled up to maximum capacity. This is because any key smaller than 21 will be entered into the [10|20|-] leaf and 23 is already a present key.
If we think of leaf nodes as pages as they are present in a database, this lets precious memory and disk space go to waste and more disk operations will be necessary to find values, making the database slower.
I would really appreciate feedback on my thoughts and some help clearing up misunderstandings if they are present.
Thank you very much!