1

I was going through the B-Tree topic in Introduction to Algorithms by Cormen et. al. And I was having a difficulty in implementing the disk-operations of the pseudocode in a real program. This might be the situation because few descriptions of the objects is not clear to me here. So could anyone guide me as, how to proceed.

The following are the excerpts from the text, presenting the situation of the development:

We model disk operations in our pseudocode as follows. Let x be a pointer to an object. If the object is currently in the computer's main memory, then we can refer to the fields of the object as usual: key[x], for example. If the object referred to by x resides on disk, however, then we must perform the operation Disk-Read (x) to read object x into main memory before we can refer to its fields. (We assume that if x is already in main memory, then Disk-Read (x) requires no disk accesses; it is a "no-op.") Similarly, the operation Disk-Write(x) is used to save any changes that have been made to the fields of object x. The typical pattern for working with an object is as follows:

    x <- a pointer to some object 
    Disk-Read (x) 
    operations that access and/or modify the fields of x 
    Disk-Write(x) // Omitted if no fields of x were changed. 
    other operations that access but do not modify fields of x

So it is quite obvious that if x is supposed to store an object which on the disk, we cannot refer to it unless we bring it into main-memory, but how can we have x pointing to an object in the disk in the first place, this is something, I am having problem to implement.

I could not understand how to implement these disk operations in an actual program.

Creating an empty B-tree

To build a B-tree T, we first use B-Tree-Create to create an empty root node. The procedure use an auxiliary procedure Allocate-Node, which allocates one disk page to be used as a new node in O(1) time. (I know to allocate on heap, but here are they talking about allocating directly on disk? Moreover if x holds reference to an object allocated on disk then without bringing it into main-memory we cannot possibly work its attributes as per the previous excerpt). We can assume that a node created by Allocate-Node requires no Disk-Read, since there is as yet no useful information stored on the disk for that node.(If we do not read it, how can we set the attributes of x?)

  В-Tree-Create (T): 
  1 x <- Allocate-Node() 
  2 leaf[x] <- true 
  3 n[x] <- 0 
  4 Disk-Write (x) 
  5 root[T] <- x

I know how to allocate on heap and say use fwrite() in C programming language to write it onto the disk, but how to incorporate the linking in the disk? Should be use ftell() to get the start of an object in file and accordingly make the linking?

I do not quite get how to elegantly write the objects to the disk, maintaining the linked structure. (The text Data Structures Using C and C++ By Aaron M. Tenenbaum et. al. only deals with the topic pictorially without hardcore implementation. And I am yet to take up a formal DBMS course)

Please guide me,if possible. Thank you..

[Note I went through this question as well as this, but the answers suggested contains huge codes without any documentation or vivid comments, googling the stuff yields B-tree maintained in main-memory(which is not what they are designed for). Could anyone just pin point to me the thing in a simpler programming language like C, so that I can understand the sketch of the process without going into the details of hundreds of lines of codes, with far more complicacies like acquiring lock, then releasing lock,defragment,etc.] [Moreover there is a video lecture on the topic here, but alas without the details of the implementation]

Abhishek Ghosh
  • 597
  • 7
  • 18
  • 1
    Obviously the pointers filed are not valid on reading. One technique I use is to have an **array** of nodes (perhaps `realloc` when necessary), and instead of pointers use an **index** into the array. Let's say index `-1` represents the `NULL` pointer. That will survive going to file and back. Just dump the (contiguous) array of nodes to file. – Weather Vane Jul 11 '20 at 19:56
  • @WeatherVane please can you elaborate your approach... are you talking about the "indexed" method of linking objects using array for each attribute(for example) ? – Abhishek Ghosh Jul 11 '20 at 20:01
  • 1
    In the more complicated case where I have to 'free' and 'obtain' nodes in a dynamic situation, I simply put the released node into a linked list of available nodes. When I want a node, I take one from that linked list (if there is one). If not, I take the next available from the array (if there is one). If not, I reallocate a bigger array and take the next node available. – Weather Vane Jul 11 '20 at 20:05
  • 1
    ...the linked list of available nodes, is also linked by the array **index**. There are no pointers, but it does assume that the data content of each node is self-contained. – Weather Vane Jul 11 '20 at 20:12
  • @WeatherVane I want to ask a few things regarding the implementation,suppose `Allocate-Node()` is used to take up a node from pool of available linked list of nodes, in the beginning,and then in `Disk-write()` we write it to the file using `fwrite()` (it is the root initially), then we insert another node and link to root and `Disk-Write()` it using the appropriate offset as per the index of the node...So that we can `Disk-Read()` using those index offsets and get to the actual node... Is it something like that. It would be helpful if you could help with a crude code,pseudocode or picture – Abhishek Ghosh Jul 11 '20 at 20:23
  • 1
    I can't supply code, sorry. If you want a file-based tree because it is too large for memory, you can implement a random-access file to work in a similar way. I'm not sure where you file the released node record indexes though. Perhaps in another file, appended to release, truncated when obtained. Try the memory based idea first. When it works, translate it to file based. – Weather Vane Jul 11 '20 at 20:30

0 Answers0