0

According to the docs, each PostgreSQL table is stored in a separate file. When a table exceeds 1 GB, it is divided into gigabyte-sized segments. (the same is more or less true for others rdbms )

Let's say I'm updating row which is located in the middle of the file. There is no way to write in the middle of the file without losing second part. If we split file into 2 part, write some data, second part of the file should by copied somewhere (in memory or into another file) and then appended to the first part of file. What rdbms does internally in this case ?

Maybe instead of modifying whole file rdbms append modified row with newer version to the file and keep old record unmodifed. If it's true, I have same questions about indexes. What if b-tree insertion occurs, index file must be modified in an arbitrary place of the file. Does rdbms rewrite whole index file on each update/insert ?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
user12384512
  • 3,362
  • 10
  • 61
  • 97
  • In gerneral, RDBMS try to keep the index in RAM – Basile Starynkevitch Jul 29 '17 at 19:01
  • 1
    This isn't a good question for Stack Overflow as it has multiple questions within and isn't clear what it is asking. – AStopher Jul 29 '17 at 19:02
  • 1
    No, the index is not re-written with every change - that's why it gets fragmented and needs to be reorganized or rebuilt from time to time. – Filburt Jul 29 '17 at 19:06
  • 3
    `There is no way to write in the middle of the file without losing second part.` -- You probably learned this rule when you were working with text files. It doesn't apply to file formats that are designed to be efficiently updateable. –  Jul 29 '17 at 19:15
  • @WumpusQ.Wumbley Misterios file formats designed by NASA ....You cannot insert in the middle of the file https://stackoverflow.com/questions/30642827/how-to-write-to-middle-of-a-file-in-c – Anton Jul 29 '17 at 19:26
  • 1
    Postgres did not modify files on each update. Updated and deleted rows are not hysically removed. From docs - VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Not sure how they deal with indexes. – Anton Jul 29 '17 at 19:30

1 Answers1

0

This is too long for a comment.

Your fixation on the segment size is amusingly misplaced. The use of "segments" to break up individual files is not particularly common across databases. The alternative mechanism of storing data in table spaces assigned to one or more files, with one physical file storing pages from multiple tables is also common. And there are other alternatives as well (consider columnar data stores). That is the least important part of the storage mechanism in Postgres or any other database.

In general, relational databases store tables on pages. The page layout for Postgres is described in the documentation. The most important thing is that these are fixed sized blocks that contain rows (or parts of rows).

When pages are referenced in queries, they are read into memory "1" at a time -- the "1" is in quotes because adjacent pages are often read in as a block. These pages are managed in memory caches, while needed by queries, sometimes being swapped out or written to disk as necessary.

Pages can have empty space in them. Segments can have empty pages. When you insert a record, it may go at the end of the last page or on some earlier page, depending on where there is space and other considerations.

Indexes are also organized on pages -- in fact, using the same page layout (although the terminology is slightly different). The mechanics of inserting a row in an index can involve the creation of new pages ("page splits") or simply inserting a row on an existing page.

This is a high-level description. Postgres is a pretty open database, where these details are pretty well documented. Other databases do similar things, although the details do differ a bit from one database to another.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for your comment. You are talking about logical structure, I'm talking about physical modification of the file in the file system. It's not about pages at all, I did not mention pages to simply the question. Pages is the logical entity. As I mentioned in my question, there is no way to physically modify file in the arbitrary place. Hence on every modification (index file or table file) file somehow should be rewritten from scratch – user12384512 Jul 29 '17 at 19:12
  • @user12384512 . . . Pages are not logical. They are (basically) the unit of I/O between disk and memory and the unit of updates and inserts. Postgres most decidedly does not write a 1 Gbyte file every time a single row is updated. – Gordon Linoff Jul 29 '17 at 19:15
  • Postgres most decidedly did not rewrite whole file, most likely it appends new version of the row at the end of the file.This is my guess. Pages mentioned by you is logical unit. In Postgres pages are just 8kb piece of data, nothing more. – user12384512 Jul 29 '17 at 19:21