3

I have many questions about how databases work.

  1. Is there a file somewhere with the data?

  2. If there is something like that, are the records stored together or in separate? Like table cars is all in one file or there's one file for each row.

  3. If there are multiple rows per file, how can databases update different rows in the same table in parallel? e.g. 100 simultaneous update requests coming from different sources.

I've read here that databases usually use binary trees, do they rebuild them loading data from the hard drive for every request then save if modified?

Thanks.

2013Asker
  • 2,008
  • 3
  • 25
  • 36
  • As a developer who also knows little about how this works, +1 for curiosity. – jwalk Aug 31 '13 at 19:41
  • 4
    The answer to all of these is "it depends." There are lots of types of databases, and they're all implemented differently. – Jacob Aug 31 '13 at 19:41
  • Your questions depend on what database you are talking about. SQL, MySQL, MSAccess and the list goes on... Did you have a particular database in mind? – BossRoss Aug 31 '13 at 19:41

2 Answers2

2

As already noted, the answer to all these questions is 'it depends'. There are many implementations, and they all differ in significant details.

  1. One or more files. The organization of these files is part of the 'magic sauce' associated with a particular DBMS.

  2. It varies a bit how the data is stored, but it is probably true to say that there are few or no DBMS that use one file per row for general working. The name management for a few billion (trillion, ...) records doesn't bear thinking about, all else apart.

  3. This is again a key part of the 'magic sauce' for a particular DBMS. They all do it; there are many different ways to do it. SQLite uses radically different techniques from Informix, DB2, Oracle, Sybase, MS SQL Server, MySQL, PostgreSQL, Teradata, ... There are many techniques used; some use locking, others timestamp-based or multi-version concurrency control. There are different serialization properties for different DBMS.

I don't know of a DBMS that uses pure binary trees. Most DBMS provide multiple different organizations, but a very common one is a B+-tree (which is related to, but different from, a binary tree); another are various hash table organizations. One of the key objectives of most DBMS is to avoid having to rebuild things from scratch very often, and another is to avoid reading from disk as often as possible — by using caching and so on. Again, this is 'magic sauce' for a particular DBMS.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • As to B+tree's used by Informix, does the server use indexes when sequentially scanning a table? Most implementations of BASIC provide for opening files in sequential or random access mode. – Joe R. Sep 02 '13 at 23:04
  • It depends on the query — both the conditions and the data to be retrieved. Sometimes the query will scan the index to do the first filtering, only retrieving the full rows for a small number of records identified by the index information (or not even touching the main data at all if it is an index-only query). Sometimes the query will scan the table — or perhaps range-partitioned fragments of a table — to get to the data. There's no simple rule for which will be fastest; it depends on the statistics and best estimates of how much each strategy will cost. – Jonathan Leffler Sep 02 '13 at 23:46
  • Can users bypass the optimizer and define their own detailed query plan? This could be valuable under certain circumstances. – Joe R. Sep 03 '13 at 03:31
  • It depends on the DBMS. You probably never get complete control with any; they'll only execute queries along the lines that they allow. You can influence, but not necessarily control, how the query is executed with hints — the details vary enormously between DBMS. In general, though, you can't define your own detailed query plan. – Jonathan Leffler Sep 03 '13 at 04:51
0

All the writes to a file on a database are buffered and they are written to the file once every few seconds (in a defined interval) in a synchronous way.

Oracle uses redo log to keep track for updates or inserts before they are committed. There is a redo log buffer which is in-memory for a define interval like 3 seconds and every 3 seconds the writes to the file as a single process.

Mouli
  • 194
  • 7