General-purpose databases that never delete or update data in-place

Question

I'm very much inspired by the approach to data management advocated by Rich Hickey and implemented in Datomic, where data is never mutated in place, all versions are preserved and queryable, and time is a first-class concept.

Of course, there are specialized databases matching that description, like Git, or any other source control system. The question is if there are any (more or less) general-purpose DBMS-es of relational, graph, hierarchical, document or any other flavor that can be effectively used in, say, an eCommerce Web application. Or is Datomic the only choice then?
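
To make the requirement concrete, here is a minimal sketch in Python of the behavior I mean. It is not how Datomic works internally; `AppendOnlyStore`, `put`, `get`, and `as_of` are hypothetical names chosen for illustration, and a logical counter stands in for transaction time.

```python
from bisect import bisect_right


class AppendOnlyStore:
    """Hypothetical key-value store that only ever appends new versions."""

    def __init__(self):
        self._clock = 0    # logical transaction counter, not wall-clock time
        self._times = {}   # key -> ascending list of transaction numbers
        self._values = {}  # key -> list of values, parallel to _times

    def put(self, key, value):
        """Record a new version of key; earlier versions are left untouched."""
        self._clock += 1
        self._times.setdefault(key, []).append(self._clock)
        self._values.setdefault(key, []).append(value)
        return self._clock

    def get(self, key, as_of=None):
        """Return the value of key as of transaction as_of (default: latest)."""
        if as_of is None:
            as_of = self._clock
        times = self._times.get(key, [])
        # Index of the first version written strictly after as_of.
        idx = bisect_right(times, as_of)
        if idx == 0:
            raise KeyError(key)  # key did not exist yet at that point
        return self._values[key][idx - 1]

    def history(self, key):
        """Return every (transaction, value) pair ever written for key."""
        return list(zip(self._times.get(key, []), self._values.get(key, [])))
```

A price change on a product, for example, would be two `put` calls, and `get(key, as_of=first_tx)` would still return the original price.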

I think both the BerkeleyDB Java Edition and CouchDB work like that internally. But in both cases, there are "space reclaim" processes that purge old data and I am not sure if the history is really exposed as a first-class concept (as opposed to "just" being used to make transaction isolation work). — Thilo, Nov 22 '12 at 08:02
That's right. I'm using CouchDB right now. The views' map-reduce functions can't access the old versions. — Ivan Krechetov, Nov 22 '12 at 08:05
There is also Git Ketch, which is `a multi-master Git management system that replicates information across multiple Git servers for resilience and scalability.` Add Git extensions for large binary files and you get storage suitable for some types of applications. — Dzmitry Lahoda, Mar 29 '16 at 14:30
[Apache HBase](https://hbase.apache.org/) does not mutate data in place, and previous versions are [queryable](http://hbase.apache.org/book.html#versions). — Dzmitry Lahoda, Jun 04 '16 at 07:39
I think [Google Spanner](http://research.google.com/archive/spanner.html) is such a database: `old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps.` and `F1 maintains a logical history log of all changes, which is written into Spanner itself as part of every transaction. F1 takes full snapshots of data at a timestamp to initialize its data structures, and then reads incremental changes to update them.` The Spanner-inspired [CockroachDB](https://www.cockroachlabs.com/) may have the same characteristics. — Dzmitry Lahoda, Jun 06 '16 at 06:29
[Noms](https://github.com/attic-labs/noms) is a versioned, forkable, syncable, append-only database. It is possible to see the entire history of the database. — Dzmitry Lahoda, Oct 27 '16 at 09:44
[LiteTree](https://github.com/aergoio/litetree): SQLite with branches. — Dzmitry Lahoda, Aug 29 '18 at 17:07

score 35 · Accepted Answer · edited Apr 16 '14 at 02:57

There is an approach to designing systems around the idea of never deleting or mutating data, called Event Sourcing. The idea is to store the events (or facts) that change the system state, instead of snapshots of the state. The history of events can be replayed later to produce a purpose-specific projection of what the state looked like at any point in time. Multiple projections built for different purposes can coexist in the system. More information can be found on the following web sites:

It's in line with what you are describing, but rather than being just a database model, Event Sourcing and Command Query Responsibility Segregation (CQRS) prescribe a special way of designing the whole system including the database and business logic layers.
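
The replay idea described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the API of any particular framework: `Event`, `EventLog`, the two projection functions, and the shopping-cart event types are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact; the event kinds used here are hypothetical."""
    seq: int   # position in the log, starting at 1
    kind: str  # "item_added" or "item_removed"
    sku: str
    qty: int


class EventLog:
    """Append-only log: events are added, never changed or deleted."""

    def __init__(self):
        self._events = []

    def append(self, kind, sku, qty):
        event = Event(len(self._events) + 1, kind, sku, qty)
        self._events.append(event)
        return event

    def up_to(self, seq=None):
        """Return the events up to and including seq (default: all of them)."""
        if seq is None:
            return list(self._events)
        return [e for e in self._events if e.seq <= seq]


def cart_contents(events):
    """Projection 1: replay events into a sku -> quantity mapping."""
    cart = {}
    for e in events:
        if e.kind == "item_added":
            cart[e.sku] = cart.get(e.sku, 0) + e.qty
        elif e.kind == "item_removed":
            cart[e.sku] = cart.get(e.sku, 0) - e.qty
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
        if cart[e.sku] <= 0:
            del cart[e.sku]  # drop items whose quantity fell to zero
    return cart


def removal_counts(events):
    """Projection 2: count how often each sku was removed from the cart."""
    counts = {}
    for e in events:
        if e.kind == "item_removed":
            counts[e.sku] = counts.get(e.sku, 0) + 1
    return counts
```

Both projections read the same log, and passing `log.up_to(seq)` instead of the full log reconstructs the state as it was at an earlier point.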

There are a few frameworks that follow this approach, such as:

While this does not directly answer your question, it may provide a different perspective on the problem.

answered Nov 22 '12 at 12:51 by Anton Beloglazov · edited Apr 16 '14 at 02:57 by seanf

No worries, I'm glad it's useful! – Anton Beloglazov Nov 22 '12 at 13:08
Another great article I've found myself is this: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html – Describes a general strategy for storing and querying immutable facts, splitting the data storage into two layers: batch and realtime. Comments are also quite interesting. The author is even writing a book on this topic now: http://manning.com/marz/ – Ivan Krechetov Jan 23 '13 at 08:59
Are these still relational? In heaven I think there's some sort of prolog/sql aka rule-based/relational, immutable database in JavaScript that can run on the client and server (like PouchDB & CouchDB or Meteor). I send it a transaction, and get a callback on successful (consistency) or collision (simultaneous writes) -- and it does JOINs! But, unfortunately, only in heaven... :P It's a lot to ask for, I know. – Ryan Taylor Jun 08 '15 at 22:35
There is a [very good description](https://msdn.microsoft.com/en-us/library/dn589792.aspx) of event sourcing. – Dzmitry Lahoda Mar 29 '16 at 15:15
[Event store](https://geteventstore.com/) is an `open-source, functional database with Complex Event Processing in JavaScript.`, in case you need this idea implemented for JavaScript. – Dzmitry Lahoda Jun 27 '16 at 17:28

score 7 · Answer 2 · edited Jun 20 '20 at 09:12

Irmin is a distributed database that follows the same design principles as Git.

answered Feb 25 '16 at 18:36 by Dzmitry Lahoda · edited Jun 20 '20 at 09:12 by Community
