I am developing an application which uses Git as a database. My current approach is to call out to the git shell command to construct a new commit whenever the application changes something. This is very simple, but a big disadvantage is that it does not allow any concurrent writes to the database: two threads cannot construct a commit simultaneously, because there is a single HEAD, a single index, and a single working copy.

However, since commits, trees, and blobs are all content-addressed, I think it should be possible to construct all of these concurrently. What would be the recommended approach for this? Perhaps:

  • command line flags to git add, git commit, etc., which explicitly specify a different HEAD and index file to use. As far as I can see, such things do not exist.
  • using git plumbing commands for all operations. However, I am not an expert with them and am not totally sure which are thread-safe.
  • a Git service, to which one can connect a la traditional database connections, which would provide transactional, concurrent access to a Git repository. As far as I can see, such a thing does not exist. I have considered writing one.
  • giving up and doing a git clone to get an entirely new working copy for each concurrent user. This is inordinately expensive.
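
For the plumbing option, here is a rough sketch of how a single commit might be built without touching HEAD, the shared index, or the working copy. It relies on the `GIT_INDEX_FILE` environment variable to give each writer its own index file, and on the plumbing commands `hash-object`, `read-tree`, `update-index`, `write-tree`, and `commit-tree`. The helper names `git` and `build_commit`, the temporary-directory layout, and the placeholder identity are assumptions made up for illustration; I have not verified how this behaves under heavy concurrency.

```python
import os
import subprocess
import tempfile

# Placeholder identity (an assumption for this sketch) so commit-tree
# does not depend on the user's git configuration.
IDENTITY = {
    "GIT_AUTHOR_NAME": "app", "GIT_AUTHOR_EMAIL": "app@example.com",
    "GIT_COMMITTER_NAME": "app", "GIT_COMMITTER_EMAIL": "app@example.com",
}


def git(repo, *args, env=None, stdin=None):
    """Run a git command inside `repo` and return its stripped stdout."""
    full_env = dict(os.environ, **(env or {}))
    result = subprocess.run(
        ["git", "-C", repo, *args],
        input=stdin, env=full_env, check=True,
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_commit(repo, files, message, parent=None):
    """Create a commit from {path: text} using a private index file.

    Returns the new commit hash. No ref is updated, so the commit is
    unreachable until something points a branch at it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Each call gets its own index, so writers do not share one.
        env = dict(IDENTITY, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        if parent:
            # Start from the parent's tree so existing files are kept.
            git(repo, "read-tree", parent, env=env)
        for path, text in files.items():
            blob = git(repo, "hash-object", "-w", "--stdin", stdin=text)
            git(repo, "update-index", "--add",
                "--cacheinfo", f"100644,{blob},{path}", env=env)
        tree = git(repo, "write-tree", env=env)
        args = ["commit-tree", tree, "-m", message]
        if parent:
            args += ["-p", parent]
        return git(repo, *args, env=env)
```

The resulting commit still has to be published by moving a ref, and that is the one step where concurrent writers would need to coordinate.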
jameshfisher

2 Answers

Since 2014, there have been several initiatives to propose a "Git-like" database:

The most recent is liquidata-inc/dolt: "Git for data"

Dolt is a relational database, i.e. it has tables, and you can execute SQL queries against those tables.

It also has version control primitives that operate at the level of table cell. Thus Dolt is a database that supports fine grained value-wise version control, where all changes to data and schema are stored in commit log.


Before, with a different approach: SOM-Research/Gitana (2017).

See "A conceptual and database schema for Git via Gitana" by Valerio Cosentino (Twitter)


Closer to what you are looking for is src-d/gitbase

SQL interface to git repositories, written in Go

It can be used to perform SQL queries about the Git history and about the Universal AST of the code itself.

All three projects may offer ideas on how to use Git as this kind of database.


Note also that recent versions of Git (including the upcoming Q2 2020 Git 2.27) have improved git push --atomic.

The same Git 2.27 is in the process of implementing two-phase commit-style atomic ref-updates across multiple repositories: see "Is it possible to manage multiple repositories coherently?".

VonC
  • Nice answer, thanks! Strictly speaking, it seems like the answer is that there's still no straightforward way to do concurrent "transactions" on a git repo as you would with a database, e.g. if you wanted to host your database on GitHub. But it definitely sounds like the tools and options have improved. – jameshfisher May 21 '20 at 18:39
  • @jameshfisher I agree. Git 2.27 makes progress toward that notion of "transaction", but even then, it won't be straightforward indeed. – VonC May 21 '20 at 19:36

Another possible option is for each concurrent user to create (and checkout) a new branch and commit to that branch, and then to merge the branches to the master branch periodically.

That might have some problems though:

  1. A merge might fail and require interactive intervention to resolve conflicts.
  2. Until the periodic merge occurs, user Y won't see the changes committed by user X.
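
A minimal sketch of how each user's branch could be advanced without a checkout, assuming the commit object has already been created by some other means. It uses `git update-ref <ref> <newvalue> <oldvalue>`, which only moves the ref if it still holds the expected old value, so a lost race shows up as a failed update rather than an overwritten commit. The function name `advance_branch`, the `ZERO` constant, and the branch names are hypothetical.

```python
import subprocess

# An all-zero old value tells update-ref that the ref must not exist yet.
# Assumes a SHA-1 repository (40 hex digits).
ZERO = "0" * 40


def advance_branch(repo, branch, new_commit, expected_old=None):
    """Move refs/heads/<branch> to new_commit only if it still equals
    expected_old (or does not exist when expected_old is None).

    Returns True if the ref was updated, False if the check failed.
    """
    old = expected_old or ZERO
    result = subprocess.run(
        ["git", "-C", repo, "update-ref",
         f"refs/heads/{branch}", new_commit, old],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

A caller that gets `False` back would presumably re-read the branch, rebuild its commit on top of the new tip, and try again. Merging the per-user branches into master remains a separate problem.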
Eran
  • I'm not sure how this helps -- It seems like committing to two different branches concurrently in the same repository is impossible, at least with the git porcelain commands. I'm also not addressing the problems of merge conflicts or branch updates at this stage; just the problem of how to construct commits concurrently. – jameshfisher Jul 20 '14 at 22:18