11

Imagine the data structure behind Git. It's like a confluently persistent data structure, except using hash references instead of traditional pointers.
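To make the idea concrete, here is a minimal sketch of a content-addressed, immutable object store in which commits reference their snapshot and parents by hash. It is not Git's actual object format; `ObjectStore`, `put`, `get`, and `commit` are hypothetical names chosen for illustration, and JSON plus SHA-1 is only an assumed serialization.

```python
import hashlib
import json


class ObjectStore:
    """Immutable, content-addressed store: the key is the SHA-1 of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        # Canonical serialization so equal content always hashes to the same key.
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data  # idempotent: same content, same key
        return key

    def get(self, key):
        return json.loads(self._objects[key])


def commit(store, tree, parents):
    """Store a snapshot (`tree`) and a commit object pointing at it by hash."""
    tree_key = store.put(tree)
    return store.put({"tree": tree_key, "parents": list(parents)})


store = ObjectStore()
c1 = commit(store, {"spawn": "x=0,y=0"}, [])
c2 = commit(store, {"spawn": "x=5,y=0"}, [c1])
# A branch is just a mutable name -> hash pointer; old versions stay reachable.
branches = {"master": c2}
```

Because nothing is ever overwritten, any number of branches can share history, which is what makes the structure behave like a persistent one.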

I need Git's data structure, except without any of the working tree and index stuff. And there would be millions of branches, each tracking a handful of other local branches. Commits and merges would occur several thousand times per minute on different threads. Pulls would occur every second.

Between libgit2 and JGit, I can use Git's data storage subsystem.

But am I using the right tool for the job? Is there a database that has Git's features but is faster, more concurrent, more scalable, or has less of an impedance mismatch? Memory-cached writes would be extremely helpful.

The task:

A collaboratively-edited game. Every player has their own branch, and every change they make to the game world is only applied to their version. Changes are merged back into the 'master' branch by trusted users. Data and source code are often tied together, needing the same branching and merging functionality.
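The merge step described above could look roughly like the following three-way merge over key-value snapshots of the game world. This is only a sketch under assumptions: the world state is modeled as a flat `dict`, and `three_way_merge` and `MISSING` are hypothetical names, not part of Git or any library.

```python
MISSING = object()  # sentinel meaning "key absent in this snapshot"


def three_way_merge(base, ours, theirs):
    """Merge two dict snapshots against their common ancestor `base`.

    Returns (merged, conflicts). `conflicts` lists the keys that both
    sides changed to different values; for those keys `ours` is kept
    so that a trusted user can resolve them by hand.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, MISSING)
        o = ours.get(key, MISSING)
        t = theirs.get(key, MISSING)
        if o == t:        # both sides agree (or neither touched the key)
            value = o
        elif o == b:      # only theirs changed it
            value = t
        elif t == b:      # only ours changed it
            value = o
        else:             # both changed it differently
            conflicts.append(key)
            value = o
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts


base = {"door": "closed", "tree": "oak"}
player = {"door": "open", "tree": "oak"}
master = {"door": "closed", "tree": "oak", "well": "dry"}
merged, conflicts = three_way_merge(base, master, player)
```

Here the player's change to `door` and master's new `well` key both survive, and `conflicts` is empty.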

Lilith River

4 Answers

3

Datomic provides persistent data storage with a built-in notion of time.

The core developers even created a sample application that stores a Git repository in the database.

Cesar Canassa
1

Although the index and working-copy parts of Git can be separated out easily enough, Git is not designed for merges or commits at a rate of thousands per second on a single machine. The core code is, for the most part, not even thread-safe. You will likely need to create a new system for your data. You can still use Git for the code, of course, and you could also generate Git commits to represent your data when necessary.

bdonlan
  • So libgit2 isn't thread-safe? It would seem that Git would be inherently thread-safe considering its data structure. – Lilith River Aug 22 '11 at 19:23
  • Nope! It has cache structures, etc, that are not thread-safe. The core git code was only designed to run in a single-threaded manner as command line tools, after all. – bdonlan Aug 22 '11 at 19:25
  • Snap! Know of a DB that's like Git? – Lilith River Aug 22 '11 at 19:30
  • For the rate of mutation you're talking about, this may be a case where you'll have to roll your own. You'll certainly need to have a faster way to track fast-forwards than traversing history each time, and with thousands of mutations per second you want it to be in-process. – bdonlan Aug 22 '11 at 19:32
  • libgit2 says that it is thread-safe on its [home page](https://github.com/libgit2/libgit2). Perhaps it adds synchronization somehow? – Lilith River Sep 03 '11 at 12:01
  • I think this slide deck is pretty clear: https://speakerdeck.com/bkeepers/git-the-nosql-database – Lilith River Dec 30 '13 at 05:50
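The fast-forward check raised in the comments above can be sketched as an ancestor test. The plain version walks history from the incoming head; the pruning by generation number shown here is one possible way to cut that walk short, similar in spirit to the generation numbers in Git's commit-graph file. It is not something the answer itself proposes, and `Commit`, `is_ancestor`, and `can_fast_forward` are hypothetical names.

```python
from collections import deque


class Commit:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)
        # Generation number: 1 + the largest generation among the parents,
        # so every ancestor has a strictly smaller generation.
        self.generation = 1 + max((p.generation for p in self.parents), default=0)


def is_ancestor(old, new):
    """True if `old` is reachable from `new` by following parent links."""
    seen = {new}
    queue = deque([new])
    while queue:
        current = queue.popleft()
        if current is old:
            return True
        for parent in current.parents:
            # A commit with a smaller generation than `old` cannot have
            # `old` among its ancestors, so that path is skipped.
            if parent.generation >= old.generation and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def can_fast_forward(branch_head, incoming_head):
    """A branch can fast-forward if its head is an ancestor of the incoming head."""
    return is_ancestor(branch_head, incoming_head)


a = Commit("a")
b = Commit("b", [a])
c = Commit("c", [a])
d = Commit("d", [b, c])
```

With this history, `can_fast_forward(a, d)` holds, while `b` and `c` have diverged and need a real merge.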
1

Have a look at how GitHub works in terms of collaboration between and across projects. The key is the way many users choose to copy, use and replicate the file contents of others, so that the GitHub core repo can do the aggregation.

If you don't have that reuse, then the Git philosophy probably doesn't match your need. The challenge is to identify your impedance match points and promote them hard. Many folk don't really understand why Git works and 'normal' VCS doesn't (that is, why and when did old-style VCS work in the first place? Clue: kaolin and linen drawings for the RMS Titanic). Git works because it starts with modern computer capabilities.

Philip Oakley
1

JGit can use JDBC, HBase, Cassandra, Bigtable and more, and it's thread-safe.

robinr
  • Do you think it would handle this kind of abuse properly? Which datastore would you suggest? – Lilith River Aug 23 '11 at 00:30
  • For the loads you're looking at, I'd probably look into Bigtable first. Bigtable is what Google uses to store their search stuff, so it can definitely handle the transaction load you're looking to do. – Shauna Aug 23 '11 at 17:10
  • I think you should try it out for yourself, and I believe the regular storage is still the fastest. Also involve yourself with it on the JGit. – robinr Sep 22 '11 at 20:14
  • @Shauna don't you need to _be Google_ in order to use Bigtable? – pqnet May 12 '15 at 08:15
  • @pqnet - For BigTable proper at the time of the original posting, perhaps, though [there are several similar options](http://en.wikipedia.org/wiki/BigTable#Other_similar_software) (including a few from Apache). Google [has since started offering it as a cloud solution](https://cloud.google.com/bigtable/) a la AWS, though, making it now a separate option from, say, HBase. – Shauna May 15 '15 at 17:41