
git log is a powerful Git command that lets you query many things: file authors, commit descriptions, commit dates, and so on. Git itself says that the metadata is stored in some database, but I wasn't able to find out anywhere what kind of database it is.

So what is this database? Is it relational, and can I somehow run SQL queries directly against it instead of using git log?

EDIT: this question is different from How does git store files? because that question mainly asks about how Git stores committed files while my question asks about metadata storage (metadata != data).

Yos

  • Why would you want to do that? There are plenty of libraries for working with Git in your favourite programming language – gogaz Aug 20 '18 at 12:22
  • @gogaz because it would be easier for me to perform complicated queries in SQL than to combine multiple git flags. Also, I'm genuinely interested in how Git saves the metadata – Yos Aug 20 '18 at 12:23
  • Git basically uses compressed files: the repository is a combination of metadata objects and blobs (the actual file versions). The metadata objects are just compressed text files, so there is no SQL database at all. You're best off using a library for it, such as libgit2, which is available for many runtimes and languages. – Lasse V. Karlsen Aug 20 '18 at 12:25
  • Just to take an example: if you create a new repository, create a file, add the file to the index, and then commit it, your "database" consists of the commit object (compressed text), a tree object (directory structure, compressed text) and the file blob, as 3 separate files, all under .git\objects – Lasse V. Karlsen Aug 20 '18 at 12:27
  • So it's not a database in the traditional sense. – Lasse V. Karlsen Aug 20 '18 at 12:29
  • Possible duplicate of [How does git store files?](https://stackoverflow.com/questions/8198105/how-does-git-store-files) – phd Aug 20 '18 at 13:04
  • Specifically, it is a simple key-value store with the keys being hash IDs. The values stored under these keys are one of four value types: *annotated tag*, *commit*, *tree*, and *blob*. The first three have fields in their values that are defined to contain more keys, and the entire set of values always forms a Directed Acyclic Graph, or DAG. – torek Aug 20 '18 at 15:40
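
The key-value description in the last comment can be illustrated with a short sketch. It assumes a repository using the default SHA-1 object format, where an object's key is the SHA-1 of a small header (`<type> <size>` followed by a NUL byte) plus the object's body. The function names `git_object_id` and `referenced_ids` are made up for this example and are not part of Git or any library.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the key Git assigns to an object (assumes SHA-1 object format)."""
    # The hashed value is "<type> <size>\0" followed by the raw body.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def referenced_ids(commit_body: bytes) -> dict:
    """Collect the keys a commit value points at: one tree, zero or more parents."""
    refs = {"tree": None, "parents": []}
    # Header lines come first; a blank line separates them from the message.
    headers = commit_body.split(b"\n\n", 1)[0]
    for line in headers.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            refs["tree"] = value
        elif key == "parent":
            refs["parents"].append(value)
    return refs


if __name__ == "__main__":
    # Same key that `git hash-object` reports for a file containing "hello\n".
    print(git_object_id("blob", b"hello\n"))
```

Because a commit's value names its tree and parents by key, following those keys from any commit walks the graph; that traversal is what tools such as git log perform on your behalf.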

2 Answers

In that description, Git describes its internal state as a "database" in the most literal sense of the word. From Wikipedia:

A database is an organized collection of data, stored and accessed electronically.

It is not a relational database, or one you can easily query using a query language such as SQL.

You can, however, access the database programmatically, for example with LibGit2Sharp and LINQ if you're on .NET:

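using System.Linq;
using LibGit2Sharp;

using (var repo = new Repository(@"path\to\repo.git"))
{
    // Walk the history reachable from "sha" (a placeholder for a commit id or ref name)
    // and keep only the commits whose author email matches.
    var commits = repo.Commits
                      .QueryBy(new CommitFilter { IncludeReachableFrom = "sha" })
                      .Where(c => c.Author.Email.Contains("@example.com"))
                      .ToList();
}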
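
Since the original goal was SQL, one possible workaround is to export the fields you care about from git log into a throwaway SQLite table and query that instead. The following is only a minimal sketch in Python, not something Git provides. The table name `commits`, its columns, and the helpers `load_commits` and `read_git_log` are invented for illustration. It assumes `git` is on the PATH and that commit subjects do not contain newlines (which `%s` guarantees).

```python
import sqlite3
import subprocess

# Fields are separated by the ASCII unit separator (0x1f), which is very
# unlikely to appear in author names or commit subjects.
FIELD_SEP = "\x1f"
# %H = commit hash, %an/%ae = author name/email, %aI = author date (ISO 8601),
# %s = subject line, %x1f = literal 0x1f byte.
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s"


def read_git_log(repo_path: str) -> str:
    """Run git log in repo_path and return one line of fields per commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout


def load_commits(log_text: str) -> sqlite3.Connection:
    """Load git log output (in LOG_FORMAT) into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits ("
        "sha TEXT PRIMARY KEY, author_name TEXT, author_email TEXT, "
        "authored_at TEXT, subject TEXT)"
    )
    rows = []
    for line in log_text.split("\n"):
        fields = line.split(FIELD_SEP, 4)
        if len(fields) == 5:  # skip blank or malformed lines
            rows.append(fields)
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = load_commits(read_git_log("."))
    query = (
        "SELECT author_email, COUNT(*) FROM commits "
        "GROUP BY author_email ORDER BY COUNT(*) DESC"
    )
    for email, count in conn.execute(query):
        print(email, count)
```

This copies metadata out of Git rather than querying Git's own storage, so the table has to be rebuilt whenever the history changes.
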
CodeCaster

From the Git glossary (https://git-scm.com/docs/gitglossary):

object database - Stores a set of "objects", and an individual object is identified by its object name. The objects usually live in $GIT_DIR/objects/.

The database is the collection of files typically stored under .git/objects/ (or, for a bare repo, just objects/).
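
To make the on-disk layout concrete, here is a rough Python sketch of how a "loose" object maps to a file under objects/. It assumes the SHA-1 object format, where the first two hex digits of the object name become a directory and the remaining digits the file name, and the file holds the zlib-compressed header plus body. The helper names `write_loose_object` and `read_loose_object` are hypothetical, and the sketch deliberately ignores packfiles, where Git also keeps objects.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store body the way a loose object is laid out; return its object name."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> Tuple[str, bytes]:
    """Look an object up by name and return (type, body)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, _, size = header.decode("ascii").partition(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


if __name__ == "__main__":
    # Use a scratch directory rather than a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"
        oid = write_loose_object(objects, "blob", b"hello\n")
        print(oid, read_loose_object(objects, oid))
```

Pointing `read_loose_object` at a real .git/objects directory only works for objects that have not yet been packed, which is one reason to prefer Git's own tools or a library for anything beyond exploration.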

This is not an RDBMS. It has no general-purpose query language; instead, it is designed to work with the various Git tools (and vice versa) to give fast access to the information Git needs.

Mark Adelsberger