I often write a lot of experimental code that I ultimately throw away. In the meantime, it lives in temporary Git repos that I later blow away.

Alternatively, perhaps I should create a branch, do my experiments there, and then delete the branch. But does the space that the branch occupies ever get released, or is that history preserved to the end of time?

Sometimes the remote repo is on a company server that I do not control, so adding or removing repos tends to be a heavyweight, IT-driven operation.

Rick 0xfff
  • I'm going to let the original question stand as is because I think the answers that came up, particularly the detailed one by @torek, are useful. The git documentation itself talks about branches and includes "git branch --delete" as a command and option. The intent was to find out what happens to the commits in that branch that are not used elsewhere. Some sort of garbage collection is what I was looking for, but had never run across. – Rick 0xfff May 16 '18 at 14:09

3 Answers

You need to define a "dead branch". Better yet, start by figuring out what you mean when you say "a branch"—see What exactly do we mean by "branch"?

As bmargulies noted, if a commit has no references, it will eventually be garbage-collected. So a more precise question is: When does a commit have references?

If you are familiar with Lisp or any of the more modern garbage-collected languages (including Go, Java, and Python), you have a big head start here. If not, read the Wikipedia page. Note that general-purpose language collectors have to deal with cycles in the object graph, which create problems for simple reference-counting collectors such as that in the CPython implementation. The Git object graph is by definition acyclic, so reference counts would work here, but Git still uses a standard mark-and-sweep technique. This allows the objects to be read-only once created: there's no need to keep and update reference counts. Git simply marks initially-referenced objects, then traverses the graph to copy the marks to objects referenced from those objects.

In particular, each commit in Git lists the hash ID of some set of parent commits—usually just one, but for merges, two or more, and for root commits, no parents. So Git starts with all external references—all the object hash IDs that are directly reachable from outside the internal graph—and then, for each object that is a commit object, marks its parent(s), the parents' parent(s), and so on.
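The parent walk described above can be sketched on a toy graph. This is only an illustration, not Git's actual code: the single-letter commit IDs, the `parents` mapping, and the `reachable_commits` helper are all made up for the example.

```python
# Toy model: each commit ID maps to the list of its parent commit IDs.
# The IDs are short invented labels, not real Git hashes.
parents = {
    "A": [],          # root commit: no parents
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "M": ["C", "D"],  # merge commit: two parents
    "X": ["A"],       # no name points at X, directly or indirectly
}

def reachable_commits(start_ids, parents):
    """Return every commit reachable from start_ids by following parent links."""
    marked = set()
    stack = list(start_ids)
    while stack:
        commit = stack.pop()
        if commit in marked:
            continue  # already visited via another path
        marked.add(commit)
        stack.extend(parents[commit])
    return marked

# Pretend the only external reference is a branch name pointing at M.
marked = reachable_commits(["M"], parents)
print(sorted(marked))  # ['A', 'B', 'C', 'D', 'M']
```

Commit X never gets a mark, because links only run from child to parent: X refers to A, but nothing refers to X.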

In this particular case, of garbage-collecting the entire repository database, Git also marks each tree object and, recursively, each object reachable from a tree. This marks all the used blob objects. Git marks each directly-reachable annotated tag, plus the object to which the annotated-tag object itself points, and, recursively, any objects reachable from that object (an annotated tag can point to any of the four kinds of objects).

Once every reachable object has been marked, all remaining objects are by definition unreachable. Git can eject those objects from the repository, rebuild the pack files that store objects with full compression applied, and then remove any stale loose objects. (Loose objects are only zlib-compressed; the full compression in the pack files does delta encoding as well.)
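To make the mark and sweep phases concrete for all four object kinds, here is a minimal sketch over an invented in-memory object store. The object IDs, the `objects` layout, and the `mark` and `sweep` helpers are assumptions for illustration; real Git works on its object database and pack files, not on a Python dict.

```python
# Hypothetical object store: ID -> (kind, IDs this object refers to).
objects = {
    "tag1": ("tag",    ["c2"]),         # annotated tag pointing at a commit
    "c2":   ("commit", ["t2", "c1"]),   # its tree, plus one parent commit
    "c1":   ("commit", ["t1"]),         # root commit: tree only
    "t2":   ("tree",   ["b1", "b2"]),
    "t1":   ("tree",   ["b1"]),
    "b1":   ("blob",   []),
    "b2":   ("blob",   []),
    "c9":   ("commit", ["t9", "c1"]),   # commit left over from a deleted branch
    "t9":   ("tree",   ["b1", "b9"]),
    "b9":   ("blob",   []),
}

def mark(roots, objects):
    """Mark phase: collect every object reachable from the external roots."""
    marked = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid not in marked:
            marked.add(oid)
            stack.extend(objects[oid][1])
    return marked

def sweep(objects, marked):
    """Sweep phase: keep only the marked objects."""
    return {oid: obj for oid, obj in objects.items() if oid in marked}

kept = sweep(objects, mark(["tag1"], objects))
removed = sorted(set(objects) - set(kept))
print(removed)  # ['b9', 'c9', 't9']
```

Blob `b1` survives even though the doomed tree `t9` uses it, because the reachable trees `t1` and `t2` use it too.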

But we're still stuck with the question of what makes an object externally reachable, and this is where branch names, and in fact all names, come in. Branch names exist within the refs/heads/ namespace; tag names live in refs/tags/; remote-tracking names are stored under refs/remotes/, and there are others. Collectively, these names are called references, and they all share the ability to store one single hash ID each.
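As a small illustration of those namespaces, the sketch below sorts full reference names by prefix. The `classify_ref` helper and the shortened hash values are invented for the example; only the `refs/heads/`, `refs/tags/`, and `refs/remotes/` prefixes come from the text above.

```python
def classify_ref(name):
    """Classify a full reference name by the namespace it lives in."""
    if name.startswith("refs/heads/"):
        return "branch"
    if name.startswith("refs/tags/"):
        return "tag"
    if name.startswith("refs/remotes/"):
        return "remote-tracking"
    return "other"

# Each reference stores exactly one hash ID (shortened, made-up values here).
refs = {
    "refs/heads/main": "1a2b3c4",
    "refs/heads/experiment": "5d6e7f8",
    "refs/tags/v1.0": "9a0b1c2",
    "refs/remotes/origin/main": "1a2b3c4",
    "refs/stash": "3d4e5f6",
}

for name, hash_id in refs.items():
    print(f"{classify_ref(name):16} {name} -> {hash_id}")
```

Whatever the namespace, each entry acts as a starting point for the reachability walk.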

Git also stores external references in:

  • reflogs, which retain previous values of reference names;
  • HEAD, when it is detached, and the reflog for HEAD (HEAD is sometimes considered a reference and sometimes not);
  • the other special HEAD files such as ORIG_HEAD, MERGE_HEAD, and CHERRY_PICK_HEAD;
  • the index, which normally contains blob references; and
  • added worktree index files.

If the only reference to some commit is some other commit, and that other commit's only reference is a branch name and its reflog entries, and you delete the branch name, then both commits become unreferenced and are eligible for garbage collection. There are a few extra safety nets: their hash IDs might be stored in the HEAD reflog, for instance. If they are loose objects (not yet packed), they have a grace period, 14 days by default, from the time they were created before they will be removed. This grace period means that Git commands have up to 14 days to complete their operation, writing a reference that keeps a new loose object alive, even if a garbage-collection process has started.
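The scenario above, including the HEAD reflog safety net, can be played out on a toy model. Everything here is an assumption made for illustration: the commit labels, the `refs` and `reflogs` dicts, and the `unreachable` helper. The model also assumes that deleting a branch discards that branch's own reflog.

```python
# Two commits (E1, E2) exist only on an experimental branch.
parents = {"A": [], "B": ["A"], "E1": ["B"], "E2": ["E1"]}
refs = {"refs/heads/main": "B", "refs/heads/experiment": "E2"}
reflogs = {
    "HEAD": ["B", "E1", "E2", "B"],          # HEAD visited the experiment
    "refs/heads/main": ["A", "B"],
    "refs/heads/experiment": ["E1", "E2"],
}

def unreachable(refs, reflogs, parents):
    """Return commits that no ref or reflog entry reaches."""
    roots = list(refs.values())
    for entries in reflogs.values():
        roots.extend(entries)
    marked, stack = set(), roots
    while stack:
        commit = stack.pop()
        if commit not in marked:
            marked.add(commit)
            stack.extend(parents[commit])
    return set(parents) - marked

before = unreachable(refs, reflogs, parents)

# Delete the branch name; in this model its own reflog goes with it.
del refs["refs/heads/experiment"]
del reflogs["refs/heads/experiment"]
after_delete = unreachable(refs, reflogs, parents)

# Later, the HEAD reflog entries that mention E1 and E2 expire.
reflogs["HEAD"] = [h for h in reflogs["HEAD"] if h not in ("E1", "E2")]
after_expiry = unreachable(refs, reflogs, parents)

print(before, after_delete, sorted(after_expiry))  # set() set() ['E1', 'E2']
```

Right after the deletion nothing is collectable yet, because the HEAD reflog still names both commits. They become unreachable only once those entries are gone.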

Reflog entries eventually expire, so once you have deleted a branch name, commits that are unique to that branch stay protected no longer than any HEAD reflog entry that refers to them (30 days by default for entries no longer reachable from the current tip, set by gc.reflogExpireUnreachable) or the 14-day prune grace period (gc.pruneExpire), whichever is longer. After that, the commits, along with any other objects (trees and blobs) whose existence is predicated on the continued existence of those commits, are ready for removal, and the next garbage collection, manual or automatic, will remove them.
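The "whichever is longer" rule is simple date arithmetic. Here is a rough sketch using the two default periods quoted above. The `eligible_for_removal_at` function and the example dates are hypothetical, and the result is only the earliest point at which a later garbage collection could drop the object, not the moment it actually disappears.

```python
from datetime import datetime, timedelta

REFLOG_EXPIRE = timedelta(days=30)  # default for the HEAD reflog entries discussed above
PRUNE_GRACE = timedelta(days=14)    # default grace period for loose objects

def eligible_for_removal_at(object_created, last_head_reflog_entry=None):
    """Earliest time an unreferenced commit stops being protected.

    last_head_reflog_entry is the time of the newest HEAD reflog entry
    naming the commit, or None if HEAD never pointed at it.
    """
    grace_ends = object_created + PRUNE_GRACE
    if last_head_reflog_entry is None:
        return grace_ends
    return max(grace_ends, last_head_reflog_entry + REFLOG_EXPIRE)

created = datetime(2018, 5, 1)
last_seen = datetime(2018, 5, 2)

print(eligible_for_removal_at(created, last_seen))  # 2018-06-01 00:00:00
print(eligible_for_removal_at(created))             # 2018-05-15 00:00:00
```

The objects are actually removed only when a garbage collection runs after that point.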

torek

No. A branch is just a label on a commit. There is no 'history' for a branch. Someone with enough access can remove it.

If your concern is the commits that make up the branch: if there are no refs to them, they can eventually be gc'ed.

bmargulies

If you delete a branch, the pointers to those commits still exist somewhere, and the code that was merged in from deleted branches still exists. But as time goes on, garbage collection will make deleted branches "unrecoverable". There are enterprise-level tools that can help with recovering a deleted branch.

Possible Duplicate of: Does deleting a branch in git remove it from the history?

Nicholas Koskowski