
How can I determine whether changes have been committed since the previous git bundle was created, without creating a working repository, looping through every branch, and recording every head revision?

One of the shortfalls I'm finding with Git is proper backup support for use in the enterprise. The enterprise differs from open source development in that there is always 1) an authoritative repository and 2) a backup system handling very large amounts of data. Thus there is motivation to both 1) back up very frequently and 2) only run the backup process when there are new changes. My problem is finding a solution for #2.

I'm using git bundle to create my archives, but I'm not finding a conclusive way to determine whether new changes have been committed since the previous backup.

I've been trying to find a combination of options for git rev-list to list new commit ids since the last bundle, but have been unsuccessful. A query on this topic reveals a very nice backup script written using:

git -C "${path}" rev-parse --short=10 HEAD

to mark the bundle with a commit id. That solution inadequately describes a snapshot of a git repository as other branches may have been updated leaving the HEAD revision of an upstream repository unaltered.

I've looked at using --max-age=<lastbackup epoch>, but quickly found that it's possible for a developer to push older changes after a backup has run. Since the dates of those commits do not change, they are older than the last backup date and a backup is not triggered.

The best approach I have so far is:

git -C ${repo} rev-list -a --branches ${prev_commit}..HEAD

which does capture new revisions from other branches, but will continue to report revisions on other branches even after a newer commit has been made to HEAD.

I have not started looking into incremental backups yet, but I can already see that in order to verify one, I would need to create and manage a working repository when I prefer to just maintain bare repositories on our server.

Also I'll note that I have not found an option to git branch to remove the "*" so it will just give me a clean list of branches for scripting.
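For the branch-listing point, one possibility is to skip git branch entirely and ask git for-each-ref for the short names under refs/heads, which prints one branch per line with no "*" marker. A minimal sketch follows; the function name list_branches is hypothetical and not part of the original post.

```shell
# Hypothetical helper: print local branch names, one per line, without
# the "*" marker that "git branch" puts in front of the current branch.
# $1 is the repository path; a bare repository works as well.
list_branches() {
    git -C "$1" for-each-ref --format='%(refname:short)' refs/heads
}
```

Calling list_branches "${repo}" gives output that can be fed straight into a while read loop.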

What are other enterprises doing to back up their repositories?

brookbot
  • Are you using a Git provider such as GitHub or Bitbucket? Both of these back up your data behind the scenes AFAIK. – Tim Biegeleisen Jan 18 '17 at 01:42
  • Without commenting on backup strategies, I will note that `git rev-list` is the wrong tool. Use `git for-each-ref` to obtain the current values of some or all references. – torek Jan 18 '17 at 02:09
  • @TimBiegeleisen using a service provider is not an option. Think enterprise, IP, etc. – brookbot Jan 19 '17 at 02:05
  • @torek This looks good, does it also capture tags? – brookbot Jan 19 '17 at 02:10
  • @btpw: `git for-each-ref` operates on either all refs (branches, tags, the stash, notes, remote-tracking branches, `refs/original/` from filter-branch, etc.) or the refs you specify. If you want tags, run it on `refs/tags`. Read the documentation for details; for tags you will probably want not just the tag object's ID but also the object ID of the tag's non-tag target. – torek Jan 19 '17 at 02:55
  • So for a solution I am recording and then diffing the output of `git -C ${repo} for-each-ref` with no args, and this seems to be working well. I believe that it's giving me the latest OID for each branch and should cover the other OIDs @torek mentions above. Thanks. – brookbot Jan 21 '17 at 02:00
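The record-and-diff approach described in the last comment might look roughly like the sketch below. The function names, the snapshot file, and the bundle path are all hypothetical, and the paths are assumed to be absolute because git -C changes directory before the bundle is written. The snapshot is taken before bundling so that a push arriving mid-backup still triggers the next run.

```shell
# Print "<object-id> <refname>" for every ref (branches, tags, notes, ...).
ref_snapshot() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)'
}

# Exit status 0 when the refs of repository $1 differ from the
# snapshot file $2 (or when no snapshot exists yet), 1 otherwise.
refs_changed() {
    if [ ! -f "$2" ]; then
        return 0
    fi
    if ref_snapshot "$1" | cmp -s - "$2"; then
        return 1
    fi
    return 0
}

# Bundle repository $1 into file $3 only when its refs changed since
# the snapshot in $2; the snapshot is replaced only if bundling succeeds.
backup_if_changed() {
    refs_changed "$1" "$2" || return 0
    ref_snapshot "$1" > "$2.new" &&
    git -C "$1" bundle create "$3" --all &&
    mv "$2.new" "$2"
}
```

A cron job could then call backup_if_changed with the bare repository, a snapshot file kept next to the backups, and a dated bundle name. Because the comparison covers every ref, a commit on a non-HEAD branch or a deleted ref also counts as a change.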

1 Answer


You can use git -C "${path}" rev-parse --short=10 --branches instead. Although it prints fatal: Needed a single revision at the end of the output, it still displays the latest commit of each branch.

Since you use Git for version control, you only need a Git server or a third-party hosted service (such as GitHub or Bitbucket) to manage the different versions. This is convenient and saves time, and you don't need to keep track of which version is current. The advantage is that the commit history can't be lost, so you no longer need to create archives.

Marina Liu
  • Note that `--branches` will get the current object IDs for each branch name (these OIDs are necessarily commits), but will not get OIDs for tags (these may be annotated tags so you may need both the tag OID and the peeled tag OID). If your goal is really to save *everything*, you should consider the fact that a tag can protect objects that no branch name protects. (For instance you might point a tag at a blob containing an essential GPG signature.) You might also be concerned with notes, which are commits not in the branch name-space. – torek Jan 18 '17 at 17:52
  • Thanks, but I'm looking for an enterprise solution, so using GitHub or Bitbucket services is not an option. How do these services provide backups of Git repos, or are they just relying on a general disk (net appliance) backup? – brookbot Jan 19 '17 at 02:00
  • If you use GitHub or Bitbucket, you don't need to back up. They store the repos on their servers. What users need to do is clone the repo locally and check out the versions they need. – Marina Liu Jan 19 '17 at 02:06
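To cover the annotated-tag case raised in the comments above, a snapshot can record both the tag object's ID and the ID of the object it points at. The sketch below uses the `%(*objectname)` field of git for-each-ref, which dereferences a tag; the function name ref_snapshot_peeled is hypothetical.

```shell
# Hypothetical helper: print "<object-id> <peeled-object-id> <refname>"
# for every ref in repository $1. The middle column is empty unless the
# ref points at an annotated tag, in which case it holds the ID of the
# object the tag refers to.
ref_snapshot_peeled() {
    git -C "$1" for-each-ref \
        --format='%(objectname) %(*objectname) %(refname)'
}
```

Diffing this output instead of the plain for-each-ref listing would also flag a tag that was re-created to point at a different object.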