
Given the hash of a blob, is there a way to get a list of commits that have this blob in their tree?

svick
readonly
  • "Hash of a blob" is that returned by `git hash-object` or `sha1("blob " + filesize + "\0" + data)`, and not simply the sha1sum of the blob contents. – Ivan Hamilton Mar 16 '15 at 13:23
  • I originally thought this question matched my question, but it seems it does not. I want to know the *one* commit which first *introduced* this blob to the repository. – Jesse Glick Sep 25 '15 at 13:23
  • If you know the filepath, you can use `git log --follow filepath` (and use this to speed up Aristotle's solution, if you want). – Zaz Oct 18 '16 at 13:26
  • ProTip™: Put one of the below scripts in `~/.bin` and name it `git-find-object`. You can then use it with `git find-object`. – Zaz Oct 18 '16 at 16:58
  • Note: With Git 2.16 (Q1 2018), you could consider simply `git describe `: See [my answer below](https://stackoverflow.com/a/48027778/6309). – VonC Dec 29 '17 at 19:55
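To make Ivan Hamilton's point concrete, here is a shell sketch that hashes the `blob <size>\0` header followed by the file contents. It assumes a POSIX shell, GNU coreutils' `sha1sum` and a SHA-1 repository; `git_blob_hash` is a hypothetical name. For a file with no clean filters or line-ending conversion applied, this should match what `git hash-object` computes:

```shell
# Compute a Git blob ID by hand: SHA-1 of "blob <size>\0" followed by the contents.
git_blob_hash() {
    file="$1"
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    # \0 in the printf format writes the NUL byte that ends the header.
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}
```

For example, `git_blob_hash README` should print the same ID as `git hash-object README`.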

8 Answers


Both of the following scripts take the blob’s SHA1 as the first argument, and after it, optionally, any arguments that git log will understand. E.g. --all to search in all branches instead of just the current one, or -g to search in the reflog, or whatever else you fancy.

Here it is as a shell script – short and sweet, but slow:

#!/bin/sh
obj_name="$1"
shift
git log "$@" --pretty=tformat:'%T %h %s' \
| while read -r tree commit subject ; do
    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
        echo "$commit $subject"
    fi
done

And an optimised version in Perl, still quite short but much faster:

#!/usr/bin/perl
use 5.008;
use strict;
use Memoize;

my $obj_name;

sub check_tree {
    my ( $tree ) = @_;
    my @subtree;

    {
        open my $ls_tree, '-|', git => 'ls-tree' => $tree
            or die "Couldn't open pipe to git-ls-tree: $!\n";

        while ( <$ls_tree> ) {
            /\A[0-7]{6} (\S+) (\S+)/
                or die "unexpected git-ls-tree output";
            return 1 if $2 eq $obj_name;
            push @subtree, $2 if $1 eq 'tree';
        }
    }

    check_tree( $_ ) && return 1 for @subtree;

    return;
}

memoize 'check_tree';

die "usage: git-find-blob <blob> [<git-log arguments ...>]\n"
    if not @ARGV;

my $obj_short = shift @ARGV;
$obj_name = do {
    local $ENV{'OBJ_NAME'} = $obj_short;
     `git rev-parse --verify \$OBJ_NAME`;
} or die "Couldn't parse $obj_short: $!\n";
chomp $obj_name;

open my $log, '-|', git => log => @ARGV, '--pretty=format:%T %h %s'
    or die "Couldn't open pipe to git-log: $!\n";

while ( <$log> ) {
    chomp;
    my ( $tree, $commit, $subject ) = split " ", $_, 3;
    print "$commit $subject\n" if check_tree( $tree );
}
torek
Aristotle Pagaltzis
  • FYI you have to use the full SHA of the blob. A prefix, even if unique, will not work. To get the full SHA from a prefix, you can use `git rev-parse --verify $theprefix` – John Douthat Aug 02 '11 at 23:05
  • Thanks @JohnDouthat for this comment. Here's how to incorporate that into above script (sorry for the inlining in comments): `my $blob_arg = shift; open my $rev_parse, '-|', git => 'rev-parse' => '--verify', $blob_arg or die "Couldn't open pipe to git-rev-parse: $!\n"; my $obj_name = <$rev_parse>; chomp $obj_name; close $rev_parse or die "Couldn't expand passed blob.\n"; $obj_name eq $blob_arg or print "(full blob is $obj_name)\n";` – Ingo Karkat May 17 '12 at 19:40
  • There may be bug in the upper shell script. The while loop only executes if there are more lines to read, and for whatever reason git log is not putting a final crlf on the end. I had to add a linefeed and ignore blank lines. `obj_name="$1" shift git log --all --pretty=format:'%T %h %s %n' -- "$@" | while read tree commit cdate subject ; do if [ -z $tree ] ; then continue fi if git ls-tree -r $tree | grep -q "$obj_name" ; then echo "$cdate $commit $@ $subject" fi done` – Mixologic Jun 18 '13 at 21:01
  • This only finds commits _on the current branch_ unless you pass `--all` as an additional argument. (Finding all commits repo-wide is important in cases like [deleting a large file from the repo history](http://git-scm.com/book/en/Git-Internals-Maintenance-and-Data-Recovery#Removing-Objects)). – peterflynn Jul 11 '13 at 05:21
  • Tip: pass the -g flag to the shell script (after the object ID) to examine the reflog. – Bram Schoenmakers Sep 18 '14 at 07:18
  • The `git log` options in the shell script are incorrect. Use tformat: to terminate each output line with an LF, as in `git log '--pretty=tformat:%T %h %s' -- "$@"`. The subsequent `read` requires each line to be terminated with LF. – Markus Kuhn Dec 12 '16 at 13:38
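The comments above suggest several fixes: John Douthat notes that a full blob ID is needed, Markus Kuhn that `read` needs `tformat:` so the last line is terminated, and peterflynn that `--all` is needed to look beyond the current branch. A minimal shell sketch folding the first two into one function follows. It assumes a POSIX shell with `awk`, and `find_blob_commits` is a hypothetical name. Extra arguments such as `--all` are passed to `git log`:

```shell
# find_blob_commits <blob> [<git log arguments>...]
find_blob_commits() {
    # Expand an abbreviated ID; ^{blob} rejects names that are not blobs.
    obj_name=$(git rev-parse --verify --quiet "$1^{blob}") || {
        echo "not a blob: $1" >&2
        return 1
    }
    shift
    # tformat: ends every line with LF, so read also sees the last commit.
    git log --pretty=tformat:'%T %h %s' "$@" |
    while read -r tree commit subject; do
        # ls-tree -r prints "mode type sha<TAB>path"; compare the sha column exactly.
        if git ls-tree -r "$tree" |
           awk -v sha="$obj_name" '$3 == sha { found = 1 } END { exit !found }'
        then
            echo "$commit $subject"
        fi
    done
}
```

For example, `find_blob_commits b3bb59f --all` searches every ref for that blob.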

For humans, the most useful command is probably

git whatchanged --all --find-object=<blob hash>

This shows, across --all branches, any commits that added or removed a file with that hash, along with what the path was.

git$ git whatchanged --all --find-object=b3bb59f06644
commit 8ef93124645f89c45c9ec3edd3b268b38154061a 
⋮
diff: do not show submodule with untracked files as "-dirty"
⋮
:100644 100644 b3bb59f06644 8f6227c993a5 M      submodule.c

commit 7091499bc0a9bccd81a1c864de7b5f87a366480e 
⋮
Revert "submodules: fix of regression on fetching of non-init subsub-repo"
⋮
:100644 100644 eef5204e641e b3bb59f06644 M  submodule.c

Note that git whatchanged already includes the before-and-after blob hashes in its output lines.

andrewdotn
  • From what git version does the --find-object exist? I'm trying on 2.30.2 with no luck. – exa May 07 '21 at 10:16
  • @exa That’s odd, it should be in [2.17 and up](https://github.com/git/git/commit/15af58c1adba431c216e2a45fa0d22944560ba02). – andrewdotn May 07 '21 at 18:34
  • After some search I found that it was my mistake (+ broken completion mistake). All working right, sorry! :D – exa May 08 '21 at 19:06
  • Note: `git whatchanged` is semi-deprecated; it basically means `git log --raw --no-merges`, which is not semi-deprecated. – torek May 19 '22 at 18:28
  • My git v2.36.1 works fine. This is the right answer. – cupen Jun 08 '22 at 09:52
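Following torek's note that `git whatchanged` is essentially `git log --raw --no-merges`, here is a sketch of the same search written with `git log`. It assumes Git 2.17 or later for `--find-object`, and `blob_changes` is a hypothetical name. Extra arguments are passed through to `git log`:

```shell
# blob_changes <blob> [<git log arguments>...]
blob_changes() {
    blob="$1"
    shift
    # --raw adds ":oldmode newmode oldsha newsha status<TAB>path" lines, as
    # git whatchanged does; --find-object keeps only commits whose diff changes
    # the number of occurrences of that object (added, removed or replaced).
    git log --all --no-merges --raw --format='commit %H' --find-object="$blob" "$@"
}
```

For example, `blob_changes b3bb59f06644` should list the same commits as the whatchanged command above, each followed by its raw diff line.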

Unfortunately, the scripts were a bit slow for me, so I had to optimize a bit. Luckily, I had not only the hash but also the path of the file.

git log --all --pretty=format:%H -- <path> | xargs -I% sh -c "git ls-tree % -- <path> | grep -q <hash> && echo %"
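The one-liner can be wrapped in a reusable function. This sketch uses a hypothetical function name. It resolves the blob ID first, so an abbreviated hash also works, and it compares the object column of `git ls-tree` exactly instead of grepping the whole line:

```shell
# find_blob_at_path <blob> <path>
find_blob_at_path() {
    blob=$(git rev-parse --verify --quiet "$1^{blob}") || return 1
    path="$2"
    # Only commits that touch <path> are listed, newest first.
    git log --all --pretty=tformat:%H -- "$path" |
    while read -r commit; do
        # The third column of ls-tree output is the object ID stored at <path>.
        if [ "$(git ls-tree "$commit" -- "$path" | awk '{ print $3 }')" = "$blob" ]; then
            echo "$commit"
        fi
    done
}
```

As Unapiedra points out below, this reports the commits where <path> changed to the given blob, not every commit that contains it.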
Alan
aragaer
  • Excellent answer because it is so simple. Just by making the reasonable assumption that the path is known. However, one should know that it returns the commit where the path was changed to the given hash. – Unapiedra Jul 21 '17 at 19:16
  • If one wants the newest commit containing the `` at the given ``, then removing the `` argument from the `git log` will work. The first returned result is the wanted commit. – Unapiedra Jul 21 '17 at 19:24

In addition to git describe, which I mention in my previous answer, git log and git diff now also benefit from the "--find-object=<object-id>" option, which limits the results to changes that involve the named object.
This is available in Git 2.16.x/2.17 (Q1 2018).

See commit 4d8c51a, commit 5e50525, commit 15af58c, commit cf63051, commit c1ddc46, commit 929ed70 (04 Jan 2018) by Stefan Beller (stefanbeller).
(Merged by Junio C Hamano -- gitster -- in commit c0d75f0, 23 Jan 2018)

diffcore: add a pickaxe option to find a specific blob

Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? Or this Stack Overflow question "Which commit has this blob?")

One might be tempted to extend git-describe to also work with blobs, such that git describe <blob-id> gives a description as '<commit-ish>:<path>'.
This was implemented here; as seen by the sheer number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the blob that removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this patch implements.
Teach the diff machinery another flag for restricting the information to what is shown.
For example:

$ ./git log --oneline --find-object=v2.0.0:Makefile
  b2feb64 Revert the whole "ask curl-config" topic for now
  47fbfde i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b.
The reason why these commits both occur prior to v2.0.0 are evil merges that are not found using this new mechanism.


As noted in the comments by Marcono1234, you can combine that with the git log --all option:

This can be useful when you don't know which branch contains the object.

VonC
  • `git log` also has an [`--all`](https://git-scm.com/docs/git-log#Documentation/git-log.txt---all) option, this can be useful when you don't know which branch contains the object. – Marcono1234 Sep 09 '21 at 23:13
  • @Marcono1234 Good point, thank you. I have included your comment in the answer for more visibility. – VonC Sep 10 '21 at 06:18

Given the hash of a blob, is there a way to get a list of commits that have this blob in their tree?

With Git 2.16 (Q1 2018), git describe would be a good solution, since it was taught to dig trees deeper to find a <commit-ish>:<path> that refers to a given blob object.

See commit 644eb60, commit 4dbc59a, commit cdaed0c, commit c87b653, commit ce5b6f9 (16 Nov 2017), and commit 91904f5, commit 2deda00 (02 Nov 2017) by Stefan Beller (stefanbeller).
(Merged by Junio C Hamano -- gitster -- in commit 556de1a, 28 Dec 2017)

builtin/describe.c: describe a blob

Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or this very SO question "Which commit has this blob?")

When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck.
So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting.
The same blob can be referenced by multiple commits, so how we decide which commit to use?

This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob.

For example:

git describe --tags v0.99:Makefile
conversion-901-g7672db20c2:Makefile

tells us the Makefile as it was in v0.99 was introduced in commit 7672db2.

The walking is performed in reverse order to show the introduction of a blob rather than its last occurrence.

That means the git describe man page adds to the purposes of this command:

Instead of simply describing a commit using the most recent tag reachable from it, git describe will actually give an object a human readable name based on an available ref when used as git describe <blob>.

If the given object refers to a blob, it will be described as <commit-ish>:<path>, such that the blob can be found at <path> in the <commit-ish>, which itself describes the first commit in which this blob occurs in a reverse revision walk from HEAD.

But:

BUGS

Tree objects as well as tag objects not pointing at commits, cannot be described.
When describing blobs, the lightweight tags pointing at blobs are ignored, but the blob is still described as <committ-ish>:<path> despite the lightweight tag being favorable.

VonC
  • Good to use in conjunction with `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | awk '/^blob/ {print substr($0,6)}' | sort --numeric-sort --key=2 -r | head -n 20`, which returns you a top 20 largest blobs. Then you can pass blob ID from the above output to `git describe`. Worked as a charm! Thanks! – Alexander Pogrebnyak Dec 02 '19 at 19:01
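That combination can be sketched as a single function. It assumes Git 2.16 or later (for blob support in `git describe`), and the function name is hypothetical. `--always` is added so that repositories without tags still get an abbreviated commit ID. Because `git describe` walks back from HEAD, a blob reachable only from other branches may not get a description:

```shell
# describe_largest_blobs [<count>]: "<size> <blob> <commit-ish>:<path>", largest first.
describe_largest_blobs() {
    count="${1:-20}"
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $3, $2 }' |
    sort -rn |
    head -n "$count" |
    while read -r size blob; do
        # --always falls back to an abbreviated commit ID when no tag exists.
        echo "$size $blob $(git describe --always "$blob")"
    done
}
```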

I thought this would be a generally useful thing to have, so I wrote up a little perl script to do it:

#!/usr/bin/perl -w

use strict;

my @commits;
my %trees;
my $blob;

sub blob_in_tree {
    my $tree = $_[0];
    if (defined $trees{$tree}) {
        return $trees{$tree};
    }
    my $r = 0;
    open(my $f, "git cat-file -p $tree|") or die $!;
    while (<$f>) {
        if (/^\d+ blob (\w+)/ && $1 eq $blob) {
            $r = 1;
        } elsif (/^\d+ tree (\w+)/) {
            $r = blob_in_tree($1);
        }
        last if $r;
    }
    close($f);
    $trees{$tree} = $r;
    return $r;
}

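my %seen;

sub handle_commit {
    my $commit = $_[0];
    # Merges make a commit reachable along several paths; visit it only once.
    return if $seen{$commit}++;
    open(my $f, "git cat-file commit $commit|") or die $!;
    my $tree = <$f>;
    die "Not a valid commit: $commit\n"
        unless defined $tree && $tree =~ /^tree (\w+)$/;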
    if (blob_in_tree($1)) {
        print "$commit\n";
    }
    while (1) {
        my $parent = <$f>;
        last unless $parent =~ /^parent (\w+)$/;
        push @commits, $1;
    }
    close($f);
}

if (!@ARGV) {
    print STDERR "Usage: git-find-blob blob [head ...]\n";
    exit 1;
}

$blob = $ARGV[0];
if (@ARGV > 1) {
    # $ARGV[0] is the blob; the remaining arguments are heads.
    foreach (@ARGV[1 .. $#ARGV]) {
        handle_commit($_);
    }
} else {
    handle_commit("HEAD");
}
while (@commits) {
    handle_commit(pop @commits);
}

I'll put this up on GitHub when I get home this evening.

Update: It looks like somebody already did this. That one uses the same general idea but the details are different and the implementation is much shorter. I don't know which would be faster but performance is probably not a concern here!

Update 2: For what it's worth, my implementation is orders of magnitude faster, especially for a large repository. That git ls-tree -r really hurts.

Update 3: I should note that my performance comments above apply to the implementation I linked above in the first Update. Aristotle's implementation performs comparably to mine. More details in the comments for those who are curious.

Community
Greg Hewgill
  • Hmm, how can it be *that* much faster? You’re walking the tree anyway, aren’t you? What work does git-ls-tree do that you avoid? (NB.: grep will bail on first match, SIGPIPE’ing the git-ls-tree.) When I tried it, I had to Ctrl-C your script after 30 seconds; mine was done in 4. – Aristotle Pagaltzis Oct 22 '08 at 00:01
  • My script caches the results of subtrees in the %trees hash, so it doesn't have to keep searching subtrees that haven't changed. – Greg Hewgill Oct 22 '08 at 00:05
  • Actually, I was trying the implementation I found on github that I linked to. Yours is faster in some cases, but it depends highly on whether the file you're looking for is at the beginning or end of the ls-tree list. My repository has 9574 files in it right now. – Greg Hewgill Oct 22 '08 at 00:10
  • It also occurs to me that some nonlinear project histories might cause my script to do much more work than it needs to (this can be fixed). This might be why it took a long time to run for you. My respository is a git-svn mirror of a Subversion repository, so it's nicely linear. – Greg Hewgill Oct 22 '08 at 00:17
  • Instead of parsing cat-file to get the tree just do `git rev-parse $commit^{tree}` – jthill Mar 06 '13 at 19:18
  • Note that this only finds commits _on the current branch_ (or a list of specified branches if you pass them in as args) - it doesn't exhaustively search the entire repo. Aristotle's answer above can search repo-wide given the right arg. (Which is useful e.g. for [thoroughly deleting a large file from the repo history](http://git-scm.com/book/en/Git-Internals-Maintenance-and-Data-Recovery#Removing-Objects)). – peterflynn Jul 11 '13 at 05:24
  • You might want to handle this error properly: `$ git locate-blob 123456 HEAD` `fatal: Not a valid object name 123456` `Use of uninitialized value $tree in pattern match (m//) at line 33.` – user541686 Jun 02 '19 at 02:11
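Greg's explanation of the speedup (caching results per subtree in %trees) can also be sketched in shell. This version assumes a shell that supports `local` (bash, dash and most others do, though POSIX does not require it). It stores each tree's answer as an empty marker file in a temporary directory, and the function names are hypothetical:

```shell
# tree_has_blob <tree> <blob> <cache-dir>: succeeds if <blob> is somewhere under <tree>.
tree_has_blob() {
    local tree="$1" blob="$2" cache="$3" listing mode type sha name
    [ -e "$cache/$tree.yes" ] && return 0
    [ -e "$cache/$tree.no" ] && return 1
    listing=$(git ls-tree "$tree") || return 1
    # A here-document (not a pipe) keeps the loop in this shell, so return leaves the function.
    while read -r mode type sha name; do
        if { [ "$type" = blob ] && [ "$sha" = "$blob" ]; } ||
           { [ "$type" = tree ] && tree_has_blob "$sha" "$blob" "$cache"; }
        then
            touch "$cache/$tree.yes"
            return 0
        fi
    done <<EOF
$listing
EOF
    touch "$cache/$tree.no"
    return 1
}

# find_blob_commits_cached <blob> [<git log arguments>...]
find_blob_commits_cached() {
    local blob cache
    blob=$(git rev-parse --verify --quiet "$1^{blob}") || return 1
    shift
    cache=$(mktemp -d) || return 1
    git log --pretty=tformat:'%H %T' "$@" |
    while read -r commit tree; do
        tree_has_blob "$tree" "$blob" "$cache" && echo "$commit"
    done
    rm -rf "$cache"
}
```

As with the other scripts, pass `--all` after the blob ID to search every ref.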

While the original question does not ask for it, I think it is useful to also check the staging area to see if a blob is referenced. I modified the original bash script to do this and found what was referencing a corrupt blob in my repository:

#!/bin/sh
obj_name="$1"
shift
git ls-files --stage \
| if grep -q "$obj_name"; then
    echo Found in staging area. Run git ls-files --stage to see.
fi

git log "$@" --pretty=tformat:'%T %h %s' \
| while read -r tree commit subject ; do
    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
        echo "$commit $subject"
    fi
done
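Grepping the whole `git ls-files --stage` line can also match a path, or a longer hash that merely contains the search string. A stricter sketch (hypothetical function name) compares only the object column and prints the matching paths:

```shell
# blob_in_index <blob>: print index paths whose staged object is <blob>.
blob_in_index() {
    blob=$(git rev-parse --verify --quiet "$1^{blob}") || return 1
    # ls-files --stage prints "mode sha stage<TAB>path"; column 2 is the sha.
    git ls-files --stage | awk -v sha="$blob" '
        $2 == sha { print substr($0, index($0, "\t") + 1); found = 1 }
        END { exit !found }'
}
```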
Mario
  • I'd just like to give credit where it's due: thank you RAM corruption for causing me a BSOD and forcing me to hand repair my git repo. – Mario Mar 17 '12 at 08:20

So... I needed to find all files over a given limit in a repo over 8GB in size, with over 108,000 revisions. I adapted Aristotle's perl script along with a ruby script I wrote to reach this complete solution.

First, run git gc to make sure all objects are in packfiles; the script below only scans pack files, not loose objects.

Next, run this script to locate all objects (blobs, and any trees or commits) larger than CUTOFF_SIZE bytes. Capture the output to a file such as "large-blobs.log".

#!/usr/bin/env ruby

require 'log4r'

# The output of git verify-pack -v is:
# SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
#
#
GIT_PACKS_RELATIVE_PATH=File.join('.git', 'objects', 'pack', '*.pack')

# 10MB cutoff
CUTOFF_SIZE=1024*1024*10
#CUTOFF_SIZE=1024

begin

  include Log4r
  log = Logger.new 'git-find-large-objects'
  log.level = INFO
  log.outputters = Outputter.stdout

  git_dir = %x[ git rev-parse --show-toplevel ].chomp

  if git_dir.empty?
    log.fatal "ERROR: must be run in a git repository"
    exit 1
  end

  log.debug "Git Dir: '#{git_dir}'"

  pack_files = Dir[File.join(git_dir, GIT_PACKS_RELATIVE_PATH)]
  log.debug "Git Packs: #{pack_files.to_s}"

  # For details on this IO, see http://stackoverflow.com/questions/1154846/continuously-read-from-stdout-of-external-process-in-ruby
  #
  # Short version is, git verify-pack flushes buffers only on line endings, so
  # this works, if it didn't, then we could get partial lines and be sad.

  types = {
    :blob => 1,
    :tree => 1,
    :commit => 1,
  }


  total_count = 0
  counted_objects = 0
  large_objects = []

  IO.popen("git verify-pack -v -- #{pack_files.join(" ")}") do |pipe|
    pipe.each do |line|
      # The output of git verify-pack -v is:
      # SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
      data = line.chomp.split(' ')
      # types are blob, tree, or commit
      # we ignore other lines by looking for that
      next unless types[data[1].to_sym] == 1
      log.info "INPUT_THREAD: Processing object #{data[0]} type #{data[1]} size #{data[2]}"
      hash = {
        :sha1 => data[0],
        :type => data[1],
        :size => data[2].to_i,
      }
      total_count += hash[:size]
      counted_objects += 1
      if hash[:size] > CUTOFF_SIZE
        large_objects.push hash
      end
    end
  end

  log.info "Input complete"

  log.info "Counted #{counted_objects} totalling #{total_count} bytes."

  log.info "Sorting"

  large_objects.sort! { |a,b| b[:size] <=> a[:size] }

  log.info "Sorting complete"

  large_objects.each do |obj|
    log.info "#{obj[:sha1]} #{obj[:type]} #{obj[:size]}"
  end

  exit 0
end
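As an alternative to parsing `git verify-pack -v`, newer Git versions can list every object directly with `git cat-file --batch-all-objects`. This covers loose objects too, so the `git gc` step is not strictly required, though it also reports unreachable objects. A shell sketch; the function name and the 10 MiB default (taken from CUTOFF_SIZE above) are illustrative:

```shell
# large_blobs [<cutoff-bytes>]: print "<sha1> <size>" for blobs above the cutoff, largest first.
large_blobs() {
    cutoff="${1:-10485760}"
    git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname) %(objectsize)' |
    awk -v min="$cutoff" '$1 == "blob" && $3 > min { print $2, $3 }' |
    sort -k2,2nr
}
```

The resulting IDs can then be fed to the git-find-blob script described next.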

Next, edit the file to remove any blobs you don't want, as well as the INPUT_THREAD bits at the top. Once you have only lines for the SHA-1s you want to find, run the following script like this:

cat edited-large-files.log | cut -d' ' -f4 | xargs git-find-blob | tee large-file-paths.log

The git-find-blob script is below.

#!/usr/bin/perl

# taken from: http://stackoverflow.com/questions/223678/which-commit-has-this-blob
# and modified by Carl Myers <cmyers@cmyers.org> to scan multiple blobs at once
# Also, modified to keep the discovered filenames
# vi: ft=perl

use 5.008;
use strict;
use Memoize;
use Data::Dumper;


my $BLOBS = {};

MAIN: {

    memoize 'check_tree';

    die "usage: git-find-blob <blob1> <blob2> ... -- [<git-log arguments ...>]\n"
        if not @ARGV;


    while ( @ARGV && $ARGV[0] ne '--' ) {
        my $arg = $ARGV[0];
        #print "Processing argument $arg\n";
        open my $rev_parse, '-|', git => 'rev-parse' => '--verify', $arg or die "Couldn't open pipe to git-rev-parse: $!\n";
        my $obj_name = <$rev_parse>;
        close $rev_parse or die "Couldn't expand passed blob.\n";
        chomp $obj_name;
        #$obj_name eq $ARGV[0] or print "($ARGV[0] expands to $obj_name)\n";
        print "($arg expands to $obj_name)\n";
        $BLOBS->{$obj_name} = $arg;
        shift @ARGV;
    }
    shift @ARGV; # drop the -- if present

    #print "BLOBS: " . Dumper($BLOBS) . "\n";

    # check_tree() reports hits for every requested blob at once, so a single
    # pass over the log is enough; looping over the blobs here would print
    # each matching commit once per blob.
    open my $log, '-|', git => log => @ARGV, '--pretty=format:%T %h %s'
        or die "Couldn't open pipe to git-log: $!\n";

    while ( <$log> ) {
        chomp;
        my ( $tree, $commit, $subject ) = split " ", $_, 3;
        my $results = check_tree( $tree );

        if (%{$results}) {
            print "$commit $subject\n";
            foreach my $blob ( keys %{$results} ) {
                # one path per line, matching the sample output below
                print "\t$_\n" for @{$results->{$blob}};
            }
        }
    }

}


sub check_tree {
    my ( $tree ) = @_;
    #print "Calculating hits for tree $tree\n";

    my @subtree;

    # results = { BLOB => [ FILENAME1 ] }
    my $results = {};
    {
        open my $ls_tree, '-|', git => 'ls-tree' => $tree
            or die "Couldn't open pipe to git-ls-tree: $!\n";

        # example git ls-tree output:
        # 100644 blob 15d408e386400ee58e8695417fbe0f858f3ed424    filaname.txt
        while ( <$ls_tree> ) {
            /\A[0-7]{6} (\S+) (\S+)\s+(.*)/
                or die "unexpected git-ls-tree output";
            #print "Scanning line '$_' tree $2 file $3\n";
            foreach my $blob ( keys %{$BLOBS} ) {
                if ( $2 eq $blob ) {
                    print "Found $blob in $tree:$3\n";
                    push @{$results->{$blob}}, $3;
                }
            }
            push @subtree, [$2, $3] if $1 eq 'tree';
        }
    }

    foreach my $st ( @subtree ) {
        # $st->[0] is tree, $st->[1] is dirname
        my $st_result = check_tree( $st->[0] );
        foreach my $blob ( keys %{$st_result} ) {
            foreach my $filename ( @{$st_result->{$blob}} ) {
                my $path = $st->[1] . '/' . $filename;
                #print "Generating subdir path $path\n";
                push @{$results->{$blob}}, $path;
            }
        }
    }

    #print "Returning results for tree $tree: " . Dumper($results) . "\n\n";
    return $results;
}

The output will look like this:

<hash prefix> <oneline log message>
    path/to/file.txt
    path/to/file2.txt
    ...
<hash prefix2> <oneline log msg...>

And so on. Every commit which contains a large file in its tree will be listed. If you grep out the lines that start with a tab and pass them through sort -u (plain uniq only removes adjacent duplicates), you will have a list of all paths you can filter-branch to remove, or you can do something more complicated.
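For that last step, a small sketch (hypothetical function name) that extracts the tab-indented path lines and de-duplicates them, assuming one path per indented line as in the sample output:

```shell
# unique_blob_paths <file>...: distinct paths from the tab-indented output lines.
unique_blob_paths() {
    # Keep lines starting with a tab, drop the tab, then sort and de-duplicate.
    awk 'substr($0, 1, 1) == "\t" { print substr($0, 2) }' "$@" | sort -u
}
```

For example, `unique_blob_paths large-file-paths.log` prints each path once.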

Let me reiterate: this process ran successfully on a 10GB repo with 108,000 commits. However, it took much longer than I predicted when running on a large number of blobs (over 10 hours); I will have to see if the memoize bit is working...

RolandoMySQLDBA
cmyers
  • Like Aristotle's answer above, this only finds commits _on the current branch_ unless you pass additional arguments: `-- --all`. (Finding all commits repo-wide is important in cases like [thoroughly deleting a large file from the repo history](http://git-scm.com/book/en/Git-Internals-Maintenance-and-Data-Recovery#Removing-Objects)). – peterflynn Jul 11 '13 at 05:26