I'm starting a project using git where I'll be committing very large files, but only a few times a week. I've tried using git as-is, and it seems to store the entire file in each commit where it changes. That will not work for this project; the repository would grow out of control. So I want to reduce the size of the repository.

My first thought was to "simply" remove all commits older than, say, two weeks, or to keep only e.g. five commits in the history (this is probably better :)). I've googled and read a lot of The Git Community Book, and I guess I'm going to need to work with git-rebase or git-filter-branch. The thing is, I just can't seem to get it to work.

Just to illustrate: I have a history H with only one branch (the master branch)

A --> B --> C --> D --> E

I want to remove some previous commits to make my history look like

C --> D --> E

Commits A and B should be completely purged. I've tried git-rebase, but it seems to merge commits together rather than actually removing old ones; maybe I don't fully understand how rebase works. Another thought I had was to remove everything from .git/objects and then build a new commit using git-hash-object -w, git-mktree and git-commit-tree. I have not yet managed to push this "artificial" tree to the server, though.
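For illustration, here is a minimal sketch of the plumbing idea that leaves .git/objects alone: create a parentless commit that reuses C's tree with git commit-tree, then replay D and E onto it with git rebase --onto. The throwaway repository, the demo identity, the file name big.bin and the variable names are all assumptions made for the example, not part of the original setup.

```shell
#!/bin/sh
# Sketch: turn A --> B --> C --> D --> E into C --> D --> E.
set -e

# Throwaway identity and repository (hypothetical, for the demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

# Build the five-commit history from the question.
for name in A B C D E; do
    echo "$name" > big.bin
    git add big.bin
    git commit -q -m "$name"
done

branch=$(git symbolic-ref --short HEAD)   # master or main, depending on config
old_c=$(git rev-parse HEAD~2)
tree_c=$(git rev-parse "$old_c^{tree}")

# A new commit with no parent that carries exactly C's tree.
new_root=$(git commit-tree -m "C" "$tree_c")

# Replay everything after the old C (that is, D and E) onto the new root.
git rebase -q --onto "$new_root" "$old_c" "$branch"

# List the commit subjects, oldest first.
git log --reverse --format=%s
```

The rewritten branch no longer shares history with what the server has, so publishing it would presumably need a forced push (git push --force), and the old objects stay in the local object database until they are pruned.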

I won't be working with any branches, so there's no need to take these into account.

What I'm wondering is whether anyone can give me concrete usages of git-rebase, if that's what I'm supposed to use. Or some other tips or examples of what I can do.

Cheers!


Edit:

The large files will not be the same large files all the time, and some files will be replaced by new files. I want these replaced files to be completely purged from the history.

Thinner
  • Do the later commits (C onwards) still contain the big files, or were they already removed (`git rm`) before that? – Paŭlo Ebermann Feb 13 '11 at 23:38
  • Your use of the word `merge` to describe `git-rebase` makes me think you don't really understand `git-rebase`. `git-rebase` will move commits and apply them on top of a different head or, if you pass `-i`, let you rewrite and "squash" commits, change order, execute commands, edit commit messages, etc. No merging is involved. – alternative Feb 14 '11 at 00:26
  • Is git the right tool for this job? If you only want to keep the last few snapshots, then doesn't that defeat the point? Why not just use a flat file store? – Robie Basak Feb 14 '11 at 00:29

1 Answer

This should be a simple git rebase -i --root (the --root option is what makes the first commit A appear in the list) where you have

p A
s B
s C
p D
p E

and then edit the commit message for A-C to be just C's commit message.

git-rebase will "squash" commits A, B and C into a single commit whose objects are the same as commit C's objects.
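A minimal sketch of that squash, run non-interactively in a throwaway repository so the result can be inspected. Several things here are assumptions for the demo rather than part of the answer: the --root option (so that A shows up in the todo list), scripting the todo list through GIT_SEQUENCE_EDITOR with sed, the demo identity, and the file name file.txt. GIT_EDITOR=true simply accepts the combined commit message; in a real session you would trim it to C's message in the editor.

```shell
#!/bin/sh
# Sketch: squash A, B and C into one root commit, leaving D and E on top.
set -e

# Throwaway identity and repository (hypothetical, for the demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

# Build the history A --> B --> C --> D --> E.
for name in A B C D E; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
tree_c_before=$(git rev-parse 'HEAD~2^{tree}')

# Todo lines 2 and 3 are B and C: change "pick" to "squash" for both.
# GIT_EDITOR=true accepts the combined message without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,3s/^pick/squash/'" \
GIT_EDITOR=true \
    git rebase -q -i --root

tree_c_after=$(git rev-parse 'HEAD~2^{tree}')
echo "commits on branch: $(git rev-list --count HEAD)"
if [ "$tree_c_before" = "$tree_c_after" ]; then
    echo "the new root commit has the same tree as the old C"
fi
```

The tree comparison at the end is the point of the exercise: the squashed commit contains exactly the files of C, so anything that existed only in A or B is no longer part of the branch's history.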

Note: It may be possible to use git filter-branch to change the big files in the previous commits to actually match the new ones, if you'd rather do that. But it's a dangerous operation, and I don't want to give you a bad command by accident.

alternative
  • Wouldn't that make the history look like `A -- C' -- D -- E` where C' still contains the files? – Thinner Feb 13 '11 at 23:22
  • No, where do you get the idea of that? Basically, commit A _becomes_ commit C once you squash the other two on top of that. You might have to do a `git gc` to clear out the objects though. – alternative Feb 14 '11 at 00:19
  • Yeah, but if A and B become C and A contains a large file that I want to delete - won't C contain A's file? Say I add a large file in commit C and remove it in commit D, will squashing C into B remove the large file from the history? I'm sorry if I'm a bit confused, rebasing messes with my brain :) Thanks a lot for taking the time though. – Thinner Feb 14 '11 at 00:36
  • Commits A and B disappear. That is all. Commit C does not change. – alternative Feb 14 '11 at 23:41
  • Looking back on this after receiving a downvote - you might need to prune the database for your own local copy to get rid of the large blob, as well as the remote copy, but pushing will not push the large file. – alternative Apr 06 '13 at 12:58
  • Commit A certainly does not disappear. Squashing doesn't remove anything from the history other than the related commit messages, not the content of the commits. – Chris Rasys Jul 17 '13 at 13:30
  • @Chris The old commit object and blobs are no longer necessary and should be removed at the next prune. The commit message is part of the commit, and the old tree is no longer linked to - it is stray. – alternative Jul 17 '13 at 14:47
  • Yes, I know all that, I meant that the original content of commit A will still persist, as opposed to being deleted/destroyed – Chris Rasys Jul 17 '13 at 17:23
  • Typing 'git rebase -i' just brings up a text editor showing "noop". I'd really like to clean up my commit history before pushing a repo to a public remote, but I can't find instructions I understand anywhere. – Scott Aug 13 '13 at 19:39
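
The pruning that several comments mention can be sketched as follows. After the history is rewritten, the old commit A is still in the local object database (the reflog keeps it reachable); expiring the reflog and running git gc --prune=now is one way to actually drop it. The throwaway repository, the demo identity, the file name big.bin and the variable names are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: show that rewritten-away commits linger until reflog expiry + gc.
set -e

# Throwaway identity and repository (hypothetical, for the demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

for name in A B C D E; do
    echo "$name" > big.bin
    git add big.bin
    git commit -q -m "$name"
done
branch=$(git symbolic-ref --short HEAD)
old_a=$(git rev-parse HEAD~4)
old_c=$(git rev-parse HEAD~2)

# Rewrite the branch so that it starts at a parentless copy of C.
new_root=$(git commit-tree -m "C" "$(git rev-parse "$old_c^{tree}")")
git rebase -q --onto "$new_root" "$old_c" "$branch"

# The old commit A is no longer on the branch but still exists on disk.
if git cat-file -e "$old_a" 2>/dev/null; then
    before=present
else
    before=missing
fi
echo "old A before pruning: $before"

# Drop every reflog entry, then delete unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$old_a" 2>/dev/null; then
    after=present
else
    after=missing
fi
echo "old A after pruning: $after"
```

This only cleans the local copy. A remote that already received the old commits keeps its own objects until it is pruned on that side as well.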