TL;DR: do an interactive rebase and replace your bad commits with better ones, or use the BFG (see How to remove/delete a large file from commit history in Git repository?).
Git pushes commits, not files
In Git, every commit is permanent and unchangeable. Moreover, the commits are the history: your latest commit points back to your second-latest commit, which points back to your third-latest, and so on, all the way back to the very first commit.
Now suppose you have committed a large file (such as a DVD image, 4.7 GB or so). Later, you delete the file and commit again.
When you go to git push the resulting commit, Git will (in fact, must) push not only the new commit that deletes the file, but also the older commit that creates the file.
If Git failed to do this, you would not be able to recall the commit that contains the big file. The whole point of Git is to be able to recall every commit ever, so this would be the opposite of version control. If Git sent only your latest commit, that would be uncontrolled unversion.
The files are a side effect of the commits. Git is all about commits. Files are just sort of an accidental bonus. Of course, the files are the purpose of the commits in the first place, but Git is still about commits.
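To see this concretely, here is a small sketch using a throwaway repository. The directory, the file name big.iso, and the 1 MiB stand-in for the DVD image are all made up for illustration. It commits a large file, deletes it in a second commit, and then asks Git which objects the history still contains:

```shell
#!/bin/sh
# Sketch: a file deleted in a later commit is still part of the history.
set -eu

repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: add a "big" file (1 MiB here, standing in for a 4.7 GB image).
dd if=/dev/zero of=big.iso bs=1024 count=1024 2>/dev/null
git add big.iso
git commit -q -m "add big.iso"

# Commit 2: delete it again.
git rm -q big.iso
git commit -q -m "delete big.iso"

# The file is gone from the work-tree ...
test ! -e big.iso && echo "big.iso is not in the work-tree"

# ... but the first commit still refers to it, so pushing HEAD must send it.
git rev-list --objects HEAD | grep big.iso
```

The last command lists every object reachable from the latest commit, and big.iso is still among them because the first commit is part of the second commit's history.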
What this means to you
Your big files are somewhere in the commits that you have, that they don't:
localhost:myproject nataliab$ git status
On branch master
Your branch is ahead of 'origin/master' by 62 commits.
(use "git push" to publish your local commits)
nothing to commit, working directory clean
Somewhere in these 62 (probably1) commits, you added some big files. Somewhere later, you presumably deleted them—but Git has to push all the commits.
Moreover, commits are permanent and unchangeable. You cannot change the older commits that add the file. This leaves only one possible solution: don't push these commits at all.
You might—and should, really—object. Presumably you do want to push (at least some of) these commits. But what I am telling you is that you don't want to push these commits. You want to push, instead, some slightly altered, better commits.
1 "Probably", because origin/master is your Git's memory of what is under the name master on the other Git repository over at origin. This memory is not always up to date. You can run git fetch origin to pick up the latest commits from them, and thus have your Git update its memory. But if you are the only one using the other repository, your Git's memory will be accurate enough.
Copy the "bad" commits to new, different, "better" commits
Use git log to view the commits you are currently pushing:
$ git log --name-status origin/master..master
The --name-status argument tells Git to compare each commit to the previous commit (as usual), but then instead of showing a full git diff, just show which files were added, modified, and deleted.
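If the list is long, it can help to narrow it to the commits that add files and to see how large each added file is. The following is one possible sketch, not the only way to do it. It builds a throwaway "origin" and clone so that it can run on its own; the paths and file names are invented for the demonstration. In your own repository only the last three commands matter.

```shell
#!/bin/sh
# Sketch: find which not-yet-pushed commits add files, and how big they are.
set -eu

work=$(mktemp -d)                         # hypothetical scratch area
git init -q --bare "$work/origin.git"     # stands in for the other Git
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo hello > README
git add README
git commit -q -m "initial commit"
git push -q origin master                 # origin/master now points here

dd if=/dev/zero of=big.iso bs=1024 count=1024 2>/dev/null
git add big.iso
git commit -q -m "add big.iso"
git rm -q big.iso
git commit -q -m "remove big.iso"

# How many commits would be pushed?
git rev-list --count origin/master..master

# Only the commits in that range that add files (status "A").
git log --diff-filter=A --name-status origin/master..master

# Type, size in bytes, and path of every blob those commits introduce.
git rev-list --objects origin/master..master |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  grep '^blob'
```

The size listing makes the culprit easy to spot even when a commit adds many files at once.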
You will have one commit that deletes some big file(s), and then an earlier commit that adds those same big file(s). Your job now is to correct the earlier commit, so that it does not add those files at all.
You can't actually change that earlier commit! But you can copy it, to a commit that is very similar: make a commit that is almost exactly the same, except that it doesn't add the big file(s). The new commit you make will have the same parent ID—this is how Git keeps track of which commits go before which other commits. It will have the same author (you), the same committer (you), the same log message, perhaps even the same date ... but it won't have the big file(s).
As a side effect of copying this particular bad commit to a new, better commit, you will be forced to copy every subsequent commit. The reason is that every commit records its previous (parent) commit ID, and your new-and-improved copy has a different ID than the original. So now you need to copy its child. The new "child copy" is the same as the previous child, except for two things: the parent ID, and the fact that the big file is gone.
This repeats for every commit up to the one that deletes the big file(s). Now, if that particular commit just deletes the big files, you can just discard that commit at this point: every copy you've been making so far lacks those files anyway, so there will be nothing to do. If that commit does something besides just deleting the big files, though, you will presumably want to keep the other parts of it.
After that point, you probably just want to copy each remaining commit, changing only its parent ID.
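The "same parent, new ID" point can be checked directly. This is a minimal sketch in a throwaway repository with invented file names. It uses git commit --amend only because that is the simplest command that copies a single commit; the variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: a copied commit keeps its parent but gets a new ID.
set -eu

repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > README
git add README
git commit -q -m "initial commit"

# The "bad" commit: useful work plus a file that should not be there.
echo data > big.iso
echo code > main.c
git add big.iso main.c
git commit -q -m "add main.c"

old_id=$(git rev-parse HEAD)
old_parent=$(git rev-parse HEAD^)

# Copy the commit, minus the big file. The original is not modified.
git rm -q big.iso
git commit -q --amend --no-edit

new_id=$(git rev-parse HEAD)
new_parent=$(git rev-parse HEAD^)

echo "original: $old_id (parent $old_parent)"
echo "copy:     $new_id (parent $new_parent)"

# The original still exists, and still contains big.iso.
git ls-tree -r --name-only "$old_id"
```

The two parent IDs match while the two commit IDs differ, which is why any child of the original would now have to be copied as well.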
There are two Git commands that do this kind of commit copying: git filter-branch and git rebase -i. The former is somewhat difficult to use, so if you are going to stick with things that come with Git, I generally recommend using rebase, unless you have merge commits in those commits you need to copy (any such merges will show up in the git log output).
The instructions for using both filter-branch and rebase -i are in Greg Bacon's linked answer to the above-linked question.
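For a rough idea of what the rebase route looks like, here is a sketch that runs unattended in a throwaway repository. Several things in it are assumptions made for the demonstration: the file names, the tag named base (which stands in for origin/master), and above all the use of GIT_SEQUENCE_EDITOR with sed to edit the todo list. Normally you would make those two changes by hand in your editor. It also assumes the default todo-list format, where each line starts with the word pick.

```shell
#!/bin/sh
# Sketch: use an interactive rebase to copy commits without the big file.
set -eu

repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > README
git add README
git commit -q -m "initial commit"
git tag base                      # stands in for origin/master

echo data > big.iso
echo 'int main(void) { return 0; }' > main.c
git add big.iso main.c
git commit -q -m "add main.c (and big.iso by mistake)"

git rm -q big.iso
git commit -q -m "delete big.iso"

echo '/* more work */' >> main.c
git commit -q -a -m "more work on main.c"

# The todo list is oldest-first: line 1 adds the file, line 2 only deletes it.
# Mark line 1 "edit" and line 2 "drop" (done by hand in a real session).
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/' -e '2s/^pick/drop/'" \
  git rebase -i base

# The rebase has stopped right after re-applying the first commit.
# Remove the big file and replace that commit with the corrected copy.
git rm -q big.iso
git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue

# The copied commits: none of them mentions big.iso any more.
git log --oneline --name-status base..master
```

Dropping the "delete big.iso" commit is safe here only because that commit does nothing else. If it also contained other changes, you would keep it and let the rebase copy it.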
Although I have never used BFG, its operation is reported to be much simpler. It does not do nearly as many things as filter-branch and interactive rebase, so it does not have such complicated controls. It still copies the commits, though.
Once the commits are all copied, you simply "forget" the bad ones
The way Git's branches work is that the name, master, simply points to the latest commit on branch master. Each commit points to its earlier counterpart. So once you have copied the "bad" commits to "better" ones, your master will point to the newest copied commit. That commit points back to its parent, and so on, for however many commits (maybe 61, now) it takes to get to where origin/master points.
The other Git repository, over on origin, already has that commit and every earlier commit. But now you can git push origin master, and your Git will call up their Git, find the commits to push, and start pushing. This time the ones to push will be the new, better copies, not the originals.
(What happens to the originals? Eventually, they age out and get expired and deleted. If you want them back, you have at least 30 days to get them back.)
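If you do need an original back, the reflog is the place to look. Below is a minimal sketch, again in a throwaway repository with invented names, with git commit --amend standing in for any command that copies commits. The branch name rescue is hypothetical.

```shell
#!/bin/sh
# Sketch: find a replaced ("forgotten") commit again through the reflog.
set -eu

repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > README
git add README
git commit -q -m "initial commit"

echo data > big.iso
echo code > main.c
git add big.iso main.c
git commit -q -m "add main.c"
original=$(git rev-parse HEAD)

# Replace the commit with a copy that lacks big.iso.
git rm -q big.iso
git commit -q --amend --no-edit

# No branch reaches the original now, but the reflog still remembers it.
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# Giving it a name keeps it from expiring.
git branch rescue "$recovered"
git ls-tree -r --name-only rescue
```

Here HEAD@{1} means "where HEAD pointed one step ago", which is the original commit. As far as I know, the 30 days corresponds to Git's default expiry for reflog entries that are no longer reachable from the branch tip.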