As an aside, removing the file outright is the simplest option (though as you see, it's not entirely simple), provided you don't need the file in your repo. Another option is to use a tool like git lfs to allow your repo to refer to the file without putting the file directly in your repo. This solves many problems associated with large files in git and should be considered if you really do need the file; but rewriting a repo to use lfs for a file that's already been committed is a whole other topic...
So, back to the issue of removal. To provide a little more context:
In git there are three places a file might be found.
1) Work trees - just the plain files you work on. git makes no special effort to preserve the data here, and it only exists locally. You can remove files from here by means outside git, or by using git rm (especially if you also need to remove them from the index).
2) The index - This is where files are "staged" to make new commits. When you say git add you update the index. git will hang onto data here independent of working copies, but still it's only local and no special effort is made to preserve history. git rm will take a file out of the index.
3) The database - This is where your project history exists. When you say git commit you add "objects" that represent your project to the database. The database is where git preserves history, and you have to go out of your way to make git lose any data from here. The database is basically what's shared between repos during push and fetch operations. git rm has no effect on the database.
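To make the three places concrete, here is a minimal sketch in a throwaway repo. The file name big.bin and the demo identity are placeholders invented for illustration, not anything from the question.

```shell
#!/bin/sh
# Sketch only: a scratch repo in a temp dir; big.bin is a hypothetical stand-in for the large file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "pretend this is huge" > big.bin      # 1) exists only in the work tree
git add big.bin                             # 2) now also in the index
in_index_before=$(git ls-files -- big.bin)
git commit -q -m "add big.bin"              # 3) now also in the database
blob=$(git rev-parse HEAD:big.bin)          # ID of the blob object that holds the file content

git rm -q big.bin                           # takes it out of the index and the work tree
git commit -q -m "remove big.bin"
in_index_after=$(git ls-files -- big.bin)

# The blob is still stored and still reachable through the first commit.
if git cat-file -e "$blob"; then still_in_db=yes; else still_in_db=no; fi
echo "index before: '$in_index_before', index after: '$in_index_after', in database: $still_in_db"
```

The last line should show the file gone from the index while the blob is still in the database, which is why a later "remove" commit does not shrink what gets pushed.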
Now as others have noted, because you have created a commit that includes the file, you need to do more than git rm. The first step is to rewrite the history of every ref that contains commits that include the file.
Someone said you need to address the commit that "introduced" the file; that's misleading. You need to dispose of all references to the file (or, technically, to the BLOB object that represents the file).
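Before rewriting anything, it can help to see which commits touch the file and which branches can reach them. The following is a rough sketch in a scratch repo; big.bin, feature, and the commit messages are made-up names.

```shell
#!/bin/sh
# Sketch only: scratch repo with hypothetical names (big.bin, feature).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # force the branch name used in this answer
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > README;    git add README;    git commit -q -m "base"
echo huge > big.bin;   git add big.bin;   git commit -q -m "A: add big.bin"
A=$(git rev-parse HEAD)
git branch feature                          # a second ref that can also reach A
echo more > notes.txt; git add notes.txt; git commit -q -m "later work"

# Commits on any ref that added, changed, or deleted the path.
touching=$(git log --all --format=%H -- big.bin)

# Branches whose history contains A; each of these would need rewriting.
branches=$(git for-each-ref --contains "$A" --format='%(refname:short)' refs/heads)

echo "commits touching big.bin: $touching"
echo "branches containing A:"
echo "$branches"
```

Note that every commit from A onward still includes the file in its tree even though only A shows up in the log for that path; that is why the refs that contain A matter, not just A itself.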
Because rebase interprets commits in terms of their change relative to their parent, it can handle this in a relatively convenient way, if there's not a lot of branching and merging going on after the file was added. If, for example, the file was created in commit A, and the only ref from which A is reachable is master, and there are no merge commits that are newer than A in master, then rebase is the simplest solution. Assuming A is not the root commit,
git rebase -i A^ master
(where A is the SHA of the commit that introduced the file, so A^ names its parent); but if A is the root commit, meaning A^ isn't valid, then
git rebase -i --root master
In the TODO list that pops up, you change the command for A from pick to edit, and when rebase stops to let you edit the commit, you remove the file (for example with git rm), amend the commit, and then tell rebase to continue.
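The rebase steps can be sketched end to end in a scratch repo. This is only an illustration under simple assumptions (linear history, A is not the root commit, no later commit touches the file). The names big.bin, app.txt, and notes.txt are invented, and the sed command merely stands in for editing the TODO list by hand.

```shell
#!/bin/sh
# Sketch only: scratch repo, hypothetical file names, linear history on master.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > README;    git add README;    git commit -q -m "base"
echo huge > big.bin;   echo code > app.txt
git add big.bin app.txt;                  git commit -q -m "A: add app.txt and big.bin"
A=$(git rev-parse HEAD)
echo more > notes.txt; git add notes.txt; git commit -q -m "later work"

# Stand-in for hand-editing the TODO list: A is its first line, so switch that line to "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i "$A^" master

# rebase has stopped right after applying A: drop the file and amend the commit.
git rm -q big.bin
git commit -q --amend --no-edit

# Replay the remaining commits on top of the amended A.
GIT_EDITOR=true git rebase --continue

# Nothing in master's history should mention the file any more.
git log master --oneline -- big.bin
```

When you do this by hand, you would simply run the rebase command, change the line for A in your editor, and then run the last three git commands yourself.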
At this point it's possible that git push will work, because git doesn't have to send the whole database; it just sends the history of the ref you tell it to push. But don't be confused: you still haven't removed the file from your database locally. To do that, you have to ensure nothing (even the reflog) can reach the file and then use git gc. If you've successfully removed the file from all refs' histories, this will happen eventually (reflog entries expire and git runs gc automatically from time to time), which is probably fine unless you're limited on local storage.
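If you would rather not wait, the cleanup can be forced. The sketch below uses a scratch repo in which the file was only in the tip commit, so a plain amend is enough to rewrite it; big.bin and app.txt are placeholder names. Be aware that expiring every reflog entry also throws away your safety net for undoing recent mistakes.

```shell
#!/bin/sh
# Sketch only: scratch repo; the file is in the tip commit, so an amend rewrites it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo huge > big.bin; echo code > app.txt
git add big.bin app.txt
git commit -q -m "add app.txt and big.bin"
blob=$(git rev-parse HEAD:big.bin)          # blob object holding the file content

git rm -q big.bin
git commit -q --amend --no-edit             # no ref reaches the blob after this

# The reflog still points at the old commit, so the blob is still stored.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all        # drop every reflog entry
git gc -q --prune=now                       # delete unreachable objects immediately

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```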
There are several important assumptions in the above procedure, and if you just recently committed the file those assumptions may hold. But if multiple branches exist that can reach commit A, and/or if there are merge commits from which you can reach A, then doing a rebase can become much harder. That's when you'd look at git filter-branch or the BFG Repo Cleaner as solutions. Of the two, BFG is far simpler and faster for this task; if you search for it, you can find many sources (including some SO entries) that outline its usage. Because filter-branch is more general-purpose it's harder to use correctly, but then again it's "built in" - no need to download additional software.
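For the branch-and-merge case, a filter-branch run might look roughly like the sketch below. It is an illustration in a scratch repo with invented names (big.bin, feature), not a recipe tuned to your repo; try it on a clone first.

```shell
#!/bin/sh
# Sketch only: scratch repo where A is reachable from two branches and a merge commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > README;    git add README;    git commit -q -m "base"
git checkout -q -b feature
echo huge > big.bin;   git add big.bin;   git commit -q -m "A: add big.bin"
git checkout -q master
echo more > notes.txt; git add notes.txt; git commit -q -m "work on master"
git merge -q --no-ff -m "merge feature" feature

# Rewrite every ref, removing big.bin from the index of each commit that has it.
# --ignore-unmatch keeps git rm from failing on commits that lack the file.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null

# filter-branch keeps backups under refs/original/ that still reach the file,
# so check the branches rather than --all.
left=$(git log --branches --format=%H -- big.bin)
merges=$(git rev-list --merges --count master)
echo "commits still touching big.bin: '$left', merge commits kept: $merges"
```

The backup refs under refs/original/ have to be deleted as well before git gc can actually drop the blob locally.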
All of these techniques rewrite history. Since you can't push your existing history, that's probably not a big deal (assuming you don't have a second remote to which you've already pushed the changes).