
I added a file larger than 100 MB to my git repository, and it worked in my local git. Unfortunately, GitHub has a 100 MB file size limit.

So I rewrote the code so that it no longer needs this big file, then removed the file and committed.

Unfortunately, I still can't push to GitHub, because the file is still in the repository.

I tried to run

git rm --cached my_file.dat

git rm --cached -r my_file.dat

git rm --cached mypath/my_file.dat

and all commands failed with

fatal: pathspec ... did not match any files

How can I remove the file without specifying the exact path to it?

UPDATE

I tried to run

java -jar bfg.jar --strip-blobs-bigger-than 100M

and it failed with the message

Scanning packfile for large blobs completed in 2 ms.
Warning : no large blobs matching criteria found in packfiles - does the repo need to be packed?

but I am still unable to

git push origin master

with

File my_path/my_file.dat is 257.62 MB; this exceeds GitHub's file size limit of 100.00 MB
Dims

3 Answers


You need to somehow remove this file from all commits.

Several ways to do this are :

  • if you have a reasonably low number of commits to edit :
    use git rebase -i to manually edit commits
  • if you have to do it on a large scale (many commits, several branches) : use git filter-branch --index-filter
    or the bfg-repo-cleaner suggested by @Sirko

How to use git rebase -i :

if your history looks like this :

      big file added here
        v
--*--A--B--C--D--E--F <- master

to rework the content of B, you will need to rebase from its parent :

git rebase -i A

This will open a text editor, which will ask what action you want to take on each single commit from B to F

It will start with :

 pick  B   message
 pick  C   message
 pick  D   message
 ...

You want to change B, to remove the big file from this commit

# set the action on b to 'edit' (or e) :
e B  message
pick  C   message
pick  D   message
...

save and close.

Now git will apply the actions you specified:

  • it will rewind your repo to A
  • you told git to edit B: it will apply B, and then stop so that you can do whatever you want
  • to remove the big file from this commit :

    git rm --cached big/file
    git commit --amend
    
  • now you want to tell git to resume with the rebasing :

    git rebase --continue
    
  • you should see messages indicating that git is replaying C then D .. up to F
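The walkthrough above can be tried end to end in a throwaway repository. The following is only a sketch: the file names (`a.txt`, `b.txt`, `c.txt`, `big/file`), the commit messages, and the use of `GIT_SEQUENCE_EDITOR` with `sed` to flip the first todo line to `edit` (instead of doing it by hand in the editor) are assumptions made for the demo.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
export GIT_EDITOR=true                 # never open an interactive editor

echo a > a.txt; git add a.txt; git commit -q -m "A"
mkdir big
head -c 1024 /dev/zero > big/file      # stand-in for the oversized file
echo b > b.txt; git add big/file b.txt; git commit -q -m "B: adds big/file"
echo c > c.txt; git add c.txt; git commit -q -m "C"

# Commit that added the file; the rebase starts from its parent.
bad=$(git log --diff-filter=A --format=%H -- big/file)

# Change the first todo line (commit B) from "pick" to "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i "$bad^"

# Git has stopped after applying B: drop the file from that commit.
git rm -q --cached big/file
git commit -q --amend --no-edit
git rebase --continue

# Count the commits reachable from HEAD that still touch big/file.
remaining=$(git rev-list --count HEAD -- big/file)
total=$(git rev-list --count HEAD)
echo "commits touching big/file: $remaining of $total"
```

Because `git rm --cached` only touches the index, `big/file` is left behind as an untracked file in the working tree; delete it separately if it is no longer needed.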

LeGEC
  • `git rebase -i` opens text editor with `.git/rebase-merge/git-rebase-todo` file opened – Dims Oct 17 '17 at 12:16
  • It shows 2 commits (why 2? I have dozens) with the word `pick` at the beginning of each line; if I replace one `pick` with `drop` and save the file, it shows `could not apply ..., when you have resolved this problem` – Dims Oct 17 '17 at 12:22
  • Did you add the big file on one specific commit ? – LeGEC Oct 17 '17 at 13:44
  • I have few commits but filter-branch is still a straightforward solution: git filter-branch --tree-filter 'rm -f ' HEAD as recommended here https://buildvirtual.net/how-to-remove-or-delete-a-file-from-git/ – Rony Armon Jul 27 '23 at 08:06

As an aside, removing the file outright is the simplest option (though as you see, it's not entirely simple), provided you don't need the file in your repo. Another option is to use a tool like git lfs to allow your repo to refer to the file without putting the file directly in your repo. This solves many problems associated with large files in git and should be considered if you really do need the file; but rewriting a repo to use lfs for a file that's already been committed is another whole topic altogether...

So, back to the issue of removal. To provide a little more context:

In git there are three places a file might be found.

1) Work trees - just the plain files you work on. git makes no special effort to preserve the data here, and it only exists locally. You can remove files from here by means outside git, or by using git rm (especially if you also need to remove them from the index).

2) The index - This is where files are "staged" to make new commits. When you say git add you update the index. git will hang onto data here independent of working copies, but still it's only local and no special effort is made to preserve history. git rm will take a file out of the index.

3) The database - This is where your project history exists. When you say git commit you add "objects" that represent your project to the database. The database is where git preserves history and you have to go out of your way to make git lose any data from here. The database is basically what's shared between repos during push and fetch operations. git rm has no effect on the database.
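To make the three places concrete, here is a small sketch run in a throwaway repository (the file name `my_file.dat`, its contents, and the commit messages are placeholders). It commits a file, untracks it with git rm --cached, and then checks each location in turn.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo data > my_file.dat
git add my_file.dat
git commit -q -m "add my_file.dat"
blob=$(git rev-parse HEAD:my_file.dat)   # ID of the blob stored in the database

git rm -q --cached my_file.dat           # removes the index entry only
git commit -q -m "stop tracking my_file.dat"

in_worktree=no; [ -f my_file.dat ] && in_worktree=yes
in_index=no;    [ -n "$(git ls-files my_file.dat)" ] && in_index=yes
in_database=no; git cat-file -e "$blob" && in_database=yes
# Is the blob still reachable from some commit in any ref's history?
in_history=no;  git rev-list --objects --all | grep -q "^$blob" && in_history=yes

echo "work tree: $in_worktree, index: $in_index, database: $in_database, history: $in_history"
# prints: work tree: yes, index: no, database: yes, history: yes
```

The last two checks are the ones that matter for pushing: the earlier commit still references the blob, so a push has to send it.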

Now as others have noted, because you have created a commit that includes the file, you need to do more than git rm. The first step is to rewrite the history(ies) of any ref(s) that contain commits that include the file.

Someone said you need to address the commit that "introduced" the file; that's misleading. You need to dispose of all references to the file (or, technically, to the BLOB object that represents the file).

Because rebase interprets commits in terms of their change relative to their parent, it can handle this in a relatively convenient way, if there's not a lot of branching and merging going on after the file was added. If, for example, the file was created in commit A, and the only ref from which A is reachable is master, and there are no merge commits that are newer than A in master, then rebase is the simplest solution. Assuming A is not the root commit,

git rebase -i A^ master

(where A is the SHA of the commit that introduced the file, so A^ is its parent); but if A is the root commit, meaning A^ isn't valid, then

git rebase -i --root master

In the TODO list that pops up, you change the command for A to edit, and when given the prompt to edit the commit you remove the file and then tell rebase to continue.

At this point it's possible that git push will work, because git doesn't have to send the whole database; it just sends the history of the ref you tell it to push. But don't be confused: you still haven't removed the file from your database locally. To do that, you have to ensure nothing (even the reflog) can reach the file and then use git gc. If you've successfully removed the file from all refs' histories, this will happen eventually, which is probably fine unless you're limited on local storage.
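If local disk space does matter, the cleanup can be forced rather than waited for. The sketch below uses a throwaway repository with made-up file names; it amends the file out of the only commit that contained it, then expires the reflog and runs git gc with immediate pruning. Note that this discards the reflog safety net for the whole repository, so it is only reasonable once you are sure the rewritten history is what you want.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo a > a.txt; git add a.txt; git commit -q -m "A"
head -c 2048 /dev/zero > big.dat         # stand-in for the large file
echo b > b.txt
git add big.dat b.txt; git commit -q -m "B: adds big.dat"
blob=$(git rev-parse HEAD:big.dat)

# Rewrite B without the file; the old B is now reachable only via the reflog.
git rm -q big.dat
git commit -q --amend --no-edit

before_gc=no; git cat-file -e "$blob" 2>/dev/null && before_gc=yes

# Drop every reflog entry, then prune unreachable objects right away.
git reflog expire --expire=now --all
git gc -q --prune=now

after_gc=yes; git cat-file -e "$blob" 2>/dev/null || after_gc=no
echo "blob present before gc: $before_gc, after gc: $after_gc"
```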

There are several important assumptions in the above procedure, and if you just recently committed the file those assumptions may hold. But if multiple branches exist that can reach commit A, and/or if there are merge commits from which you can reach A, then doing a rebase can become much harder. That's when you'd look at git filter-branch or the BFG Repo Cleaner as solutions. Of the two, BFG is far simpler and faster for this task; if you search for it, you can find many sources (including some SO entries) that outline its usage. Because filter-branch is more general-purpose it's harder to use correctly, but then again it's "built in" - no need to download additional software.
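For the multi-branch case, a filter-branch invocation could look like the sketch below. The repository layout, the branch name `feature`, and the commit messages are invented for the demo; the path `my_path/my_file.dat` is taken from the error message in the question. `FILTER_BRANCH_SQUELCH_WARNING` only silences the deprecation notice that newer git versions print.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo a > a.txt; git add a.txt; git commit -q -m "A"
mkdir my_path
head -c 2048 /dev/zero > my_path/my_file.dat
git add my_path/my_file.dat; git commit -q -m "add big file"
git branch feature                       # a second ref that can reach the file
echo b > b.txt; git add b.txt; git commit -q -m "B"

export FILTER_BRANCH_SQUELCH_WARNING=1
# Rewrite every ref, removing the path from the index of each commit.
# --ignore-unmatch keeps git rm from failing on commits without the file.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch my_path/my_file.dat' \
  -- --all >/dev/null

# Commits on any branch that still touch the path.
left=$(git rev-list --count --branches -- my_path/my_file.dat)
echo "branch commits still touching the file: $left"
```

filter-branch keeps backups of the old refs under refs/original/, which is why the check above looks at `--branches` rather than `--all`. Those backups, like the reflog, keep the blob in the local database until they are removed.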

All of these techniques rewrite history. Since you can't push your existing history, that's probably not a big deal (assuming you don't have a second remote to which you've already pushed the changes).

Mark Adelsberger

The file is still in the repository's history... You need to remove the commit(s) that introduced it...

If you can cleanly identify the commit that introduced it, then try the following:

git rebase -i ${COMMIT_ID}^

This will present you with a list of commits, where you can choose to edit or drop certain items. Either mark the offending commit by replacing the default pick with drop to simply drop it (and all other changes that the commit makes!), or mark the offending commit with edit, remove the file, re-commit and continue.

Once you have done this, try pushing again.
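If the commit and the exact path are not known up front, git log can find both from the file name alone. The sketch below recreates the situation from the question in a throwaway repository (the directory `my_path` and the commit messages are placeholders): the file was added and later deleted, so git rm no longer matches it, but the history still records where it came from.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo a > a.txt; git add a.txt; git commit -q -m "A"
mkdir my_path
head -c 2048 /dev/zero > my_path/my_file.dat
git add my_path/my_file.dat; git commit -q -m "add data file"
git rm -q my_path/my_file.dat; git commit -q -m "remove data file"

# The file is gone from the index, so "git rm --cached" has nothing to match.
# Ask the history instead: which commit added a path ending in my_file.dat?
COMMIT_ID=$(git log --all --diff-filter=A --format=%H -- '*my_file.dat')

# Full path(s) changed by that commit (it is not a root commit here).
full_path=$(git diff-tree --no-commit-id --name-only -r "$COMMIT_ID")

echo "$COMMIT_ID added $full_path"
```

The resulting `COMMIT_ID` is what goes into git rebase -i ${COMMIT_ID}^.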


Git isn't really designed for large binary files, so avoid checking them in. If you "need" to, then it might be worth checking out the Git Large File Storage project.

Attie
  • What is `${COMMIT_ID}^`? – Dims Oct 17 '17 at 12:14
  • When you do `git log`, you'll see the commit IDs listed along with the commit message and other information. You should replace `${COMMIT_ID}` with this identifier, and then follow that with a hat (`^`), which indicates "_the parent of_". For example `git rebase -i f928b95^`. – Attie Oct 17 '17 at 13:45