I'm a relative git newbie, as you're about to see, so please forgive my poor use of git terminology; I'm still learning.

Concise summary of problem: I want to put my local repo on GitHub, but I have some previously-tracked files that are too big.

Background: This morning I had a local repository where all sorts of files were being tracked: R scripts, .RData files, .csv's, etc. I decided I wanted to make my repository publicly available by pushing it to GitHub.

When I tried to push (using git remote add origin https://github.com/me/repo.git followed by git push -u origin master), I realized that some of my data files were too large for GitHub. I've decided that it would be OK if the .RData files didn't get pushed to GitHub and weren't tracked by git (although I don't want to delete the files locally). But I can't figure out how to make this happen.

Things I've tried thus far:

  1. First I added .RData files to the .gitignore file. I quickly realized that this does nothing for files that are already being tracked.
  2. I used git rm -r --cached . followed by git commit -am "Remove ignored files", thinking this would help git forget about all of those huge files I just ignored.
  3. Further following the git help page, I tried git commit --amend -CHEAD, but I still couldn't push.
  4. I attempted to use the BFG, but I didn't get very far with it because it apparently didn't find any files larger than 100M. Clearly I was doing something wrong, but I decided not to pursue it further.
  5. Following some tips I found HERE, I then tried git filter-branch --tree-filter 'git rm -r -f --ignore-unmatch *.RData' HEAD. This definitely did something, but I still couldn't push. However, instead of the huge list of too-big files, I am now down to 2 files that are too big (even though other .RData files in the same directory are no longer listed).

After my last git push -u origin master --force, this is the print out in terminal:

Counting objects: 1163, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1134/1134), done.
Writing objects: 100% (1163/1163), 473.07 MiB | 6.80 MiB/s, done.
Total 1163 (delta 522), reused 0 (delta 0)
remote: error: GH001: Large files detected.
remote: error: Trace: 4ce4aa642e458a7a715654ac91c56af4
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File Results/bigFile1.RData is 166.51 MB; this exceeds GitHub's file size limit of 100 MB
remote: error: File Results/bigFile2.RData is 166.32 MB; this exceeds GitHub's file size limit of 100 MB
To https://github.com/me/repo.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/me/repo.git'

If you haven't guessed, I don't really know what I'm doing ... I'm essentially trying any code snippet I can find, and seeing if it allows me to push. All of my data and files are backed up, so I'm experimenting rather brazenly.

Given that I'm willing to not track the huge .RData files, how do I get my local repo to the point where I can push it to GitHub?

Any help would be very greatly appreciated. Thanks!

rbatt

1 Answer

I am pretty sure you will need to remove them from your repo's history. It is not enough to remove them from the most recent commit; they have to be excised from every commit in which they ever existed, because a push sends the whole history.
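
One way to check what is still buried in history is to list the largest blobs reachable from any ref. This is only a sketch: the function name list_large_blobs and its byte-threshold argument are my own invention, not part of git, and it should be run from inside the repository.

```shell
# Sketch: print "size path" for every blob in history at or above a size.
# list_large_blobs is a hypothetical helper name; $1 is a threshold in bytes.
list_large_blobs() {
    min_bytes="$1"
    # rev-list --objects emits "<sha> <path>" for everything reachable from any ref;
    # cat-file --batch-check looks up each object's type and size,
    # and %(rest) carries the path through unchanged.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" '$1 == "blob" && $2 >= min { sub(/^blob /, ""); print }' |
        sort -rn
}
```

For example, list_large_blobs 100000000 should print any file versions of roughly 100 MB or more, including ones that were deleted or untracked in later commits.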

The technique is covered elsewhere; see this Stack Overflow post or the BFG tool.
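
For reference, here is a rough sketch of the filter-branch route. It assumes the large files all match a single glob such as *.RData, that they are already untracked in your latest commit, and that you have a backup (rewriting history is destructive). The function name purge_from_history is hypothetical; the git options themselves are standard.

```shell
# Sketch: rewrite every commit reachable from HEAD so that paths matching
# a glob are dropped from the index. purge_from_history is a made-up name.
purge_from_history() {
    pattern="$1"   # e.g. '*.RData' (quote it so your shell does not expand it)
    # -f overwrites the refs/original backup left by an earlier filter-branch run.
    # --index-filter edits only the index of each commit, so no checkout is needed.
    # --cached removes the path from the index; --ignore-unmatch keeps going
    # on commits where nothing matches.
    # FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning pause in newer git versions.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
        "git rm -r --cached --ignore-unmatch -- '$pattern'" HEAD
}
```

After something like purge_from_history '*.RData', the rewritten branch no longer matches anything already on the remote, so the push would need --force.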

lawinslow
    OK, so I was on the right track. To be clear, I had to use `git filter-branch -f --index-filter 'git rm -r -f --ignore-unmatch *.RData' HEAD`. So not a `--tree-filter` but an `--index-filter`. Also, because I'd already done one of these, I had to add that first `-f` to overwrite a "backup" of sorts. – rbatt Oct 07 '14 at 14:45