I'm trying to move some files to a separate repository while preserving the history of their changes. I also want to save disk space: the original repository is more than 5 GB, but the part relevant to the new one takes less than 50 MB.

So I moved all the files that need to go to the new repository into a separate branch and created a new repository. Using the following git commands I was able to preserve the history, but the new repository ended up taking the same disk space as the original one:

    git remote add originalreporemote **path**
    git fetch originalreporemote
    git merge originalreporemote/branchwithfilestomove --allow-unrelated-histories
    git remote rm originalreporemote

Looking at the new repository, I see that it is the same size as the original one. That is excessive, since I will never need to refer to the full history of the original repository from the new one.

The history of the files I moved to the new repository should take much less space.

UPD

I understand the issue might be hard to follow, so here are some steps that let you reproduce it easily:

  1. Create two repositories.
  2. Commit a text file to the master branch of repo1, so that the repository takes several KB.
  3. Create a new branch in repo1.
  4. Check out the master branch of repo1 again.
  5. Add several large files to repo1 and commit them to master. Now repo1 contains two branches: master, with the large files and the text file, and the branch from step #3, with the text file only.
  6. Try to push the branch from step #3 from the first repository to the second repository, preserving the history of its changes (the commands are above).
  7. I expect the second repository to end up about the same size (a few KB) as in step #3, but in fact it is the same size as the first repository.
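
The steps above can be sketched as a shell script. This is a minimal sketch under assumptions not stated in the thread: the names `repo_all`, `repo_one`, `textonly`, `notes.txt`, and `big.bin` are hypothetical, and a single 1 MiB random file stands in for the large files. It also compares two ways of fetching: the `git remote add` plus `git fetch` sequence from the question, which by default fetches every branch of the remote, and `git fetch <path> <branch>`, which requests only the named branch. Whether this accounts for the size seen in the real repository is not confirmed by the thread.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Steps 1-2: repo1 with a small text file on master.
git init -q repo1
git -C repo1 symbolic-ref HEAD refs/heads/master
echo "hello" > repo1/notes.txt
git -C repo1 add notes.txt
git -C repo1 commit -q -m "Add text file"

# Step 3: a branch that only ever contains the text file (hypothetical name).
git -C repo1 branch textonly

# Steps 4-5: still on master, commit a large, incompressible file.
head -c 1048576 /dev/urandom > repo1/big.bin
git -C repo1 add big.bin
git -C repo1 commit -q -m "Add large file"
big_blob=$(git -C repo1 rev-parse master:big.bin)

# Variant A: add a remote and fetch with the default refspec (all branches).
git init -q repo_all
git -C repo_all remote add originalreporemote "$work/repo1"
git -C repo_all fetch -q originalreporemote

# Variant B: fetch only the one branch by name.
git init -q repo_one
git -C repo_one fetch -q "$work/repo1" textonly

# Report whether the large blob was copied into a given repository.
has_blob() {
  git -C "$1" cat-file -e "$big_blob" 2>/dev/null && echo present || echo absent
}

du -sk repo_all/.git repo_one/.git
echo "repo_all: large blob $(has_blob repo_all)"
echo "repo_one: large blob $(has_blob repo_one)"
```

If the large blob shows up only in `repo_all`, the extra size comes from fetching branches that are never merged. Objects fetched this way stay in the object store after `git remote rm` until git garbage-collects them.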
  • Commits are the history of a repo and commits are also snapshots of all files in the repo. Is this what you want? – evolutionxbox Apr 14 '19 at 09:57
  • This sounds like an [XY problem](https://meta.stackexchange.com/q/66377/248627). Are you using a Git host that supports large file storage (LFS)? – ChrisGPT was on strike Apr 14 '19 at 12:27
  • If you want to shrink the space taken by the branch, you can duplicate and rewrite it with git filter-branch. The new branch touches only the files you need to move. – ElpieKay Apr 14 '19 at 14:41
  • @Chris No, just typical git repository that contains a lot of resources. I've added an update that could help you to understand the matter better. – MaksimNikicin Apr 15 '19 at 08:53
  • Step 7 should be "push the branch created in step 4 (but not the master branch) to the new repo". – Raymond Chen Apr 15 '19 at 15:09
  • Note that your problem description doesn't match the example. Is the problem that the text files you are keeping have a long history? It's not clear what you want, since you simultaneously ask for a way to preserve history, yet also say that you don't want to copy the history (which is how you preserve it). Maybe you want to preserve the history in the original repo, and link the second repo to it? [git replace](https://www.git-scm.com/book/en/v2/Git-Tools-Replace) may help. – Raymond Chen Apr 15 '19 at 15:20
  • @RaymondChen I want to preserve the history only of the files that I'm moving to the new repository. If I move a branch that contains only 1 small text file, I expect the size of the second repository to be very similar to the size of the moved text file. In fact it seems to copy the whole history of the original repository, because after the commands described above the size of the 2nd repository is the same as the size of the original one. – MaksimNikicin Apr 15 '19 at 18:10
  • In your example, the files you want to preserve never existed in the same branch at the same time as the bulky files, so you can just push the non-bulky branch and the bulky files won't come along for the ride. But in your case, the bulky files are in the history (albeit deleted) so they will be included in the push. You can filter out the bulky files (filter-branch, as noted below) but if there is bulky history in the files you're keeping, that won't help. You can use the "replace" trick I linked to above to link the old history to the small (new) repo. – Raymond Chen Apr 15 '19 at 20:39
  • @RaymondChen the main point is that the large files have never been committed to the branch I'm pushing to another repository. Anyway, I'll try the trick you pointed out. – MaksimNikicin Apr 16 '19 at 06:39
  • If they've never existed in the branch you're pushing, then they shouldn't be included in the push. Perhaps they got included by mistake and immediately deleted? That still counts as having existed. (Or maybe the large content is due to something else. [This answer](https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history) may help you determine the source.) – Raymond Chen Apr 16 '19 at 12:21
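
To check the point made in the comments, that a file added and later deleted still lives in the branch's history, the objects reachable from a branch can be listed with their sizes. The following is a minimal sketch using a throwaway repository; the names `demo`, `notes.txt`, and `big.bin` are hypothetical, and the 512 KiB random file is only a stand-in for a bulky file.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Commit a small file and a large one, then delete the large one again.
git init -q demo
printf 'small\n' > demo/notes.txt
head -c 524288 /dev/urandom > demo/big.bin
git -C demo add .
git -C demo commit -q -m "Add files"
git -C demo rm -q big.bin
git -C demo commit -q -m "Delete large file"

# List every blob reachable from HEAD as "type size path", largest first.
# The deleted file is still listed because an earlier commit references it.
largest=$(git -C demo rev-list --objects HEAD |
  git -C demo cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k2,2nr |
  head -n 5)
printf '%s\n' "$largest"
```

Running the same pipeline against the branch being moved, instead of `HEAD` in the demo repository, shows which paths account for most of its history.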

1 Answer

Your mention of "several large files (mp3 files for instance)" makes me think you should be using Git LFS, though it looks like you aren't doing this today. This would let you keep your full Git history while storing large blobs outside of Git, keeping your repository size down:

> Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.

I think this is a cleaner solution, and it's supported by GitHub, GitLab, and Bitbucket's cloud offerings as well as GitHub Enterprise and self-hosted GitLab, or you can set it up yourself.

Briefly, to convert an existing repository:

  1. Install the Git LFS client for your operating system
  2. Enable LFS in your repository with git lfs install
  3. Tell Git which files to store in LFS, e.g. by running git lfs track '*.mp3'
  4. Add the generated .gitattributes file and commit
  5. Remove and re-add your MP3s:

    git rm --cached *.mp3
    git add *.mp3
    git commit -m 'Move MP3s to Git LFS'
    
  6. You'll have to rewrite history using filter-branch as well if you want to "shrink" older commits
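
The history rewrite mentioned in the last step could look roughly like the following. This is a minimal sketch on a throwaway repository, not the exact procedure for any real project: the names `demo`, `slim`, `notes.txt`, and `big.bin` are hypothetical. It rewrites a copy of the branch so that the large file never appears in it. Git's own documentation now recommends `git filter-repo` over `filter-branch`; `filter-branch` is used here only because it is the tool named in this thread.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the deprecation notice and delay that newer git versions add.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# History with a large file that is added and later deleted.
printf 'small\n' > notes.txt
head -c 524288 /dev/urandom > big.bin
git add .
git commit -q -m "Add files"
git rm -q big.bin
git commit -q -m "Delete large file"

# Rewrite a copy of the branch so the original stays untouched.
git branch slim
git filter-branch --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- slim

# The rewritten branch no longer references the large blob.
git rev-list --objects slim
```

Pushing or fetching only `slim` into the new repository then transfers the rewritten commits without the large blob. The original branch and the backup ref under `refs/original/` still reference it in the source repository.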

This guide is probably worth reading in its entirety.

ChrisGPT was on strike
  • The mp3 file was just an example to get the repo to be large quickly. The original scenario wasn't about mp3 files. It was just a repo with a large history. – Raymond Chen Apr 15 '19 at 15:18
  • @RaymondChen, how do you know that? OP specifically mentioned "several large files (mp3 files for instance)" in their update to clarify the question. Whether there are literally MP3 files or some other type of large file doesn't matter. We can only provide answers based on what we're told. – ChrisGPT was on strike Apr 15 '19 at 15:23
  • I guess we will need OP to clarify. I interpreted it as a way to create a large repo quickly rather than an actual statement of the problem. – Raymond Chen Apr 15 '19 at 16:50
  • @RaymondChen is right, it was just an example that you could reproduce it easily locally. In fact this is a repository with large history and a lot of text files. – MaksimNikicin Apr 15 '19 at 18:04
  • @MaksimNikicin, in the future please make an effort to provide examples that actually represent your question. The clearer you are, the more likely you are to get a helpful answer. See [ask]. – ChrisGPT was on strike Apr 16 '19 at 12:39