
I am trying to migrate a project from GitLab to GitHub. The repository size is 685.83 MB, and it contains a few .dat, .csv, .exe, and .pkl files ranging from just over 100 MB to 3383.40 MB. The push fails with the errors below.

GitLab to GitHub migration steps:
$ git clone --mirror git@your-gitlab-site.com:test/my-repo.git
$ cd my-repo.git
$ git remote set-url --push origin git@github.com:test/my-repo.git
$ git push

Error
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: File Src/project/label/file1.dat is 476.32 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File Src/models/label/file2.dat is 2431.49 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File test/test1/label/model/file3.exe is 1031.94 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File test/test2/usecase/filemarker/file3.csv is 997.02 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File src/msg/sports/model.pkl is 3383.40 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File test/movie/maker/marker.dat is 1373.45 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File project/make/level/project/realmaker.csv is 1594.83 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB
remote: error: File src/moderm/network/test.pkl is 111.07 MB; this exceeds GitHub Enterprise's file size limit of 100.00 MB

Git LFS/BFG method:
$ git clone --mirror gitlab-heavy-repo 
$ cd gitlab-heavy-repo.git 
$ java -jar bfg-1.12.5.jar --convert-to-git-lfs '*.dat' --no-blob-protection
$ java -jar bfg-1.12.5.jar --convert-to-git-lfs '*.exe' --no-blob-protection
$ java -jar bfg-1.12.5.jar --convert-to-git-lfs '*.csv' --no-blob-protection
$ java -jar bfg-1.12.5.jar --convert-to-git-lfs '*.pkl' --no-blob-protection
$ git reflog expire --expire=now --all && git gc --prune=now
$ git lfs install
$ git remote set-url origin git@github.com:some-org/githubheavy-repo.git
$ git push 

Even after the above process, the push fails with the same error. It seems Git LFS has a 2 GB limitation, so I tried to remove the larger files from the repository completely, using the method below.

1) git clone gitlab-heavy-repo
2) cd gitlab-heavy-repo
3) git filter-branch --force --index-filter "git rm --cached --ignore-unmatch Src/project/label/file1.dat" --prune-empty --tag-name-filter cat -- --all
4) git reflog expire --expire=now --all
5) git gc --prune=now
6) git push origin --force --all
7) git push origin --force --tags
8) rm -rf .git/refs/original/

I repeated the same steps for all of the larger files listed above. But now the GitLab repository storage size shows 1.9 GB, whereas initially it was only 685.83 MB.

Please correct me. Thanks in advance.

user4948798

1 Answer


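Add all files above 100 MiB to .gitignore (the sed step turns the leading ./ that find prints into /, because .gitignore patterns that start with ./ do not match anything, and the -not -path test keeps Git's own pack files out of the list):

find . -type f -size +100M -not -path './.git/*' | sed 's|^\./|/|' >> .gitignore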

If you have not committed the files yet:

Read files from .gitignore and remove them from repo (without deleting them from disk):

On Linux:

git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached

On macOS (this only defines an alias; run apply-gitignore afterwards to execute it):

alias apply-gitignore="git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached"

On Windows:

for /F "tokens=*" %a in ('git ls-files -ci --exclude-standard') do @git rm --cached "%a"

If you have committed the files:

You'll need to clean them from commit history. Run the following command to remove a file from all previous commits:

Warning! Rewriting history is dangerous.

On Linux and macOS:

git filter-branch --prune-empty -d ~/tmp/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch PATH/TO/FILE" \
  --tag-name-filter cat -- --all

On Windows:

git filter-branch --prune-empty -d /tmp/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch PATH/TO/FILE" \
  --tag-name-filter cat -- --all

(Replace PATH/TO/FILE with path to the actual file)
Greg explains this command better in his answer here
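
To decide which paths to use for PATH/TO/FILE, it can help to list every blob in history that is over the limit, not only the files in the current checkout. The following is a minimal sketch, not part of the original answer: the function names filter_large_blobs and list_large_blobs are hypothetical, and the default limit of 104857600 bytes is an assumption taken from the 100.00 MB figure in the error message.

```shell
# Hypothetical helper: read lines of the form
#   <type> <object-id> <size-in-bytes> <path>
# on stdin and print the blobs larger than LIMIT bytes
# (default 104857600, i.e. 100 MiB) as "<size in MiB> MB<TAB><path>".
filter_large_blobs() {
  limit=${1:-104857600}
  awk -v limit="$limit" '
    $1 == "blob" && $3 + 0 > limit + 0 {
      path = $0
      # Drop the first three fields so paths containing spaces stay intact.
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)
      printf "%.2f MB\t%s\n", $3 / 1048576, path
    }'
}

# Hypothetical wrapper: walk every object reachable from any ref and
# report the oversized blobs. Run it inside the cloned repository.
list_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    filter_large_blobs "$@"
}
```

Calling list_large_blobs before the rewrite shows which paths need removing; calling it again afterwards should print nothing if the history is clean.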


If you need to run the command above for a folder instead of a file, add an -r switch after git rm in the second line:

... \
  --index-filter "git rm -r --cached -f --ignore-unmatch PATH/TO/FOLDER" \
  ...

git rm can take multiple arguments so you can add multiple paths in the second line:

... \
  --index-filter "git rm -r --cached -f --ignore-unmatch FILE1 FILE2 FOLDER1 FOLDER2" \
  ...
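
When a path contains spaces, it needs its own quoting inside the filter string, because filter-branch hands that string to the shell for evaluation once more. Below is a minimal sketch of one way to build such a string. The helper names shell_quote and build_index_filter are hypothetical, as are the example paths.

```shell
# Hypothetical helper: wrap one path in single quotes so that it survives
# a second round of shell parsing. An embedded single quote is written
# as '\'' (close the quote, add an escaped quote, reopen the quote).
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical helper: build the --index-filter command for any number
# of files or folders, quoting each path separately.
build_index_filter() {
  filter='git rm -r --cached -f --ignore-unmatch'
  for p in "$@"; do
    filter="$filter $(shell_quote "$p")"
  done
  printf '%s\n' "$filter"
}

# Example usage (commented out because it rewrites history):
# git filter-branch --prune-empty \
#   --index-filter "$(build_index_filter 'my dir/big file.dat' FOLDER1)" \
#   --tag-name-filter cat -- --all
```

For a single path it is usually enough to write the inner quotes by hand, for example "git rm --cached -f --ignore-unmatch 'my dir/big file.dat'".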
Qumber
  • With the above command I created the `.gitignore` file, but I am still experiencing the same problem. – user4948798 Jun 15 '20 at 16:30
  • Those files may have already been committed. I'm updating my answer. – Qumber Jun 15 '20 at 16:49
  • Okay, please do. That would be very helpful. – user4948798 Jun 15 '20 at 16:51
  • Yes, those larger files are already in the GitLab repo. So I used the find command to find the files over 100 MB, added them to the `.gitignore` file, and pushed to GitLab. After that I cloned the repo and executed the second command. It removed many files. Is that okay? Can I proceed? – user4948798 Jun 15 '20 at 17:42
  • That sounds good. Go ahead and commit the changes and try pushing to Github. – Qumber Jun 15 '20 at 18:00
  • `1) git clone gitlab-heavy-repo 2) cd gitlab-heavy-repo 3) find . -size +100M | cat >> .gitignore 4) git add .gitignore 5) git commit 6) git push 7) git remote set-url origin GitHub 8) git push` It still shows the `exceeds GitHub Enterprise's file size limit of 100.00 MB` error. Those files are still there. It seems `git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached` didn't actually delete the files. – user4948798 Jun 16 '20 at 02:27
  • My `.gitignore` rules are as follows: `./Src/project/label/file1.dat ./Src/models/label/file2.dat ./test/test1/label/model/file3.exe ./test/test2/usecase/filemarker/file3.csv` – user4948798 Jun 16 '20 at 02:45
  • They are like 5 files. You could remove them manually. `git rm --cached path/to/file`. Also try the find command with +99M size to ensure no files are causing false positives. – Qumber Jun 16 '20 at 05:26
  • Okay. will it remove from all the branches? – user4948798 Jun 16 '20 at 05:28
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216028/discussion-between-qumber-rizvi-and-kishore). – Qumber Jun 16 '20 at 05:34
  • PATH/TO/FILE doesn't support filenames with spaces. How would you solve that? – Jesper Hustad Apr 10 '22 at 20:52
  • @JesperHustad, Enclose the path in quotes. – Qumber Apr 10 '22 at 21:16
  • @Qumber the command is already in quotes `"git rm --cached -f --ignore-unmatch PATH/TO/FILE"` so enclosing it in quotes won't work. – Jesper Hustad Apr 10 '22 at 21:27
  • @JesperHustad You can use combination of double and single quotes. Refer to this example - https://unix.stackexchange.com/a/169511 – Qumber Apr 10 '22 at 21:49