
Cloning our projects with git clone takes a long time. To find out why, we ran the following command:

git rev-list --objects --all | grep "$(git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}')"

It returned a bunch of large files that had been pushed by mistake and later deleted. The current master contains neither those files nor the commits that added them, and we have no older branches that might still contain them.

How can we remove them from the Git history, or skip them while cloning?

Thanks

rajkumarts
  • You can technically rewrite the git history but that is generally a very bad idea. You should be careful in the future to not commit binaries or data when possible. Short term you can use `--depth 1` while cloning which should only take the most recent version and speed things up considerably. – Marie Mar 23 '18 at 19:39
  • Possible duplicate of [How to remove/delete a large file from commit history in Git repository?](https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – phd Mar 23 '18 at 20:19

1 Answer

Once an item is in the Git history, it stays there unless the history itself is rewritten. Even if a later commit deletes the file, the file is still present in every clone, because Git is a distributed version control system and a clone contains the full history. This is what allows you to retrieve the file by checking out a previous commit.
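To make that concrete, here is a minimal sketch in a throwaway repository. The scratch directory, the file name `data.txt`, and the demo identity are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: a file deleted in a later commit is still retrievable from history.
set -e
repo=$(mktemp -d)                 # hypothetical scratch directory
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # demo identity, illustration only
git config user.name "Demo"

# Commit a file, then delete it in a second commit.
echo "payload" > data.txt
git add data.txt
git commit -q -m "add data.txt"
git rm -q data.txt
git commit -q -m "delete data.txt"

# The working tree no longer has the file, but the parent commit still does.
recovered=$(git show HEAD~1:data.txt)
echo "recovered from history: $recovered"
```

Because the blob stays reachable from the first commit, every clone has to download it, no matter what the latest commit looks like.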

The only way to remove those files from the repository and speed up the clone is to rewrite the history so that it never included those files in the first place. GitHub provides detailed instructions on how to do this using git filter-branch.


Rewriting the history of a repository to remove a large file can be done like this:

$ git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' --prune-empty --tag-name-filter cat -- --all

After doing that, you'll have to force-push to overwrite the history on the remote repository. When you do so, all your developers need to be aware that the history was rewritten, and they'll need to clone fresh copies of the repository to continue their development.
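The whole sequence can be tried safely in a throwaway repository first. The sketch below is an illustration under stated assumptions, not a drop-in script: the file name `big.bin`, the scratch directory, and the remote name `origin` are hypothetical. It also adds a cleanup step (removing the `refs/original` backups, expiring the reflog, and running `git gc`) so that the old blob is actually deleted from the local object store. The force-push commands are left as comments because the demo has no remote.

```shell
#!/bin/sh
# Sketch: purge a large file from history in a scratch repo and verify it is gone.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice newer Git prints
demo_dir=$(mktemp -d)                    # hypothetical scratch directory
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"   # demo identity, illustration only
git config user.name "Demo"

# Build a history that adds big.bin and later deletes it.
head -c 1048576 /dev/urandom > big.bin
echo "hello" > README
git add big.bin README
git commit -q -m "add files"
big_blob=$(git rev-parse HEAD:big.bin)   # remember the blob ID for the check below
git rm -q big.bin
git commit -q -m "remove big file"

# Rewrite all refs so that big.bin was never committed.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch big.bin' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# filter-branch keeps backups under refs/original, and the reflog still points
# at the old commits; drop both, then prune the now-unreachable objects.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

# Check whether the large blob still exists in the object store.
if git cat-file -e "$big_blob" 2>/dev/null; then
  blob_state=present
else
  blob_state=gone
fi
echo "big.bin blob is $blob_state"

# On a real repository you would then overwrite the remote (assumed to be
# named origin); not run here because this demo has no remote:
#   git push origin --force --all
#   git push origin --force --tags
```

Note that the hosting server may keep the old objects until it runs its own garbage collection, so clone times might not improve immediately after the force-push.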

mkasberg