
I have a repo with lots of files that are no longer in the working directory: files that were added and later removed over the months and years of the repository's life.

I would like to make a file listing all the files that are stored in the commit history but no longer required, including their locations, e.g.:

/web/scripts/index.php
/sql/tables.sql
...

Then I would like a command that runs through that file and removes the files referenced in it from the commit history completely, something like what git rm --cached does, but for a list of files.

David Cain
fiscme
  • Are you sure you want to do this? You will potentially delete very valuable information, including the historical versions of files if Git didn't recognize a rename. – Guvante Aug 12 '13 at 21:38
  • It will also mess up anyone else who has pulled that repository. It's called rewriting history and is highly frowned upon. – Andrew T Finnell Aug 12 '13 at 21:42
  • I completely agree with both Guvante and Andrew: the only case where you should do this is on a repository that you don't share with others. If you have tested backups and are happy with the changes such a procedure makes, then go for it. But be warned: you can create a lot of problems by doing this. – David Cain Aug 12 '13 at 22:38
  • It's only me using the repo, and I use it mainly for backup. I started with a huge repo and have been condensing and refactoring the code within it. There are thousands of files stored in the history but deleted from the working copy. I know I'll never need them again, so I just want to delete them from the repo history, as they are just taking up unnecessary space. – fiscme Aug 13 '13 at 09:32

2 Answers


Short answer

Alias David Underhill's script, then run (with caution):

$ git delete `git log --all --pretty=format: --name-only --diff-filter=D`

Explanation

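David Underhill's script uses git filter-branch to rewrite the history of your repository, removing every trace of the given file paths from each commit it visits. As written, it passes HEAD to filter-branch, so only the history of the current branch is rewritten.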

The script, in its entirety (source):

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune

Save this script to a location on your hard drive (e.g. /path/to/deletion_script.sh), and make sure it's executable (chmod +x /path/to/deletion_script.sh).

Then alias the command:

$ git config --global alias.delete '!/path/to/deletion_script.sh'
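The script above splices its arguments into the filter string unquoted, rewrites only HEAD, and relies on the default expiry times for the reflog and for pruning, so recently orphaned objects may stay on disk for a while. The following is a minimal sketch of a variant that addresses those points, not David Underhill's script. The function name delete_paths_from_history is hypothetical, and it assumes bash (for printf '%q') and path names without exotic characters such as newlines.

```shell
#!/bin/bash
# Sketch of a variant of the script above; delete_paths_from_history is a
# hypothetical name. Assumes bash and that it runs from the repository root.
delete_paths_from_history() {
    [ "$#" -gt 0 ] || return 0
    if [ ! -d .git ]; then
        echo "Error: must run from the root of a git repository" >&2
        return 1
    fi

    # Shell-quote each path so that names containing spaces survive the
    # eval that filter-branch applies to the --index-filter string.
    local quoted
    quoted=$(printf '%q ' "$@")

    # "-- --all" rewrites every ref instead of only HEAD;
    # "--tag-name-filter cat" moves existing tags to the rewritten commits.
    git filter-branch -f --tag-name-filter cat --index-filter \
        "git rm -r -f -q --cached --ignore-unmatch -- $quoted" -- --all || return 1

    # Drop the backup refs filter-branch keeps under refs/original/,
    # expire all reflog entries immediately, and prune unreachable objects now.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
}
```

If you prefer this variant, the script file can call the function with "$@" as its last line, and the alias below works unchanged. Try it on a throwaway clone first.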

To get a sorted list of all deleted files:

$ git log --all --pretty=format: --name-only --diff-filter=D | sort -u
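Since the question asks for the list to be stored in a file, here is a small sketch of that step. The helper name list_deleted_paths and the file name deleted_files.txt are made up for illustration. The sed call drops the blank separator lines that --pretty=format: leaves between commits, which would otherwise survive sort -u as a single empty entry.

```shell
#!/bin/bash
# list_deleted_paths is a hypothetical helper name, not part of Git.
# Prints every path that some commit (on any ref) deleted, one per line,
# sorted, de-duplicated, and with the blank separator lines removed.
list_deleted_paths() {
    git log --all --pretty=format: --name-only --diff-filter=D |
        sed '/^$/d' |
        sort -u
}

# Example: save the list for review before touching any history.
# list_deleted_paths > deleted_files.txt
```

Recent Git versions detect renames by default, so the old name of a renamed file may be reported as a rename rather than a deletion and then not appear in this list. Passing --no-renames to git log should restore the behaviour shown in the example further down, but check the output on your own repository.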

Bringing it all together

With a list of deleted files, it's just a matter of hooking up git delete to process each file in the list:

$ git delete `git log --all --pretty=format: --name-only --diff-filter=D`

Testing/Example usage

  1. Make a dummy repository with additions, renames, and deletions:

    mkdir test_repo
    cd test_repo/
    git init
    echo "Dummy content" >> stays.txt
    git add stays.txt && git commit -m "First file, will stay"
    echo "Rename content" >> will_rename.txt
    git add will_rename.txt && git commit -m "Going to rename"
    echo "Delete this file" >> will_delete.txt
    git add will_delete.txt && git commit -m "Delete this file"
    git mv will_rename.txt renamed.txt && git commit -m "File renamed"
    git rm will_delete.txt && git commit -m "File deleted"
    
  2. Inspect the history:

    $ git whatchanged --oneline
    d768c58 File deleted
    :100644 000000 7a4187c... 0000000... D  will_delete.txt
    96aadf0 File renamed
    :000000 100644 0000000... 94a12c7... A  renamed.txt
    :100644 000000 94a12c7... 0000000... D  will_rename.txt
    3ba05fa Delete this file
    :000000 100644 0000000... 7a4187c... A  will_delete.txt
    c88850a Going to rename
    :000000 100644 0000000... 94a12c7... A  will_rename.txt
    6db6015 First file, will stay
    :000000 100644 0000000... f3ae800... A  stays.txt
    
  3. Delete old files:

    $ git delete `git log --all --pretty=format: --name-only --diff-filter=D`
    Rewrite 8c2009db5ac05b27cd065482da94dec717f5ef4a (8/9)rm 'will_delete.txt'
    Rewrite e1348d588597f2f6dd63cade081e0fbdf8692c74 (9/9)
    Ref 'refs/heads/master' was rewritten
    Counting objects: 27, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (22/22), done.
    Writing objects: 100% (27/27), done.
    Total 27 (delta 12), reused 10 (delta 0)
    
  4. Inspect the repository again with git whatchanged --oneline. The deleted file is gone from the history, and the renamed file appears as if it had been added under its new name from the start.

    c800020 File renamed
    :000000 100644 0000000... 94a12c7... A  renamed.txt
    0a729d7 First file, will stay
    :000000 100644 0000000... f3ae800... A  stays.txt
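
To confirm the result for an individual path, a check along these lines may help. It is only a sketch: path_in_history is a hypothetical helper name, and it looks at every ref (--all), so it will also report a path that survives on a branch the rewrite did not cover.

```shell
#!/bin/bash
# path_in_history is a hypothetical helper name. It succeeds (exit 0) if any
# commit reachable from any ref still touches the given path.
path_in_history() {
    [ -n "$(git log --all --oneline -- "$1")" ]
}

# Example checks after the rewrite:
# path_in_history will_delete.txt && echo "still present in some ref"
# git count-objects -vH    # size of the object store after gc
```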
    
David Cain
  • I have been through it and ran `git delete git log --all --pretty=format: --name-only --diff-filter=D` but got `Expansion of alias 'delete' failed; 'sh' is not a git command`. What am I doing wrong? – fiscme Aug 13 '13 at 08:44
  • What operating system are you on? If not a *NIX system, how are you running Git? (The script I supply above is a bash script- if you're running Cygwin, you could make this work, but it uses GNU utilities you won't find on a Windows system). – David Cain Aug 13 '13 at 14:58
  • Also, I'm assuming you just didn't get the Markdown syntax right, but the proper command is ``git delete `git log --all --pretty=format: --name-only --diff-filter=D` `` (the backtick expansion is key). – David Cain Aug 13 '13 at 15:04
  • I'm using Ubuntu 12.04. I did get the command correct and I do have sh installed, but it still won't work. – fiscme Aug 14 '13 at 14:10
  • I'm sorry, this is completely my fault - I got the syntax wrong for Git aliases. You need to place a `!` before shell commands. I updated my answer (you'll want to make the script executable, and change the `alias` command). – David Cain Aug 14 '13 at 15:47
  • @fiscme, are you still having issues? I believe the `!` fix described above should work for you. – David Cain Aug 16 '13 at 19:45
  • I have been away. I'll try today and get back to you. Thanks – fiscme Aug 19 '13 at 08:47
  • There's one major caveat with this: if in your git history, you have deleted a file, then created a new one of the same name that is currently tracked in your repo, this will also delete the current one! You may want to iterate through the output of `git ls-files` and verify that none of the files in `$files` is there. – Luke Davis Oct 31 '18 at 06:32
  • One more thing -- I noticed that the files will appear when you run `git log --all --pretty=format: --name-only --diff-filter=D`. Does this mean we've removed the **underlying file data**, but the commit history for these files remains? – Luke Davis Oct 31 '18 at 07:13

Adding onto @David's answer: if you want to be extra careful and make sure you aren't deleting any files that were re-added later in the history, use the following block of commands instead of git delete $(git log --all --pretty=format: --name-only --diff-filter=D) (consider adding it as a function in your .bashrc):

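# Requires bash 4+ (mapfile). Reading line by line keeps names with spaces intact.
# current: files tracked right now; tracked: every path deleted on any ref.
mapfile -t current < <(git ls-files)
mapfile -t tracked < <(git log --all --pretty=format: --name-only --diff-filter=D | sed '/^$/d' | sort -u)
deleted=()
resurrected=()
for file in "${tracked[@]}"; do
  # Exact, whole-line match against the currently tracked files.
  if printf '%s\n' "${current[@]}" | grep -Fxq -- "$file"; then
    resurrected+=("$file")   # deleted once, but tracked again today: keep it
  else
    deleted+=("$file")
  fi
done
echo "Deleted: ${deleted[*]}"
echo "Resurrected: ${resurrected[*]}"
git delete "${deleted[@]}"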
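
As a possible shape for the .bashrc function suggested above, the same comparison can be sketched with comm, which prints the lines found only in the first of two sorted inputs. The name deleted_and_gone is hypothetical, and the sketch assumes bash (process substitution) and path names without newlines.

```shell
#!/bin/bash
# deleted_and_gone is a hypothetical name. Prints paths that were deleted at
# some point in history and are NOT tracked in the current index.
deleted_and_gone() {
    # comm needs both inputs sorted the same way, hence LC_ALL=C throughout.
    LC_ALL=C comm -23 \
        <(git log --all --pretty=format: --name-only --diff-filter=D |
            sed '/^$/d' | LC_ALL=C sort -u) \
        <(git ls-files | LC_ALL=C sort -u)
}

# Example: review the list, then feed it to the alias.
# deleted_and_gone > deleted_files.txt
```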
Luke Davis