
Earlier this month I was promoted to a new position and inherited a website. I loaded the site into a git repository and started working, but about 50 commits later I realized that there were dozens of .txt files scattered throughout the site containing user names and email addresses. Because I've already deployed a complete re-write of the site, I'm not worried about any collateral damage and want to delete every .txt file in every commit of the repository. I know individual files can be removed with git filter-branch, but my attempt to scale it didn't seem to do anything.

git filter-branch --force --index-filter \
"find . -type f | grep .txt | xargs -I {} git rm --cached --ignore-unmatch {}" \
--prune-empty --tag-name-filter cat -- --all

What is the best way to delete every .txt file in the history of a git repository? Can it be done without having to rewrite the entire history for each file?

Aayla
  • Were all these .txt files added in one single commit? – KamilCuk Sep 28 '20 at 18:30
  • Does this answer your question? [Git - Remove All of a Certain Type of File from the Repository](https://stackoverflow.com/questions/38880436/git-remove-all-of-a-certain-type-of-file-from-the-repository) – flaxel Sep 28 '20 at 18:51

2 Answers


Replace the entire filter with

'git rm --cached --ignore-unmatch \*.txt'

with the single quotes and backslash.

git filter-branch -f --index-filter '
        git rm --cached --ignore-unmatch \*.txt
' --prune-empty --tag-name-filter cat -- --all

The single quotes pass the filter text through with no shell processing at all when you issue the command, so the shell that filter-branch runs sees git rm --cached --ignore-unmatch \*.txt. That shell's escape processing then hands the wildcard to git rm unexpanded. Git understands globs itself, so this removes all the .txt files from the index.

There are other ways to do it. The thing to keep in mind is that you're issuing a shell command and constructing the arguments that command will see, and that filter-branch, the command you're issuing, then runs what you give it as a command in its own shell. There is syntax that lets you control which shell performs which expansion; here I'm using single quotes and an embedded escape because that's the quickest to type.
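The two layers of shell processing can be seen in isolation with a small experiment. This is only a sketch: `sh -c` stands in for the shell that filter-branch starts, `printf` stands in for `git rm`, and the scratch directory and its two file names are made up for the demonstration.

```shell
# Scratch directory with two hypothetical files, so globbing has something to match.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

# Escaped: the outer single quotes keep the backslash intact; the inner shell
# then strips it and passes the literal pattern to the command.
escaped=$(cd "$demo_dir" && sh -c 'printf "%s\n" \*.txt')

# Unescaped: the inner shell expands the glob before the command ever sees it.
expanded=$(cd "$demo_dir" && sh -c 'printf "%s\n" *.txt')

printf 'escaped:\n%s\n' "$escaped"
printf 'expanded:\n%s\n' "$expanded"
```

The difference matters because a glob expanded by the shell only matches files in that shell's current directory, whereas a pattern that reaches git rm unexpanded is matched by Git against the paths in the index.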

jthill
  • Does this also walk the directories or does it just remove the .txt files wherever I run it? – Aayla Sep 28 '20 at 19:10
  • The index aka cache is an index, a flat list. This removes all `*.txt` files anywhere. You could have tried it yourself with the `-n` option to see what it affects. – jthill Sep 28 '20 at 19:16

git filter-repo is now the recommended tool for editing history: the git filter-branch documentation itself warns against using filter-branch and points to git filter-repo instead. BFG Repo-Cleaner is an older alternative.

git filter-repo is very easy to use, well documented, and much faster than git filter-branch. Here's how to remove all *.txt files in the repository's history:

  1. Install git-filter-repo (it's just a single Python file that you need to put in your PATH).
  2. Get a clean clone of your repository (make sure you keep a backup in case something goes wrong).
  3. Execute the following command:
git-filter-repo --invert-paths --path-glob "*.txt"
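It can be worth checking the history before and after the rewrite. The helper below is a rough sketch that uses only plain git; the function name `list_txt_in_history` is made up for this example, and the check relies on `git log --name-only`, which does not show files touched only by merge commits.

```shell
# List every .txt path touched by any commit reachable from any ref.
# $1 is the path to the repository (defaults to the current directory).
list_txt_in_history() {
    git -C "${1:-.}" log --all --pretty=format: --name-only -- '*.txt' |
        sed '/^$/d' | sort -u
}

# Usage, from inside the clone:
#   list_txt_in_history          # before the rewrite: prints the offending paths
#   git-filter-repo --invert-paths --path-glob "*.txt"
#   list_txt_in_history          # after the rewrite: should print nothing
```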

The tool also lets you edit any file you want in the repository's history, fairly easily. It supports Python callbacks so you can run arbitrary Python code on any file. For example, you could replace "foo@example.com" with "[snipped email]" across all history, see this answer.
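For a plain string substitution like the email example, git filter-repo also has a `--replace-text` option that needs no Python. This is a different route from the callbacks described above, shown here only as a sketch; the rules file is a temporary file created for the example, and the rewrite command itself is left as a comment so that nothing gets rewritten by accident.

```shell
# Temporary file holding one replacement rule per line: literal==>replacement
rules_file=$(mktemp)
printf '%s\n' 'foo@example.com==>[snipped email]' > "$rules_file"
cat "$rules_file"

# Then, from inside a fresh clone (not run here):
#   git filter-repo --replace-text "$rules_file"
```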

MiniQuark