I've started maintaining a large, unwieldy repo with a lot of outdated code that could be deprecated or removed. I want to find the files that have not been changed since a specific commit, so that I can go through them and check whether they're still necessary.

I can find files in my repo that have changed since a specific commit via:

git diff --name-only SHA

But how do I find files that have not been changed?

thanksd

2 Answers

The shell utility comm is seriously under-appreciated.

$ git diff --name-only $SHA | LC_ALL=C.UTF-8 sort > /tmp/A
$ git ls-tree -r --full-tree --name-only HEAD | LC_ALL=C.UTF-8 sort > /tmp/B
$ LC_ALL=C.UTF-8 comm -13 /tmp/A /tmp/B

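will produce the list you want, by subtracting the set of all changed files from the set of all tracked files: comm -13 suppresses the lines unique to /tmp/A and the lines common to both, leaving only the lines unique to /tmp/B. (comm is a little finicky, though: both inputs must be sorted in the collation order that comm itself uses, hence all the LC_ALL overrides. If you get error messages about C.UTF-8, try just C instead.)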
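
As a rough sketch, the three steps above can be wrapped in one shell function so the temporary files are cleaned up afterwards. The name unchanged_since is a hypothetical helper, not a git command, and the sketch assumes mktemp is available and that it is run inside the repository:

```shell
# unchanged_since is a hypothetical helper name (an assumption, not part of git).
# Usage: unchanged_since <commit>
# Prints tracked files in HEAD that `git diff --name-only <commit>` does not report.
unchanged_since() {
    sha=$1
    changed=$(mktemp) || return 1
    all=$(mktemp) || { rm -f "$changed"; return 1; }

    # comm needs both inputs sorted in the same byte order, hence LC_ALL=C.
    git diff --name-only "$sha" | LC_ALL=C sort > "$changed"
    git ls-tree -r --full-tree --name-only HEAD | LC_ALL=C sort > "$all"

    # -1 drops lines only in "$changed", -3 drops lines present in both,
    # so only files that were never reported as changed remain.
    LC_ALL=C comm -13 "$changed" "$all"

    rm -f "$changed" "$all"
}
```

Calling unchanged_since SHA from anywhere in the working tree should print paths relative to the repository root, since both git commands report paths that way by default.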

zwol

I can think of two ways of getting the end result you need, though neither directly answers your question of how to find unchanged files.

The quick way would be to get a copy of all the files which have changed between the hash and the head ($3 and $4 in the command presumably stand for those two commits):

cp -p --parents `git diff --name-only $3 $4` /path/to/output

You can then initialize a new git repository in that folder and commit those files. That should leave all the old files behind and give you a repository with only the changed ones.

Alternatively, do the copy as above and use a directory comparison tool like meld, which will show you which files exist in the repository but not in the directory of updated files. You can then delete those files from the repository.
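
The copy step can be sketched a little more defensively, so that file names containing spaces and files deleted between the two commits don't trip up cp. This is only a sketch under assumptions: copy_changed is a hypothetical helper name, it relies on GNU cp and GNU xargs (--parents, -t, -r), and it must be run from the repository root because git reports paths relative to the root:

```shell
# copy_changed is a hypothetical helper name (an assumption, not part of git).
# Usage: copy_changed <from-commit> <to-commit> <destination-dir>
# Copies the working-tree version of every file that differs between the two
# commits into the destination, preserving the directory layout.
copy_changed() {
    from=$1
    to=$2
    dest=$3
    mkdir -p "$dest" || return 1

    # -z separates names with NUL bytes so unusual file names survive;
    # --diff-filter=d excludes deleted files, which no longer exist to copy.
    git diff --name-only -z --diff-filter=d "$from" "$to" |
        xargs -0 -r cp -p --parents -t "$dest"
}
```

For example, copy_changed SHA HEAD /path/to/output fills /path/to/output with the changed files, ready to be compared against the repository with a tool such as meld.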

gabe3886