I have a repository that contains a huge number of commits, so I can't check them one by one.

I want to find which commit deleted the files. I have already tried these commands, but with no luck.

1.

git log --all --stat -- "file_path"

This shows only my commit that added the files, but not the one that deleted them.

2.

git log --full-history --all --stat -- "file_path"

This shows a lot of commits, most of which are merge commits that don't even touch the files.

3.

git log --full-history --all --simplify-merges --stat -- "file_path"

This does show a commit which delete the files, but it doesn't, because the commit is just a merge commit.

Is there any other way I can try?

I also tried deleting some files and committing that change to test whether the commands work, and they work fine.

So maybe someone used some special way to delete the files?

min
  • "This does show a commit which delete the files, but it doesn't, because the commit is just a merge commit." A merge commit can be the commit that deleted the files. Why not? – matt Oct 20 '22 at 11:21
  • @min: what leads you to think that the commit that deletes said file doesn't appear in the output? Can you add the output of one of your `git log` commands? You may get clearer output using `--name-status` rather than `--stat`. – LeGEC Oct 20 '22 at 11:24
  • @matt yes, you are right, I'm sorry not to explain more, I did check the merge commit, but it show nothing about the deleted files. – min Oct 20 '22 at 11:33
  • @LeGEC I tried `--name-status` with the 3rd command; now it shows the "D" mark in front of the files. – min Oct 20 '22 at 11:38
  • OK, out of curiosity: does `git log --graph --oneline --all --name-status -- "file_path"` show a commit that deletes `file_path`? Are there any renames mentioned in the history? – LeGEC Oct 20 '22 at 11:45
  • @LeGEC I tested `git log --graph --oneline --all --name-status -- "file_path"`; it only shows some A or M commits for the files. – min Oct 20 '22 at 13:08
  • "I did check the merge commit, but it show nothing about the deleted files" Commits don't tell you _anything_ (except their commit message etc). You have to _interrogate_ them if you want to know how they affect the overall history. Typically you do that with `diff` or `show`. I suggest that if you _diff_ this merge commit against its first parent (`git diff ^1`) you will find that this commit _was_ where the deletion happened, just as Git has already told you. – matt Oct 21 '22 at 11:29

1 Answer

You mentioned that:

This does show a commit which delete the files, but it doesn't, because the commit is just a merge commit.

A merge commit that deletes the files ... deletes the files.

Now, looking at the merge commit, you may not "see" a deletion. That's because of the way Git normally shows a commit. Each commit really does hold a snapshot: a full set of all files that will be used from here on, except to the extent that things change in any later snapshot.

But we don't normally want to see every file. If:

git show a123456

showed us exactly the set of files that git switch --detach a123456 would get us, that wouldn't be all that useful. (If we want to see the set of files in the commit, we just check out that commit, with said git switch or the old git checkout equivalent.)

Instead, for an ordinary—i.e., non-merge, non-root—commit, git show locates not just the commit itself and its files, but also the commit's parent commit, and that commit's snapshot. So if a123456 has parent 987654b, git show a123456 will:

  • extract 987654b to a temporary area (in memory, really, and Git takes shortcuts);
  • extract a123456 to a temporary memory area likewise; and
  • compare the two snapshots.

It then tells us what changed between them, having played a game of Spot the Difference with the two snapshots.

That's great for regular everyday ("ordinary") commits, which have just the one parent. It does not work for:

  • merge commits, which have two or more parents;
  • root commits, which have no parent.

There's a simple fix that Git uses for a root commit: Git just pretends that there is a parent commit whose snapshot is the empty tree, so that the difference between the non-existent parent commit and the very first commit is that every file is added.
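Both comparisons can be reproduced by hand. The following is only a sketch in a throwaway repository (the file names and commit messages are made up for illustration): it diffs an ordinary commit against its parent, then diffs the root commit against the empty tree, which is roughly what git show does for you.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository; all names below are illustrative
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo one > a.txt; git add a.txt; git commit -q -m "root commit"
echo two > b.txt; git add b.txt; git commit -q -m "ordinary commit"

# Ordinary commit: compare the parent's snapshot with the commit's snapshot.
ordinary=$(git diff --name-status HEAD^ HEAD)
echo "ordinary commit: $ordinary"

# Root commit: there is no parent, so compare against the empty tree instead.
empty_tree=$(git hash-object -t tree /dev/null)
root=$(git diff --name-status "$empty_tree" HEAD^)
echo "root commit:     $root"
```

The first diff should list b.txt as added (A), and the second should list a.txt as added, since every file in a root commit is "new" relative to the empty tree.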

But for merge commits, it's not clear what to do. Git could:

  • compare against the first parent;
  • compare against the second parent;
  • compare, one at a time, against each parent and print multiple diffs; or
  • something else.

Both git log and git show choose the last option by default, but each one chooses a different "something else":

  • git log -p chooses not to show anything. That's pretty clearly not so great; you may wish to add flags to make it show something.
  • git show chooses to produce what Git calls a combined diff, by default.
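To see the difference between these defaults, here is a small sketch using a throwaway repository with one clean merge (branch and file names are hypothetical). It assumes a default configuration, i.e. nothing such as log.diffMerges has been changed. The -m flag asks for the "one diff per parent" option from the list above.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository; all names below are illustrative
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo base > base.txt; git add base.txt; git commit -q -m base
trunk=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b side
echo side > side.txt; git add side.txt; git commit -q -m side

git checkout -q "$trunk"
echo trunk > trunk.txt; git add trunk.txt; git commit -q -m trunk
git merge -q --no-ff -m merge side       # HEAD is now a two-parent merge commit

# Default: git log -p prints no diff for the merge commit.
plain=$(git log -1 -p --format= HEAD)
# With -m: one diff per parent (git show -m behaves the same way).
split=$(git log -1 -p -m --format= HEAD)

echo "without -m: [$plain]"
echo "with -m:"
echo "$split"
```

With -m, the diff against the first parent should show side.txt being added, and the diff against the second parent should show trunk.txt being added.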

Combined diffs are somewhat useful, especially for merge commits, but they have a couple of deep flaws. I don't think these can be fixed without a redesign; the new "merge-ort" merge strategy offers the opportunity for one, and that work is apparently ongoing now, but it will be some time yet. In particular, a combined diff lists only files that differ from every parent, so it may not show the deletion of a file when one of the parents also lacks that file. This leads to a common pattern:

  1. Someone not well versed in Git creates a feature branch.
  2. Someone merges a new feature into the main line, which adds a new file.
  3. The first person merges their feature into the main line and—being unfamiliar with the right way to resolve merge conflicts—winds up deleting the new file in the merge result.

Because the file never existed in the commits they made between steps 1 and 3, the combined diff won't show the file as being "deleted".

Currently, the only way to discover this sort of thing is to do what you did, or to run a process that "retries" each merge to see whether the automated-merge result differs in any way from the committed merge-result. The latter is more useful—it can be fully automated—but until the new merge-ort-related feature goes in, it's painful to actually do it (and you have to write some fairly complicated, messy code).
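The three-step pattern above can be reproduced in a throwaway repository. This is a sketch under stated assumptions, not a recipe: the branch and file names are invented, and the last command relies on `--remerge-diff`, which as far as I know only exists in Git 2.36 and later (the script falls back to printing "unsupported" if your Git rejects it).

```shell
set -e
repo=$(mktemp -d)          # throwaway repository; all names below are illustrative
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo base > base.txt; git add base.txt; git commit -q -m base
trunk=$(git symbolic-ref --short HEAD)

# Step 1: a feature branch is created and gets a commit.
git checkout -q -b feature
echo feature > feature.txt; git add feature.txt; git commit -q -m feature

# Step 2: meanwhile, the main line gains new.txt.
git checkout -q "$trunk"
echo new > new.txt; git add new.txt; git commit -q -m "add new.txt"

# Step 3: the feature is merged, and the merge result drops new.txt.
git merge -q --no-commit --no-ff feature
git rm -q new.txt
git commit -q -m "merge feature"

# Combined diff (the git show default for merges): new.txt is not mentioned.
combined=$(git show --format= --name-status HEAD)
# Diff against the first parent only: the deletion is visible.
first_parent=$(git diff --name-status HEAD^1 HEAD)
# Redo the merge and compare with what was committed (assumes a recent Git).
remerge=$(git show --remerge-diff --format= --name-status HEAD 2>/dev/null || echo unsupported)

echo "combined:     [$combined]"
echo "first parent: [$first_parent]"
echo "remerge:      [$remerge]"
```

The first-parent diff is the one to look at during review of a merge into the main line: it answers "what did this merge do to the branch it landed on?", which is exactly where an accidental deletion shows up as a D line.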

torek
  • Yes, I've heard of this unintuitive behavior of git merge. So if a newbie deletes files by accident while merging, is there no way to find that out with Git, since the files don't exist in the merge commit? – min Oct 20 '22 at 13:14
  • You've found the way to find out: `git log --full-history ...`. The commit that "deleted" the file *is* the merge commit. A merge commit *is a commit*, like any other commit; the only thing special about it is that it has two (or more) parents. – torek Oct 20 '22 at 13:42
  • OK, let me ask another way: how can we prevent this from happening? For a normal commit, we can see every file change in any UI (GitHub, IDE) without running any command, so we can catch it in code review. But for a merge commit, it's very hard to know unless we run `git log --full-history ...` during every code review to check for "hidden" changes. – min Oct 21 '22 at 06:12
  • Use `git show -m` or `git log -m -p` to have Git show a diff against each parent, one at a time. – torek Oct 21 '22 at 08:22