A tool gives me a list of git file revisions, apparently blob object SHA names, which are the states of those files in some repository revision. There can also be a set of files in a did-not-exist state. Unfortunately, the tool doesn't give me what that commit ID was. (It sometimes does, but the string could also be a branch name which at some point in the past referred to that revision. I fear this suggests it might be something even less helpful like HEAD~5.)

I would like to script something to determine a commit ID which "contains" those file revisions, in the sense that if I ran git restore --source=$commit on the listed paths, I would have the contents of the given file revisions in the working tree, and the did-not-exist files would be deleted from the working tree. When multiple commit IDs satisfy that condition, it would be more user-friendly, but not always required, to:

  1. If I have a specific branch name, prefer a matching revision in that branch's direct history if possible.
  2. Prefer the latest matching revision.

I know git describe $file_rev will give me a string repo_rev:filename, so if I cut at the first :, I think that gives the earliest repo revision which contains that file revision. (It might or might not end in -g followed by a commit SHA-1 prefix, but since the longer string is always a valid tree-ish name, that doesn't seem worth looking at.) A matching commit must have each of these as an ancestor or be the same revision, but without further changing any of the listed filenames.

In the case with a branch name, I could step through the output of git rev-list --first-parent $branch until I find a revision that matches. To test a repo revision, I can check whether git rev-parse $repo_rev:$filename and git rev-parse $file_rev match. The former should exit with an error code for a did-not-exist file.
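A minimal sketch of that first-parent walk, under some assumptions of mine: the function names commit_matches and first_parent_match are hypothetical, and so is the spec file format (one "<full-blob-sha> <path>" pair per line, with "-" in place of the SHA for a did-not-exist file). Since git rev-list prints newest commits first, the first hit is the latest match on the branch's first-parent chain.

```shell
#!/usr/bin/env bash
# commit_matches COMMIT SPEC_FILE  (hypothetical helper)
# Succeeds if every path in SPEC_FILE has the wanted blob in COMMIT.
# SPEC_FILE format is an assumption: "<full-blob-sha> <path>" per line,
# with "-" as the sha for a path that must not exist.
commit_matches() {
    local commit=$1 spec=$2 want path have
    while read -r want path; do
        [ -n "$path" ] || continue
        # rev-parse exits non-zero when the path is absent from the commit.
        have=$(git rev-parse --verify --quiet "$commit:$path") || have=-
        [ "$have" = "$want" ] || return 1
    done < "$spec"
    return 0
}

# first_parent_match BRANCH SPEC_FILE  (hypothetical helper)
# Prints the newest commit on BRANCH's first-parent chain that matches.
first_parent_match() {
    local branch=$1 spec=$2
    git rev-list --first-parent "$branch" | (
        while read -r commit; do
            if commit_matches "$commit" "$spec"; then
                printf '%s\n' "$commit"
                exit 0   # leaves only this subshell, not the caller
            fi
        done
        exit 1
    )
}
```

Usage would look like first_parent_match mybranch spec.txt, where the exit status tells the caller whether to fall back to a wider search. Abbreviated SHAs in the spec would not compare equal here; they would need to be expanded with git rev-parse first.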

Without a known branch name, or if nothing in the branch linear history matches, checking all revisions in that way could work, but sounds wasteful. Specifying ^$repo_rev~ for each revision found by cutting git describe $file_rev at the colon should help cut older revisions out of the git rev-list. Specifying -- $files could help find revisions which changed a relevant file FROM the desired version, and I could inspect those revisions' ancestors.

Maybe there are some plumbing commands, other rev-list options, or similar which could make all or parts of this much easier? If I keep working out just the ideas I put above, are strange merge histories going to break the methods? I'd like to treat a file rename or copy as a change based on just working directory contents, but is there a chance a file rename or copy could fool the script into missing a change?

aschepler
  • Can you give an example of the output from your script? How are those "file revisions" displayed? – LeGEC May 19 '22 at 10:31
  • If you want to spot commits that contain a specific blob, `git log` has a [`--find-object`](https://git-scm.com/docs/git-log#Documentation/git-log.txt---find-objectltobject-idgt) option. – LeGEC May 19 '22 at 10:32
  • @LeGEC The "file revisions" are always hexadecimal strings, and if I pass them to `git describe` I always get a string ending with `:` then the expected filename. I think you're right that these are blob SHA names. – aschepler May 19 '22 at 10:48
  • ok, you can also try `git cat-file -t ` to see what kind of object it is (`commit`, `tree`, `blob` or `tag`) and `git show ` or `git cat-file -p ` to view its content – LeGEC May 19 '22 at 10:51
  • @LeGEC Yup, `git cat-file -t ` prints `blob`. – aschepler May 19 '22 at 10:52
  • I can re-open this as it's not *exactly* a duplicate, but the scripts in the linked question are going to be good starting points. – torek May 19 '22 at 18:29

2 Answers

If you want to spot the revisions where a specific blob appeared or disappeared, you can use git log --find-object:

git log --find-object=<blobsha>

# also works with other ways to target a blob:
git log --find-object=HEAD:path/to/that/file

# combines with all options for git log:
# 'name-status' or 'name-only' will only print paths for files
# that match that content
git log --name-status --format="%H" --find-object=<blobsha> --all

Once you have spotted a commit, you can use git branch --contains or git for-each-ref --contains to identify what branches it is part of.
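As a rough sketch of how the --find-object hits could be turned into commits that actually hold the blob: each hit either added the blob (so the commit itself holds it) or replaced it (so one of its parents does), and git rev-parse can confirm which. The function name holders_of_blob is hypothetical, and the sketch assumes a full blob SHA and a known path. Note that git log does not diff merge commits by default, so a change made only in a merge may not be reported without an option such as -m.

```shell
#!/usr/bin/env bash
# holders_of_blob BLOB_SHA PATH  (hypothetical helper)
# Prints commits adjacent to a --find-object hit whose tree has
# BLOB_SHA at PATH. Assumes BLOB_SHA is a full object ID.
holders_of_blob() {
    local blob=$1 path=$2
    # %H is the commit, %P its parents (empty for a root commit).
    git log --all --format='%H %P' --find-object="$blob" |
    while read -r commit parents; do
        for c in $commit $parents; do
            if [ "$(git rev-parse --verify --quiet "$c:$path")" = "$blob" ]; then
                printf '%s\n' "$c"
            fi
        done
    done | sort -u
}
```

This only reports commits at the edges of each range; commits in between that never touched the file also hold the blob and are not listed.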

LeGEC
  • So usually on a single branch this will give two commits: one where the blob was added, and one where it was removed because a newer version of the file was committed. Then `commit1~...commit2~` is the commits which have the wanted file version. But in general, there could be a bunch of commits which added the version and a bunch which replaced it, and it's not clear how to get a set of commits "containing" the file version. I think the one I actually want would be a parent of one which replaced/removed the blob... – aschepler May 19 '22 at 15:33
  • @aschepler : you are right, it is a bit convoluted to understand what commit holds what version of the file. If you want to script your way out of this, you can use the `--name-only` trick to at least get the path of target file to inspect within commit ``, and check whether the hash you expect matches `git rev-parse :path` or `git rev-parse ~:path`. – LeGEC May 19 '22 at 16:43

The git file revision is the SHA-1 of a blob object. One blob can appear in multiple commits, and can appear at multiple paths within a single commit.

List all commits that are reachable from at least one ref.

git rev-list --all

Search the commit's tree contents for the blob.

git ls-tree -r "${commit}" | grep -F "${blobsha1}"

Combined, it could be like

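git rev-list --all | while read -r commit; do
    # Match only blob entries with this object ID, not filenames that happen to contain it.
    s=$(git ls-tree -r "${commit}" | grep -F " blob ${blobsha1}")
    if [[ -n "${s}" ]]; then
        echo "${commit} ${s}"
    fi
done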

After finding a commit, we can find which refs it is reachable from.

git for-each-ref --contains=${commit}

This can be time-consuming, depending on the number of commits and the number of files tracked in each commit. One known limitation is that it cannot search inside submodules. If the repository has any submodules, you can list the commits of each submodule first and search those separately. I hope there are better methods.

ElpieKay