Is it possible to get a list of all git object hashes of blobs which have been added to the repository by a given commit hash using the git command line tools?
I already tried achieving this with the git plumbing tool git-diff-tree. Maybe it's the wrong approach. Below is the best result I have gotten so far, but the (very long) man page didn't help me work out exactly how the output has to be interpreted.
$ git diff-tree --no-commit-id 2b53d04dbb7cd35d030ddc59b13c0836a87daeb7
:100644 100644 03f15b592c7d776da37e3d4372c215b14ff8820f 6e0ed0b1ed56e9a35a3be52a9de261c8ffcccae8 M file1.ts
:100644 100644 b5083bdb9c31005ebd16835a0f49dc848d3f387a 4b7f9e6624a66fec0510d76823303017e224c9d7 M file2.ts
:100644 100644 368d64862e6aa2a0110f201c8a5193d929e2956d 0e51626a9866a8a3896489f497fbd745a5f4a9f2 M file3.ts
:040000 040000 c332b1e576af0dbb93cc875106bc06c3de6b74c8 f7f3478a9b0eaac85719699d97e323563a1b102b M some_folder
Do the first and second hashes on each line show the old and new objects for the modified file, respectively? In the worst case I could extract that information by parsing the output.
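For what it's worth, the RAW OUTPUT FORMAT section of the man page seems to describe each line as `:<old mode> <new mode> <old hash> <new hash> <status>`, followed by a tab and the path. If that reading is right, the parsing fallback could look like the minimal sketch below. The function name `new_blob_hashes` is made up for illustration, and limiting the result to added or modified entries whose hash actually changed is my own assumption about what "added by a commit" should mean.

```shell
# Hypothetical helper: read `git diff-tree` raw lines on stdin and print the
# post-commit hash of every added or modified blob entry.
new_blob_hashes() {
  awk '
    # Fields: $1 = ":<old mode>", $2 = new mode, $3 = old hash,
    #         $4 = new hash,      $5 = status,   $6.. = path
    # New modes 100644, 100755 and 120000 are blobs; 040000 (tree) and
    # 160000 (submodule) are skipped. $3 != $4 drops mode-only changes.
    /^:/ && $2 ~ /^1[02]/ && $5 ~ /^[AM][0-9]*$/ && $3 != $4 { print $4 }
  '
}

# Possible use against a real repository (-r recurses into subtrees, so
# files below some_folder are listed instead of the tree itself):
#   git diff-tree -r --no-commit-id 2b53d04dbb7cd35d030ddc59b13c0836a87daeb7 | new_blob_hashes
```

Fed the four lines shown above, this prints the second hash of the three file entries and skips the some_folder tree entry.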
My primary goal was to find a command line which works as below:
$ git <command> <option1> <option2> 368d64862e6aa2a0110f201c8a5193d929e2956d
6e0ed0b1ed56e9a35a3be52a9de261c8ffcccae8
4b7f9e6624a66fec0510d76823303017e224c9d7
0e51626a9866a8a3896489f497fbd745a5f4a9f2
Edit below in response to @torek
In response to @torek's answer, I want to be clearer about my intentions, because he is absolutely right in pointing out that "new" isn't necessarily new.
I am planning to use git rev-list --reverse <branch> to get a list of all commits on that branch in commit order. Then I want to visit every commit in this order and collect, per commit, the blob hashes seen for the first time on this branch.
The end result should be something like the following:
C:368d64862e6aa2a0110f201c8a5193d929e2956d
B:03f15b592c7d776da37e3d4372c215b14ff8820f
B:4b7f9e6624a66fec0510d76823303017e224c9d7
B:c332b1e576af0dbb93cc875106bc06c3de6b74c8
C:5521a02ce1bc4f147d0fa39a178512476764dd66
B:e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e
B:adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
C:a3db5c13ff90a36963278c6a39e4ee3c22e2a436
B:4888920a568af4ef2d2f4866e75b4061112a39ea
.
.
.
C: commit
B: blob
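A rough single-pass sketch of what I have in mind is below. The function name `first_seen_blobs` is made up for illustration. The flag choices are my own assumptions about what is needed: `-r` to recurse into subtrees, `--root` so the first commit is reported too, and `-m` so merge commits are compared against each parent. I have not verified it on a large repository.

```shell
# Hypothetical sketch: print every commit reachable from <branch>,
# oldest first, each followed by the blob hashes that have not appeared
# under any earlier commit in the listing.
first_seen_blobs() {
  branch=$1
  git rev-list --reverse "$branch" | while read -r commit; do
    echo "C:$commit"
    git diff-tree -r --root -m --no-commit-id "$commit" |
      # Keep raw lines whose new mode is a blob mode (100644, 100755,
      # 120000) and whose status is A (added) or M (modified).
      awk '/^:/ && $2 ~ /^1[02]/ && $5 ~ /^[AM]/ { print "B:" $4 }'
  done |
    # Commit lines always pass; a blob line passes only the first time.
    awk '/^C:/ || !seen[$0]++'
}

# Possible use:
#   first_seen_blobs <branch>
```

Because the filtering happens in the same pipeline, a commit that introduces no previously unseen blob shows up as a bare C: line.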
If this isn't easily done, it would be OK to do two passes. In the first pass, blobs may be listed multiple times in different commits, for the reasons you have pointed out:
- adding a file with the same content as another file
- a file has the same content after it has been modified
I could then do a second pass, piping the file through awk '!x[$0]++', which removes any duplicate lines. This wouldn't be very efficient, but it would get the result I want.
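To illustrate what that second pass would do, here is a tiny sketch run on made-up placeholder lines (not real hashes). The wrapper name `dedupe_lines` is only for illustration. Commit hashes are unique, so the C: lines should always survive and only repeated B: lines are dropped.

```shell
# x[$0]++ evaluates to 0 (false) the first time a line is seen, so the
# negation is true exactly once per distinct line and awk prints it.
dedupe_lines() {
  awk '!x[$0]++'
}

# Placeholder input: blob "aaa" is listed under both commits.
printf '%s\n' 'C:c1' 'B:aaa' 'C:c2' 'B:aaa' 'B:bbb' | dedupe_lines
# prints:
#   C:c1
#   B:aaa
#   C:c2
#   B:bbb
```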
I hope I made my intentions clear now. Any thoughts?