How to create a baseline in Git that only includes files from specific commits

Question

Within ClearCase, we had the ability to create a labeled baseline containing only the binary files modified by specific change requests managed by ClearQuest. With Git, we use Jira to do the same thing, through a custom integration. The question is how to create a Git branch with ONLY those versions of files referenced by a specific list of commit hashes, and then archive only those files to a baseline .zip file.

Our company is migrating from ClearCase to Git. The code we deal with is NOT traditional text-based source. It is contained in binary objects created by proprietary vendor software, usually a .zip archive of other binary files. We store the binary objects as Git LFS objects. Since we are dealing with binary files, no merging is performed. We use LFS locks to guarantee exclusive access to a file, which removes the need to merge. Our programmers also do not work with command-line interfaces such as bash. To them, updating a "file" means check out, change, check in. The Git paradigm of lock, pull, edit, add, commit, unlock is beyond the scope of their capabilities. To maintain the ClearCase check-out/check-in paradigm, we created a number of wrapper scripts that hide the Git details from the programmer. These scripts are called from Windows Explorer context menu commands.
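The wrapper scripts themselves are not shown here. A minimal sketch of what a check-out/check-in pair could look like, assuming a remote named origin, the single master branch, files already tracked by LFS as lockable, and the hypothetical function names checkout_file and checkin_file:

```bash
# Hypothetical wrapper pair mirroring ClearCase "check out / check in" on top of Git LFS locks.
# Assumes: remote "origin", branch "master", and the file is tracked by LFS as lockable.

checkout_file() {
  local file=$1
  # Take the exclusive server-side lock first, then bring the working copy up to date.
  git lfs lock "$file" &&
    git pull --ff-only origin master
}

checkin_file() {
  local file=$1 msg=$2
  # One commit per checked-in file; the && chain releases the lock only if the push succeeded.
  git add -- "$file" &&
    git commit -m "$msg" -- "$file" &&
    git push origin master &&
    git lfs unlock "$file"
}
```

A context-menu entry would then call something like checkin_file "path/to/object.zip" "JIRA-123: update" (the path and issue key are placeholders).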

Having said all of that, PLEASE do not respond that we are misusing Git. It is a complicated tool that is not suitable for all programming environments. Git is, however, the most popular version control tool available at this time. And we have been able to preserve all of the ClearCase interface functionality that our programmers are familiar with.

The programmers do not use branches. We maintain a master branch only. Every wrapper command pulls from and pushes to the master branch on origin. Every file that is added, removed, or checked in gets its own separate commit.

The following steps are not performed by the typical programmer, only by the Git administrator.

Currently, we:

1. create and check out a branch
2. create a list of hashes for the specific commits
3. perform a git rm -rf . in the branch to get rid of everything
4. try each of the following to populate just the files/commits we want:
- a. checkout of each hash
- b. lfs checkout of each hash
- c. create a tag with the version string (e.g. v1.0)
5. push the tag to origin
6. checkout -f master to get back to the original condition
7. perform git archive with the version string to a zip file

Regardless of what we do, git archive includes everything in the repo as of the last commit. This is the reason for step 3.

We also tried a step 6.1 of just zipping the folder containing the project. All we get in the .zip are the Git LFS pointer files, which is the reason for trying steps 4a, 4b and 4c.

It seems that we are performing a lot of steps to accomplish something simple. Is it possible that we are missing the obvious? Can anyone suggest a more straightforward approach? Recall that the final .zip result must only contain the files modified by the specific commits.

Are the "specific commits" always a *range* of commits, or a list of individual commits not necessarily following one another? — LeGEC, Sep 10 '19 at 14:38
The specific commits are individual commits, not in a range. — gitrdone, Sep 10 '19 at 19:29

VonC · Answer 1 · answered Sep 13 '19 at 11:32

Our programmers also do not work with command-line interfaces, such as bash. To them, dealing with an update of a "file" is check out, change, check in. The Git paradigm of lock, pull, edit, add, commit, unlock is beyond the scope of their capabilities. To maintain the ClearCase check out/check in paradigm, a number of wrapper scripts were created which hide the Git details from the programmer.

You are really misusin...

Having said all of that, PLEASE do not respond that we are misusing Git.

Oops. Never mind. Do go on.

The programmers do not use branches.

Why use Git at all then? The whole success of this tool is based on pull/merge requests between branches.
For source code, that is (binaries are usually published/exported to an artifact repository like Nexus).
You seem to misuse G... (ah, right, scratch that)

the git archive wants to include everything in the repo at the time of the last commit into the archive

That is the very nature of a Git repository, a collection of snapshots (commits), each one representing the full content of the repository at that time.

include files from specific commits

A commit means all the files, not "just the modified ones".
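To see that difference concretely, two hypothetical helpers (names are illustrative, not from the answer) contrast the full snapshot a commit records with the paths it actually changed; both use standard Git plumbing and can run in any clone:

```bash
# Every path recorded in the commit's tree: the full snapshot.
snapshot_files() { git ls-tree -r --name-only "$1"; }

# Only the paths that commit changed relative to its parent.
# --root makes a root commit report all of its files as added.
changed_files() { git diff-tree -r --no-commit-id --name-only --root "$1"; }
```

On a repository where the second commit only adds b.txt, snapshot_files HEAD lists both a.txt and b.txt, while changed_files HEAD lists only b.txt.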

I mention in "Single working branch with Git":

the content management nature of Git
but also, and that could be a workaround in your case, the ability to check out multiple branches/commits in multiple folders (a bit like multiple snapshot views, each with its own config spec)

Using the git worktree command, you can easily:

check out commit C in a folder dedicated to zip/export (leaving your initial clone on master alone)
zip the files which have been added, copied, modified, renamed or had their type changed (e.g. file → symlink) in this commit. This leaves out deleted files.
See "Export only modified and added files with folder structure in Git"

That is:

git diff-tree -r --no-commit-id --name-only --diff-filter=ACMRT $commit_id | tar -czf file.tgz -T -
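Combining the worktree checkout with that command gives roughly the following sketch. The function name export_commit is hypothetical; it assumes Git 2.17+ (for git worktree remove) and that git-lfs is installed, so the worktree checkout should contain real file content rather than LFS pointer files.

```bash
# Export the files one commit added/copied/modified/renamed/type-changed,
# read from a throwaway worktree so the main clone stays on master.
export_commit() {
  local commit out wt status
  # Resolve to a full commit id now: inside the worktree, HEAD means something else.
  commit=$(git rev-parse --verify "$1^{commit}") || return 1
  # Absolute output path, because tar runs inside the worktree.
  out="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
  wt="$(mktemp -d)/export"
  git worktree add --detach "$wt" "$commit" >/dev/null 2>&1 || return 1
  (
    cd "$wt" &&
      git -c core.quotePath=false diff-tree -r --root --no-commit-id --name-only \
        --diff-filter=ACMRT "$commit" | tar -czf "$out" -T -
  )
  status=$?
  # Clean up the worktree and its temporary parent directory.
  git worktree remove --force "$wt"
  rmdir "$(dirname "$wt")"
  return "$status"
}
```

Usage: export_commit a9359f9 baseline.tgz, run from inside the main clone.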

If you want to use git archive only:

git archive -o patch.zip a9359f9 $(git diff --name-only --diff-filter=ACMRT a9359f9^..a9359f9)

(replace a9359f9 with your own commit ID)
In that latter case, you can even skip the extra worktree checkout I mentioned before.
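Both commands above work on one commit at a time, while the comments on the question clarify that the baseline is a list of individual, non-contiguous commits. One possible sketch (a swapped-in technique, not part of the original answer) builds a tree holding only those files in a private temporary index, then hands that tree to git archive. The name make_baseline is hypothetical. It assumes commits are passed oldest-first, so a later commit's version of a path wins. It also assumes git-lfs is installed, so that git archive's smudge filter should expand LFS pointers using the checkout's .gitattributes (hence --worktree-attributes); verify this on a sample before relying on it.

```bash
# Build an archive containing only the files touched by a LIST of individual commits.
make_baseline() (
  # Subshell body: the exported GIT_INDEX_FILE and the trap stay local to this call.
  set -euo pipefail
  out=$1; shift
  tmpdir=$(mktemp -d)
  trap 'rm -rf "$tmpdir"' EXIT
  # Private, initially empty index: the real index and working tree are never touched.
  export GIT_INDEX_FILE="$tmpdir/index"
  git read-tree --empty
  for commit in "$@"; do
    # Raw -z records: ":<old mode> <new mode> <old oid> <new oid> <status>" NUL "<path>" NUL.
    # Without -M/-C, diff-tree reports no renames/copies, so each record has one path.
    while IFS= read -r -d '' header && IFS= read -r -d '' path; do
      read -r _ mode _ oid _ <<< "$header"
      # Stage this commit's version of the path; a later commit overwrites an earlier one.
      git update-index --add --cacheinfo "$mode,$oid,$path"
    done < <(git diff-tree -r -z --raw --root --no-commit-id --diff-filter=ACMRT "$commit")
  done
  # Turn the private index into a tree holding only those files, then archive it
  # (the format is inferred from the output file name, e.g. .zip or .tar).
  tree=$(git write-tree)
  git archive --worktree-attributes -o "$out" "$tree"
)
```

Usage: make_baseline baseline-v1.0.zip <hash1> <hash2> <hash3>. Because only the commits' blobs are read, no branch, git rm, tag or checkout -f master round trip is needed. Files deleted by a later listed commit are not removed from the baseline.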

The OP gitrdone confirms:

I was unaware of the git diff-tree --diff-filter=ACMRT option. With that and the general insight, it is now working as I intended.

Hiding the SCM system's functionality from the developers seems to be a fairly risky move. Wanting Git to "work like" any non-Git tool may have surprising consequences. — Brian Cowan, Sep 10 '19 at 20:17
@BrianCowan I fully agree. I actually tried that back in the day, but it did not end well. — VonC, Sep 10 '19 at 20:19

score 0 · Answer 2 · answered Jan 16 '23 at 13:20

Randomly came here. "Migrating from ClearCase to Git" sounds like "Git fiddled into CC, or vice versa". What stage is this migration at? (SO has become unwelcoming to such discussions or answers; I still post them.) The question of a "baseline in Git" interests me, and I currently lean towards the output side (archive, zip, .exe or whatever). CC is 1990s corporate tooling. It seems to hang around (we have a project on it, too); maybe one of its creators, "Leblang" (= "live long" in German), could make a statement. However, in the case you describe, what is the actual need? The setup creates the very hassle you are now asking for solutions to. Corporate-wise, it sounds like something decided by management without clearly discussing feasibility, needs and risks.

The answer is here: "It seems that we are performing a lot of steps to accomplish something simple."
