(I put part of this in as a comment, but decided to expand on it.)
Note that in Git, files do not have history. Commits are history, and commits have files.
For instance, perhaps commit a666666 has files A, B, and C and message "modify A", and its history is "parent is a555555". Now we step back to commit a555555. It has files A, B, and C and message "add file C" and parent a444444. We then step back to commit a444444, which has files A and B, parent a333333, and some message; and so on.
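To see this commit-by-commit structure for yourself, something like the following sketch may help. It builds a throwaway repository whose files and messages mirror the example above (the repository name demo and the user settings are made up, and the real hashes will differ from a666666 and friends), then prints each commit with its parent and the full set of files in its snapshot.

```shell
#!/bin/sh
# Sketch: build a throwaway repo, then walk its history commit by commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo a > A
echo b > B
git add A B
git commit -q -m "add files A and B"
echo c > C
git add C
git commit -q -m "add file C"
echo a2 >> A
git add A
git commit -q -m "modify A"

# For each commit, newest first: short hash, parent hash, subject,
# then every file stored in that commit's snapshot.
for rev in $(git rev-list HEAD); do
    git log -1 --format='commit %h  parent %p  "%s"' "$rev"
    git ls-tree -r --name-only "$rev" | sed 's/^/    /'
done
```

Note that the listing for "modify A" still shows B and C: every commit holds a full snapshot, not just the file it changed.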
It sounds like what you want to do is extract every commit that touches certain paths ("files in project B"), copy those specific file-paths over to some other repository, and then—in that other repository—make a new commit, perhaps re-using the commit message from the current commit.
There are a number of ways to do this. None is necessarily best, although some may make things easier, depending on just what results you want. None are completely built-in to Git either, so you must do at least a little bit of programming. I'll show one method, which is the most obvious to me.
You should also decide what, if anything, you want to do with merge commits. Merges cause non-linear history, and copying non-linear history requires either extreme cleverness, or some sort of simplification. I will leave details to you.
As for how to copy the commits, here are two relatively simplistic methods that assume we only care about some sort of linear or linearizable history.
Basic setup and finding interesting commits
Set up repo src, the source repository that contains the commits you want to copy from (i.e., a clone of project B in your question). Set up a second repo, dst, the destination (target) into which you will copy files from commits in src.
Now we need to find "interesting commits". These are commits that modify "interesting files", i.e., files that you wish to copy into dst.
The Git command to list commit IDs is git rev-list, which takes about four billion options to specify which Git objects are interesting and how to show them. In our case we want commits, listed in reverse topological order (oldest first), starting from the latest point you want to copy from (probably a branch name) and probably stopping at some specified point, but only those commits that modified specific path(s). Hence:
git rev-list --topo-order --reverse latest ^stopat -- path1 path2 ... pathN
(Note that you may list directory paths.) Here latest is the branch name, or a commit ID; ^stopat is the literal character ^ followed by a name or another commit ID; and the paths are the names of all the files, or directories full of files, that are "interesting".
You may want to include --first-parent and/or --no-merges as well, depending on how you want your history linearized. Study the git rev-list documentation to decide for yourself.
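To get a feel for what these two options do before reading the documentation, a small experiment along these lines may help. It is only a sketch: the repository name demo, the branch name side, and the file names are invented for illustration. It creates one merge and then counts how many commits each variant selects.

```shell
#!/bin/sh
# Sketch: compare plain rev-list with --no-merges and --first-parent.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

echo 1 > f
git add f
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)       # whatever the default branch is called

git checkout -q -b side                     # hypothetical side branch
echo 1 > g
git add g
git commit -q -m "side work"

git checkout -q "$main"
echo 2 >> f
git add f
git commit -q -m "main work"
git merge -q --no-ff -m "merge side" side   # force a real merge commit

all=$(git rev-list --count HEAD)
no_merges=$(git rev-list --count --no-merges HEAD)
first_parent=$(git rev-list --count --first-parent HEAD)
echo "all commits:    $all"
echo "--no-merges:    $no_merges"
echo "--first-parent: $first_parent"
git rev-list --first-parent --format='  %s' HEAD | grep -v '^commit'
```

Here --no-merges drops the merge commit itself but keeps "side work", while --first-parent keeps the merge and drops "side work". Which of the two (or both) suits you depends on what you want the copied history to look like.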
(Along the way, you may note that stopat..latest is a shorthand syntax for the same thing we did in longer form above.)
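Here is a rough, self-contained way to try the path-limited listing and to check that the long form and the two-dot shorthand agree. Everything named here is an assumption for the sake of the demo: the directories projA and projB, the tag stopat, and the throwaway repository itself.

```shell
#!/bin/sh
# Sketch: path-limited rev-list in a throwaway repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

mkdir projA projB
echo 1 > projA/f;  git add .; git commit -q -m "A: start"
git tag stopat                              # hypothetical stopping point
echo 1 > projB/g;  git add .; git commit -q -m "B: start"
echo 2 >> projA/f; git add .; git commit -q -m "A: more"
echo 2 >> projB/g; git add .; git commit -q -m "B: more"

# Long form and two-dot shorthand, both limited to projB.
long=$(git rev-list --topo-order --reverse HEAD ^stopat -- projB)
short=$(git rev-list --topo-order --reverse stopat..HEAD -- projB)
if [ "$long" = "$short" ]; then
    echo "both forms select the same commits"
fi

# Oldest first; only the commits that touched projB should appear.
for rev in $long; do
    git log -1 --format='%h %s' "$rev"
done
```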
You may want to redirect the output of the above command to a file, since it might be quite long. You can then inspect the selected revisions, or some random subset of them, to see if you like the ones selected. (Use git show to view specific commits; note that by default, git show uses combined diffs to show merge commits.)
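As a sketch of that workflow, the two shell functions below save the list and then spot-check the first few entries. The function names save_interesting and peek_interesting are my own invention, and the output file name interesting is simply the one the copy script expects; run them from inside src, and move or adjust the file location to taste.

```shell
# save_interesting LATEST STOPAT PATH...
# Write the selected commit IDs, oldest first, to ./interesting.
save_interesting() {
    latest=$1
    stopat=$2
    shift 2
    git rev-list --topo-order --reverse "$latest" "^$stopat" -- "$@" > interesting
}

# peek_interesting [N]
# Show a one-line summary and the changed files for the first N
# (default 5) commits in ./interesting.
peek_interesting() {
    head -n "${1:-5}" interesting | while read -r rev; do
        git show --stat --format='%h %s' "$rev"
    done
}
```

For example, save_interesting master v1.0 projB followed by peek_interesting 3 would list the commits after v1.0 on master that touch projB and summarize the first three (the branch, tag, and directory names there are placeholders).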
Now that you have a list of interesting commits, we can go on to the methods for copying (part of) those commits from src to dst. Note: none of the code below is tested.
Method: literal copy
This is probably the simplest method. Write a script that iterates through each commit ID and checks out that commit in src, then copies the files from src to dst, git adds them in dst, and runs git commit. It's OK to re-add files that are not changed, so this script is pretty simple. This assumes that ./list-of-paths contains the "interesting" paths and the repos are in ./src and ./dst, with the interesting commit revision IDs in ./interesting. (You must write die; it's a simple shell function.) It probably has some issues if not every path exists in every commit (not a problem if the paths are just one subdirectory). It definitely has issues if any of the path arguments have white space in them, since $(cat ../list-of-paths) will just split at any white space.
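while read -r rev; do
    # Check out the commit in src and save its full message for re-use.
    (cd src &&
        git checkout -q "$rev" &&
        git log -1 --pretty=format:%B > ../commit-msg) ||
        die "failed to check out $rev in src"
    # Copy the interesting paths; the unquoted $(cat ...) splits on white space.
    (cd src && cp -R $(cat ../list-of-paths) ../dst/) ||
        die "failed to copy files from src to dst"
    # Stage the same paths in dst and commit with the saved message.
    (cd dst &&
        git add -- $(cat ../list-of-paths) &&
        git commit -q -F ../commit-msg) ||
        die "failed to commit $rev in dst"
done < interesting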
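The script above calls die, which is not a built-in. One minimal way to write it, assuming you just want a message on standard error and a non-zero exit (the "fatal:" prefix is my own choice), is:

```shell
# Print a message to standard error and stop the script with a failure status.
die() {
    printf 'fatal: %s\n' "$*" >&2
    exit 1
}
```

Define it near the top of the script, before the loop. Because each `( ... ) || die ...` runs die in the main shell rather than in the subshell, the exit ends the whole loop at the first failure instead of carrying on with a half-copied commit.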