
Please consider the Bash script testgit.sh, pasted at the end of this post, which reconstructs the example repositories used here.

So, I have an oldrepo_git repository, which contains some files and folders, and a newrepo_git repository, which has just a single commit (adding a README). This is what gitk --all shows for these repos:

[image: gitk --all view of oldrepo_git and newrepo_git]

Basically, I want to export the entire git history of the file a.txt and of the whole aa subfolder (that is, the aa/aa.txt and aa/ab.txt files, but not the README or b.txt files) from the oldrepo_git repository, and import it into the newrepo_git repository, if possible with the right timestamps and branching/merging info.

Since the README file in oldrepo_git is not part of this operation, and since newrepo_git contains nothing but a README file, I wouldn't expect any conflicts. However, I'm not sure which commands to use: I'm aware of git filter-branch, but as far as I know it rewrites the history of oldrepo_git in place; it does not "import" that history into newrepo_git.

In other words, if the history of oldrepo_git is:

$ git log --oneline --graph
*   7e26890 (HEAD -> master) Merge branch 'testbranch'
|\
| * 56ef109 (testbranch) change 5 made
| * 1a78db3 change 4 made
| * d98b4cf change 3 made
| * e5e49af change 2 made
| * 8704c24 change 1 made
|/
* f318d97 added a.txt
* 252bf7f Initial commit

... after the process is done, I would like to see this as the history of newrepo_git:

$ git log --oneline --graph
*   XXXYYGG (HEAD -> master) Merge branch 'testbranch'
|\
| * XXXYYFF (testbranch) change 5 made
| * XXXYYEE change 4 made
| * XXXYYDD change 3 made
| * XXXYYCC change 2 made
| * XXXYYBB change 1 made
|/
* XXXYYAA added a.txt
* 8e99c2d Initial commit by Bob

How could I perform this operation?


The Bash script testgit.sh:

#!/usr/bin/env bash

rm -rf oldrepo_git newrepo_git
mkdir oldrepo_git newrepo_git

cd oldrepo_git
git init
git config user.name tester
git config user.email tester@example.com
echo "# README" >> README
git add README
GIT_COMMITTER_DATE="1558960260" git commit --date "1558960260" -m "Initial commit"
echo "Testing" >> a.txt
git add a.txt
GIT_COMMITTER_DATE="1558960270" git commit --date "1558960270" -m "added a.txt"
git checkout -b testbranch
mkdir aa bb
for ix in 1 2 3 4 5; do
  echo $ix >> a.txt
  echo $ix >> b.txt
  echo $ix >> aa/aa.txt
  echo $ix >> aa/ab.txt
  git add .
  newts="$((1558960270+ix*10))"
  GIT_COMMITTER_DATE="$newts" git commit --date "$newts" -m "change $ix made"
done
git checkout master
ix="$((ix+1))"; newts="$((1558960270+ix*10))"
GIT_COMMITTER_DATE="$newts" GIT_AUTHOR_DATE="$newts" git merge --no-ff --no-edit testbranch

cd ../newrepo_git
git init
git config user.name bob
git config user.email bob@example.com
echo "# Bob's README" >> README
git add README
GIT_COMMITTER_DATE="1558960260" git commit --date "1558960260" -m "Initial commit by Bob"
sdbbs
  • Are there files in `oldrepo_git` you don't want to transfer over? If not, I would be inclined to merge in the actual history from `oldrepo_git` into `newrepo_git`, which you can do by adding both remote in one sandbox and doing a regular merge. – joanis May 27 '19 at 14:31
  • Thanks @joanis - indeed, there are files I do not want to transfer "*... not the README or b.txt files*" (and I intended to not want anything in the folder `bb/` either, but I pasted the wrong example, so there is nothing in `bb/` if you run the OP example as it is, anyway) - which is why I thought a regular "add remote, then fetch" approach would not work. – sdbbs May 27 '19 at 14:35
  • OK, I guess I should have read the question more carefully. Then you want to look into tools that rewrite history with some files removed - search for how to get rid of large files in a repo, there are plenty of questions about that in SO - and then merging the result in might be what you want. – joanis May 27 '19 at 14:39


EDIT: you may want to add an extra echo $ix >> bb/bb.txt to the for loop of the testgit.sh script in the OP, so that the output in this post matches.


OK, I think this is how it should be done, at least for the situation in the OP (where there are no remote repos yet). First, copy the old repo:

cp -a oldrepo_git oldrepo_filt_git

Next, we have to delete everything we don't want from the copied repo, using git filter-branch together with git rm. I found part of this command in "Detach many subdirectories into a new, separate Git repository":

cd oldrepo_filt_git
git filter-branch --index-filter "git rm --cached --ignore-unmatch -r $(bash -O extglob -c 'ls -xd !(a*)')" --prune-empty -- --all

Note that since we tell git rm what to delete, we have to specify the inverse of what we want to keep. Here I want to keep the a.txt file and the aa folder, which the glob a* matches, so we need Bash's extglob option to negate it as !(a*). So, if the full listing is:

$ ls
a.txt  aa  b.txt  bb  README

... then the extglob expression gives only the file/folder names to delete:

$ bash -O extglob -c 'ls -xd !(a*)'
b.txt  bb  README
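A small sketch of what the negated glob hands to git rm, run in a made-up scratch directory that mirrors the example's top-level names. It also illustrates two details: hidden names such as .git are not matched unless dotglob is set, and, in the filter-branch command above, the $(...) sits inside double quotes, so the outer shell expands it once, against the current checkout, before filter-branch starts. A path that only existed in older commits would therefore not be listed.

```shell
#!/usr/bin/env bash
# Sketch: which top-level names does the negated glob !(a*) select?
# The scratch directory and its files are made up to mirror the example.
set -euo pipefail
shopt -s extglob   # must be enabled before any line using !(...) is parsed
export LC_ALL=C    # byte-order sorting, so the result order is predictable

demo=$(mktemp -d)
cd "$demo"
mkdir aa bb
touch a.txt b.txt README aa/aa.txt aa/ab.txt bb/bb.txt

keep=( a* )        # what we want to keep
drop=( !(a*) )     # every other top-level name: this is what git rm gets
echo "keep: ${keep[*]}"
echo "drop: ${drop[*]}"
```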

So, after the git filter-branch command is run:

$ git filter-branch --index-filter "git rm --cached --ignore-unmatch -r $(bash -O extglob -c 'ls -xd !(a*)')" --prune-empty -- --all
Rewrite 252bf7ff5f385dad880240d5d80e68f24ae09b59 (1/8) (0 seconds passed, remaining 0 predicted)    rm 'README'
Rewrite f318d9712cd7aacdb5dd45febbcdbbce6b741e08 (2/8) (1 seconds passed, remaining 3 predicted)    rm 'README'
Rewrite 00b62e7da8784d45850d7483cbea88fdc4aa844c (2/8) (1 seconds passed, remaining 3 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'
Rewrite c618eff47d38412c54a8381a5bacc921bddefe2d (2/8) (1 seconds passed, remaining 3 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'
Rewrite 2cada8d822d83f37bdc4a37bcfb03047c1cc1ded (5/8) (3 seconds passed, remaining 1 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'
Rewrite 7b296b70018f4105f190d06ed4d9c58e3f80532f (5/8) (3 seconds passed, remaining 1 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'
Rewrite 18a1ad1d35cd8573c39485d0a29b630325f9727d (7/8) (5 seconds passed, remaining 0 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'
Rewrite 2ffbbf03d51363f1ced3aaaf000d5921c9d8b919 (7/8) (5 seconds passed, remaining 0 predicted)    rm 'README'
rm 'b.txt'
rm 'bb/bb.txt'

Ref 'refs/heads/master' was rewritten
Ref 'refs/heads/testbranch' was rewritten

... we have:

$ git log --oneline --graph --stat
*   31cd8b5 (HEAD -> master) Merge branch 'testbranch'
|\
| * 42b153d (testbranch) change 5 made
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| * ff1be9d change 4 made
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| * 90f050c change 3 made
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| * d2d2136 change 2 made
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| * ab237ac change 1 made
|/
|    a.txt     | 1 +
|    aa/aa.txt | 1 +
|    aa/ab.txt | 1 +
|    3 files changed, 3 insertions(+)
* ea0a32d added a.txt
   a.txt | 1 +
   1 file changed, 1 insertion(+)

... which confirms that this is the filtered state of the repository I wanted; this is what I now want to bring into newrepo_git.
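As an alternative that does not depend on what the current checkout contains, the index filter can decide per commit: empty the temporary index, then restore only a.txt and aa from that same commit with git reset <commit> -- <paths>. This is a commonly used index-filter idiom, not the method used in this post; the sketch below runs it on a throwaway repository.

```shell
#!/usr/bin/env bash
# Sketch: keep only a.txt and aa/ in every commit, deciding per commit
# rather than from one listing of the current checkout.
# Builds a throwaway repo; nothing outside Git itself is assumed.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in Git >= 2.24

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name tester
git -C "$repo" config user.email tester@example.com

# Commit 1 has only README; commit 2 adds a.txt, aa/aa.txt and b.txt.
echo "# README" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "Initial commit"
mkdir "$repo/aa"
echo 1 > "$repo/a.txt"
echo 1 > "$repo/b.txt"
echo 1 > "$repo/aa/aa.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "change 1 made"

# Per commit: empty the temporary index, then restore just the wanted paths
# from that same commit. --prune-empty drops commits left with no changes.
(
  cd "$repo"
  git filter-branch --index-filter '
    git rm -r --cached --quiet --ignore-unmatch -- . &&
    git reset -q "$GIT_COMMIT" -- a.txt aa
  ' --prune-empty -- --all
) > /dev/null

git -C "$repo" log --oneline --name-only
```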


OK, it turns out that I don't quite want to "merge" into newrepo_git; I want to "join" the two histories. I found most of the information for this part in "Join two Git repositories and keep the original commit dates" (axiac@web).

So, first, we change directory to newrepo:

cd ../newrepo_git

Note that at this point, most resources online recommend:

git remote add oldrepo ../oldrepo_filt_git/
git pull oldrepo master --allow-unrelated-histories

... but this results in a history with two roots, which is not what I want:

$ git log --oneline --graph --stat
*   845c81e (HEAD -> master) Merge branch 'master' of ../oldrepo_filt_git
|\
| *   31cd8b5 (oldrepo/master) Merge branch 'testbranch'
| |\
| | * 42b153d (oldrepo/testbranch) change 5 made
| | |  a.txt     | 1 +
| | |  aa/aa.txt | 1 +
| | |  aa/ab.txt | 1 +
| | |  3 files changed, 3 insertions(+)
| | * ff1be9d change 4 made
| | |  a.txt     | 1 +
| | |  aa/aa.txt | 1 +
| | |  aa/ab.txt | 1 +
| | |  3 files changed, 3 insertions(+)
| | * 90f050c change 3 made
| | |  a.txt     | 1 +
| | |  aa/aa.txt | 1 +
| | |  aa/ab.txt | 1 +
| | |  3 files changed, 3 insertions(+)
| | * d2d2136 change 2 made
| | |  a.txt     | 1 +
| | |  aa/aa.txt | 1 +
| | |  aa/ab.txt | 1 +
| | |  3 files changed, 3 insertions(+)
| | * ab237ac change 1 made
| |/
| |    a.txt     | 1 +
| |    aa/aa.txt | 1 +
| |    aa/ab.txt | 1 +
| |    3 files changed, 3 insertions(+)
| * ea0a32d added a.txt
|    a.txt | 1 +
|    1 file changed, 1 insertion(+)
* 8e99c2d Initial commit by Bob
   README | 1 +
   1 file changed, 1 insertion(+)

What I want instead is for commit ea0a32d (added a.txt) to follow from 8e99c2d (Initial commit by Bob); that is the "joining" of repositories mentioned earlier.
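One way to tell the two results apart is to count root commits with git rev-list --max-parents=0, which lists only commits that have no parents. The sketch below reproduces the unrelated-histories merge on throwaway repositories (mkrepo is a hypothetical helper) and finds two roots; after a successful "join", the same command should list only Bob's commit.

```shell
#!/usr/bin/env bash
# Sketch: count root (parentless) commits after a merge of unrelated histories.
# Throwaway repos stand in for newrepo_git and oldrepo_filt_git.
set -euo pipefail

mkrepo() {  # hypothetical helper: new repo at $1 with one commit adding file $2
  git init -q "$1"
  git -C "$1" config user.name tester
  git -C "$1" config user.email tester@example.com
  echo hi > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

base=$(mktemp -d)
mkrepo "$base/new" README
mkrepo "$base/old" a.txt

# The "pull --allow-unrelated-histories" route, done as fetch + merge.
git -C "$base/new" fetch -q "$base/old" HEAD
git -C "$base/new" merge -q --no-edit --allow-unrelated-histories FETCH_HEAD

# --max-parents=0 lists only commits without parents, i.e. the roots.
roots=$(git -C "$base/new" rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
echo "root commits: $roots"
```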

Note also that you could run git format-patch --root HEAD -o ../ in oldrepo_git, and then apply the patches in newrepo_git with for ix in ../*.patch; do echo $ix; git am -k < $ix; done, but this does not preserve the merge history (all history gets flattened)!

So, to do a proper "join" instead, I first add the filtered repo as a remote and fetch it:

$ git remote add old-repo ../oldrepo_filt_git

$ git fetch old-repo
warning: no common commits
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 29 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (29/29), done.
From ../oldrepo_filt_git
 * [new branch]      master     -> old-repo/master
 * [new branch]      testbranch -> old-repo/testbranch

... then create and rename branches (and save the timestamps to /tmp/hashlist) as recommended in the post, and cherry-pick the first commit of old-repo:

$ git branch oldrepo-head old-repo/master
Branch 'oldrepo-head' set up to track remote branch 'master' from 'old-repo'.

$ git branch oldrepo-root $(git log oldrepo-head --reverse --pretty=%H | head -n 1)

$ git log --pretty='%T %ct' ..oldrepo-head > /tmp/hashlist

$ git branch -m master new-master

$ git cherry-pick --strategy-option=theirs oldrepo-root
[new-master 427cf77] added a.txt
 Author: tester <tester@example.com>
 Date: Mon May 27 14:31:10 2019 +0200
 1 file changed, 1 insertion(+)
 create mode 100644 a.txt

At this point, the repo state is:

$ git log --oneline --graph
* 427cf77 (HEAD -> new-master) added a.txt
* 8e99c2d Initial commit by Bob

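Now we can do a rebase. Note that the cited post gets an error at this point, but for this particular example the rebase seems to proceed without one (also note that --preserve-merges has since been removed, in Git 2.34; newer Git versions offer --rebase-merges instead):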

$ git rebase --preserve-merges --onto new-master --root oldrepo-head
Successfully rebased and updated refs/heads/oldrepo-head.
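Here is a sketch of this join step with --rebase-merges, on two throwaway repositories. It skips the separate cherry-pick and simply replays the whole fetched history, root commit included, onto Bob's commit. The branch names side and oldrepo-head are illustrative, and this is a simplification of the cited post's procedure rather than a drop-in replacement for it.

```shell
#!/usr/bin/env bash
# Sketch of the join step with --rebase-merges (Git >= 2.18), on throwaway
# repos: replay the old history, root included, on top of Bob's commit.
# The branch names "side" and "oldrepo-head" are only illustrative.
set -euo pipefail

base=$(mktemp -d)
for r in old new; do
  git init -q "$base/$r"
  git -C "$base/$r" config user.name tester
  git -C "$base/$r" config user.email tester@example.com
done

# new: Bob's single README commit.
echo "# Bob's README" > "$base/new/README"
git -C "$base/new" add README
git -C "$base/new" commit -q -m "Initial commit by Bob"
bob=$(git -C "$base/new" rev-parse HEAD)

# old (already filtered): root commit, one side-branch commit, --no-ff merge.
echo Testing > "$base/old/a.txt"
git -C "$base/old" add a.txt
git -C "$base/old" commit -q -m "added a.txt"
git -C "$base/old" checkout -q -b side
echo 1 >> "$base/old/a.txt"
git -C "$base/old" commit -q -am "change 1 made"
git -C "$base/old" checkout -q -
git -C "$base/old" merge -q --no-ff --no-edit side

# Fetch the old tip into a local branch, then rebase it onto Bob's commit,
# recreating the merge instead of flattening it.
git -C "$base/new" fetch -q "$base/old" "HEAD:refs/heads/oldrepo-head"
git -C "$base/new" rebase -q --rebase-merges --onto "$bob" --root oldrepo-head

git -C "$base/new" log --graph --oneline oldrepo-head
```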

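At this point the newrepo history is almost there; the only problem is that the commit timestamps differ from the author timestamps, because the cherry-pick and the rebase set each committer date to the time they ran: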

$ git log --graph --pretty=fuller
*   commit 61fbe54721a9432e91e48917ed036f55da4105a4 (HEAD -> oldrepo-head)
|\  Merge: 427cf77 f8e8f8a
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:32:10 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Tue May 28 12:57:00 2019 +0200
| |
| |     Merge branch 'testbranch'
| |
| * commit f8e8f8aedaa7bc999bdfdd49542c9ee04edb770c
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:32:00 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Tue May 28 12:56:58 2019 +0200
| |
| |     change 5 made
| |
| * commit b084029040d6596e0795e7567b2684dc59c02241
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:50 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Tue May 28 12:56:56 2019 +0200
| |
| |     change 4 made
| |
| * commit b62dabca3a46efbe76edb10591935db136f74aaa
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:40 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Tue May 28 12:56:54 2019 +0200
| |
| |     change 3 made
| |
| * commit 252f3e9697b87b4f59cd0a74681ef25401340fcf
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:30 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Tue May 28 12:56:51 2019 +0200
| |
| |     change 2 made
| |
| * commit c382c8a713489ca0e5dc106bed29fdce379952b0
|/  Author:     tester <tester@example.com>
|   AuthorDate: Mon May 27 14:31:20 2019 +0200
|   Commit:     bob <bob@example.com>
|   CommitDate: Tue May 28 12:56:49 2019 +0200
|
|       change 1 made
|
* commit 427cf77417a2406db5dd6a0e9bd4fb60542f2ee1 (new-master)
| Author:     tester <tester@example.com>
| AuthorDate: Mon May 27 14:31:10 2019 +0200
| Commit:     bob <bob@example.com>
| CommitDate: Tue May 28 12:55:43 2019 +0200
|
|     added a.txt
|
* commit 8e99c2d71048b4999d012b33d34386351d6d0fef
  Author:     bob <bob@example.com>
  AuthorDate: Mon May 27 14:31:00 2019 +0200
  Commit:     bob <bob@example.com>
  CommitDate: Mon May 27 14:31:00 2019 +0200

      Initial commit by Bob

The cited post has the same problem, and its suggestion is to use filter-branch to restore the original committer timestamps saved in /tmp/hashlist:

$ git filter-branch --env-filter 'export GIT_COMMITTER_DATE=$(fgrep -m 1 $(git log -1 --pretty=%T $GIT_COMMIT) /tmp/hashlist | cut -d" " -f2)' new-master..oldrepo-head
Rewrite 61fbe54721a9432e91e48917ed036f55da4105a4 (3/6) (1 seconds passed, remaining 1 predicted)
Ref 'refs/heads/oldrepo-head' was rewritten

... however, this did not work for me: /tmp/hashlist is keyed on tree hashes (%T), and every rebased commit now also contains Bob's README, so none of the rewritten trees match the saved ones.
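To see why the lookup comes up empty, here is a sketch on a throwaway repository: it prints a hashlist-style line (%T %ct), then adds a README next to the same a.txt, as the rebase effectively did, and compares the two tree hashes.

```shell
#!/usr/bin/env bash
# Sketch: a hashlist line is "<tree hash> <committer timestamp>" (%T %ct).
# The same a.txt plus a README gives a different tree hash, so a lookup
# keyed on the old tree hash finds nothing. Throwaway repo only.
set -euo pipefail

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name tester
git -C "$repo" config user.email tester@example.com

echo Testing > "$repo/a.txt"
git -C "$repo" add a.txt
GIT_COMMITTER_DATE="1558960270 +0200" git -C "$repo" commit -q -m "added a.txt"
line=$(git -C "$repo" log -1 --pretty='%T %ct')
old_tree=${line%% *}
echo "hashlist line: $line"

# Same a.txt, now next to Bob's README, as in each commit after the rebase.
echo "# Bob's README" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q --amend --no-edit
new_tree=$(git -C "$repo" log -1 --pretty=%T)
echo "old tree: $old_tree"
echo "new tree: $new_tree"
```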

So I used a simpler approach: have filter-branch read each commit's author timestamp and reapply it as the committer date (note that I use -f here because of the previous filter-branch run; otherwise I get "Cannot create a new backup. ... Force overwriting the backup with -f"):

$ git filter-branch -f --env-filter 'export GIT_COMMITTER_DATE=$(git log -1 --pretty=%at $GIT_COMMIT)' new-master..oldrepo-head
Rewrite f2b2385d85c74dbf0cbf8fabc02ec30cb50d8f2a (3/6) (1 seconds passed, remaining 1 predicted)
Ref 'refs/heads/oldrepo-head' was rewritten

At this point the repo is almost as I need it, except that the first oldrepo commit (427cf77) still has its old commit timestamp, because the range new-master..oldrepo-head excludes the commit that new-master points to. So I try again:

$ git filter-branch -f --env-filter 'export GIT_COMMITTER_DATE=$(git log -1 --pretty=%at $GIT_COMMIT)' 427cf77417a
You must specify a ref to rewrite.

$ git filter-branch -f --env-filter 'export GIT_COMMITTER_DATE=$(git log -1 --pretty=%at $GIT_COMMIT)' new-master
Rewrite 427cf77417a2406db5dd6a0e9bd4fb60542f2ee1 (2/2) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/new-master' was rewritten

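... but the log still shows the same timestamp difference for that commit: new-master itself was rewritten (to 4ac225e), but oldrepo-head is still built on the original 427cf77, which is now referenced by the refs/original backup: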

$ git log --graph --stat --pretty=fuller
*   commit cdaa4b82f3833770a9051a2490487548603e3af8 (HEAD -> oldrepo-head)
|\  Merge: 427cf77 9bfc6cd
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:32:10 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:32:10 2019 +0200
| |
| |     Merge branch 'testbranch'
| |
...
* commit 427cf77417a2406db5dd6a0e9bd4fb60542f2ee1 (refs/original/refs/heads/new-master)
| Author:     tester <tester@example.com>
| AuthorDate: Mon May 27 14:31:10 2019 +0200
| Commit:     bob <bob@example.com>
| CommitDate: Tue May 28 12:55:43 2019 +0200
|
|     added a.txt
|
|  a.txt | 1 +
|  1 file changed, 1 insertion(+)
...

Anyway, now we should clean up as recommended in the post:

$ git branch -m oldrepo-head master
$ git branch -D oldrepo-root
Deleted branch oldrepo-root (was ea0a32d).
$ git branch -D new-master
Deleted branch new-master (was 4ac225e).
$ rm .git/refs/original/refs/heads/new-master
$ git remote remove old-repo
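Deleting the backup with rm only works while the ref is still a loose file; once refs are packed (for example by git gc or git pack-refs) they live in .git/packed-refs instead. The git filter-branch documentation suggests deleting every ref under refs/original/ with git update-ref -d. A sketch on a throwaway repo, using a while loop in place of the documented xargs pipeline:

```shell
#!/usr/bin/env bash
# Sketch: delete every filter-branch backup ref under refs/original/ with
# git update-ref -d, which also handles packed refs (unlike a plain rm).
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in Git >= 2.24

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name tester
git -C "$repo" config user.email tester@example.com
git -C "$repo" commit -q --allow-empty -m "some commit"

# Any rewrite that changes something leaves refs/original/* behind.
(cd "$repo" && git filter-branch --msg-filter 'sed "s/^/rewritten: /"' -- --all) > /dev/null
git -C "$repo" pack-refs --all   # pack them, so no loose ref file is left to rm

git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git -C "$repo" update-ref -d "$ref"
done
```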

And finally, I rewrote commit 427cf774 with a corrected commit timestamp by putting a temporary branch on it (filter-branch needs a ref to rewrite and seemingly cannot take a bare commit hash) and using tmp^..tmp as the filter-branch range:

$ git branch tmp 427cf774
$ git filter-branch -f --env-filter 'export GIT_COMMITTER_DATE=$(git log -1 --pretty=%at $GIT_COMMIT)' tmp^..tmp
Rewrite 427cf77417a2406db5dd6a0e9bd4fb60542f2ee1 (1/1) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/tmp' was rewritten
$ git log --graph --stat --pretty=fuller tmp
* commit 4ac225e308e280e3a96be0168c6e9dece44d4979 (tmp)
| Author:     tester <tester@example.com>
| AuthorDate: Mon May 27 14:31:10 2019 +0200
| Commit:     bob <bob@example.com>
| CommitDate: Mon May 27 14:31:10 2019 +0200
|
|     added a.txt
|
|  a.txt | 1 +
|  1 file changed, 1 insertion(+)
|
...
$ git branch -D tmp
Deleted branch tmp (was 4ac225e).

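... and finally, I can see that newrepo contains the oldrepo commits almost as I envisioned them. (One remaining difference: 427cf77 below still has the rebase-time CommitDate, because master is still built on the original 427cf77; the rewritten copy, 4ac225e, only lived on the deleted tmp branch.)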

$ git log --graph --stat --pretty=fuller
*   commit cdaa4b82f3833770a9051a2490487548603e3af8
|\  Merge: 427cf77 9bfc6cd
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:32:10 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:32:10 2019 +0200
| | 
| |     Merge branch 'testbranch'
| | 
| * commit 9bfc6cde58be9102102f839e5cc0fe8f25f0f78c
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:32:00 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:32:00 2019 +0200
| | 
| |     change 5 made
| | 
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| | 
| * commit 485ae0f50054610b6a41098fb695e59d194cc856
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:50 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:31:50 2019 +0200
| | 
| |     change 4 made
| | 
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| | 
| * commit b6804b6e8e313b5c4766568a287f0785503e3a11
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:40 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:31:40 2019 +0200
| | 
| |     change 3 made
| | 
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| | 
| * commit 8b463423d2a99929a6a248e38ba1368a56d3769d
| | Author:     tester <tester@example.com>
| | AuthorDate: Mon May 27 14:31:30 2019 +0200
| | Commit:     bob <bob@example.com>
| | CommitDate: Mon May 27 14:31:30 2019 +0200
| | 
| |     change 2 made
| | 
| |  a.txt     | 1 +
| |  aa/aa.txt | 1 +
| |  aa/ab.txt | 1 +
| |  3 files changed, 3 insertions(+)
| | 
| * commit 3bc0bed30ebea1498a15711825b2ea8347cc374d
|/  Author:     tester <tester@example.com>
|   AuthorDate: Mon May 27 14:31:20 2019 +0200
|   Commit:     bob <bob@example.com>
|   CommitDate: Mon May 27 14:31:20 2019 +0200
|   
|       change 1 made
|   
|    a.txt     | 1 +
|    aa/aa.txt | 1 +
|    aa/ab.txt | 1 +
|    3 files changed, 3 insertions(+)
| 
* commit 427cf77417a2406db5dd6a0e9bd4fb60542f2ee1
| Author:     tester <tester@example.com>
| AuthorDate: Mon May 27 14:31:10 2019 +0200
| Commit:     bob <bob@example.com>
| CommitDate: Tue May 28 12:55:43 2019 +0200
| 
|     added a.txt
| 
|  a.txt | 1 +
|  1 file changed, 1 insertion(+)
| 
* commit 8e99c2d71048b4999d012b33d34386351d6d0fef
  Author:     bob <bob@example.com>
  AuthorDate: Mon May 27 14:31:00 2019 +0200
  Commit:     bob <bob@example.com>
  CommitDate: Mon May 27 14:31:00 2019 +0200

      Initial commit by Bob

   README | 1 +
   1 file changed, 1 insertion(+)

Easy, eh? :)
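For comparison, here is a sketch of the committer-date fix done in a single pass. In the steps above, new-master..oldrepo-head leaves out 427cf77 because new-master points at it; a range that starts at Bob's commit (something like 8e99c2d..oldrepo-head) would include it. The env filter can also copy $GIT_AUTHOR_DATE, which filter-branch exports for every commit it rewrites, instead of calling git log, and this keeps the original timezone offset as well. The repository and the branch name work are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set committer date = author date for every commit after a base,
# in one filter-branch pass. Throwaway repo; branch "work" is hypothetical.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in Git >= 2.24

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/work
git -C "$repo" config user.name bob
git -C "$repo" config user.email bob@example.com

# Bob's root commit, then two "imported" commits whose author date is in
# May 2019 but whose committer date is now, like after the rebase.
git -C "$repo" commit -q --allow-empty -m "Initial commit by Bob"
base=$(git -C "$repo" rev-parse HEAD)
for ts in 1558960270 1558960280; do
  git -C "$repo" commit -q --allow-empty --date "$ts +0200" -m "imported $ts"
done

# filter-branch exports GIT_AUTHOR_DATE for each commit it rewrites, so the
# env filter can copy it; "$base..work" starts right after Bob's commit.
(
  cd "$repo"
  git filter-branch --env-filter \
    'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"' -- "$base..work"
) > /dev/null

git -C "$repo" log --format='%h %at %ct %s' "$base..work"
```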


But I'm not really sure whether this is the right process, so if anyone more knowledgeable can confirm it, or knows an easier way, that would be great ...

sdbbs