1. Set up the target repository by cloning the source.
$ git clone <sourceRepo>
2. Check out the relevant branch. Replace branchname with the actual branch name, here and in all following steps.
$ git checkout branchname
3. Do an initial rewrite using filter-branch with a --tree-filter, updating tags in the process with --tag-name-filter. The filter shown is only an example: on each line it replaces the first occurrence of "text" with "modified", in all files matching the "*.txt" glob at the top level of the tree.
$ git filter-branch --tree-filter 'sed -i "s/text/modified/" *.txt' --tag-name-filter cat -- branchname
4. Create two tags to keep a record of the last source rev and the last target rev.
$ git tag lastsourcerev origin/branchname
$ git tag lasttargetrev branchname
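To see what the example tree filter actually changes, the same sed expression can be run on sample input outside of git. This is only a sketch: apply_example_filter is a hypothetical helper name, and it reads from stdin instead of editing files in place, so it does not depend on how the local sed handles -i.

```shell
#!/bin/sh
# Hypothetical helper: the example substitution, applied to stdin
# instead of to *.txt files.
apply_example_filter() {
    sed "s/text/modified/"
}

# The first line contains "text" twice, the second line once.
# Only the first occurrence on each line is replaced.
printf 'text and text\nmore text\n' | apply_example_filter
```

Adding the g flag (s/text/modified/g) would replace every occurrence on a line instead.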
Whenever new revisions from the source repo need to be picked up, the following steps can be used. They apply the tree filter only to the new commits and graft the new (rewritten) commits onto the existing (previously rewritten) ones.
1. Fetch new commits/tags from the source repo:
$ git fetch origin
2. Reset to the new tip of the source branch.
$ git reset --hard origin/branchname
3. Apply filter-branch with an extra --parent-filter to graft the new commits onto the existing ones. The -f (force) option is needed because the previous filter-branch run left a backup under refs/original. The --parent-filter uses the tags that store the last source and target revs. The whole filter-branch run is limited to the commits between the last processed source rev and the newest source commit (the one branchname was just reset to).
$ git filter-branch -f --tree-filter 'sed -i "s/text/modified/" *.txt' --tag-name-filter cat --parent-filter "sed s/$(git rev-parse lastsourcerev)/$(git rev-parse lasttargetrev)/g" -- lastsourcerev..branchname
4. Update the tracking tags to the new situation:
$ git tag -f lastsourcerev origin/branchname
$ git tag -f lasttargetrev branchname
Repeat these steps as needed. Once no more updates are to be done, the lastsourcerev and lasttargetrev helper tags can be deleted.
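The grafting done by the --parent-filter can be tried in isolation. filter-branch passes each commit's parent list to the filter on stdin in the "-p <hash>" form and takes the new parent list from stdout. The sketch below feeds such strings through the same sed substitution; the three hashes are made-up placeholders and graft_parent is a hypothetical wrapper name, neither comes from a real repository.

```shell
#!/bin/sh
# Placeholder hashes for illustration only. In the real command they come
# from `git rev-parse lastsourcerev` and `git rev-parse lasttargetrev`.
OLD_TIP=1111111111111111111111111111111111111111
NEW_TIP=2222222222222222222222222222222222222222
OTHER=3333333333333333333333333333333333333333

# Hypothetical wrapper around the sed expression used as --parent-filter.
graft_parent() {
    sed "s/$OLD_TIP/$NEW_TIP/g"
}

# Normal commit whose parent is the last processed source rev:
# the parent is swapped for the last rewritten rev.
echo "-p $OLD_TIP" | graft_parent

# Merge commit: only the matching parent is swapped, the other is untouched.
echo "-p $OTHER -p $OLD_TIP" | graft_parent
```

Commits whose parents do not include the last processed source rev pass through unchanged, which is why the substitution only affects the point where new history attaches to old.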
Note that the update process can be split into arbitrarily small increments by resetting the branch to some in-between commit from the source and recording that commit as lastsourcerev. Likewise, the initial rewrite can be split up by creating a branch pointing at an in-between commit from the source, recording that commit as lastsourcerev, and then applying the update steps to go further.
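One way such a partial update could be scripted is sketched below. It is an illustration under assumptions, not a tested recipe: nth_commit and partial_update are hypothetical names, the branch branchname is assumed to be checked out, the history between lastsourcerev and origin/branchname is assumed to be linear, and at least one new commit is assumed to exist. The function is only defined here, not run.

```shell
#!/bin/sh
# Hypothetical helper: print the Nth line of a commit list read from stdin.
nth_commit() {
    sed -n "${1}p"
}

# Sketch of one partial update step that processes at most $1 new commits.
partial_update() {
    chunk_size=$1
    # Oldest-first list of unprocessed source commits; pick the Nth as new tip.
    tip=$(git rev-list --reverse lastsourcerev..origin/branchname |
        nth_commit "$chunk_size")
    # Fewer than chunk_size commits left: take everything that remains.
    [ -n "$tip" ] || tip=$(git rev-parse origin/branchname)
    git reset --hard "$tip"
    git filter-branch -f --tree-filter 'sed -i "s/text/modified/" *.txt' \
        --tag-name-filter cat \
        --parent-filter "sed s/$(git rev-parse lastsourcerev)/$(git rev-parse lasttargetrev)/g" \
        -- lastsourcerev..branchname
    # Record the in-between source commit and the new rewritten tip.
    git tag -f lastsourcerev "$tip"
    git tag -f lasttargetrev branchname
}
```

Calling partial_update 100 repeatedly would then walk through the new commits in batches of up to 100 until the source tip is reached.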
Note also that this process relies solely on filter-branch. This avoids the problems with tag rewrites and merge commits that rebasing the newly incoming commits would otherwise cause.
Packaged as a shell script, the incremental update part could look like this:
#!/bin/sh
# Incrementally rewrite new commits from the source repo.
set -e

REMOTE=origin
LOCAL_BRANCH=master
REMOTE_BRANCH=origin/master
SOURCE_REV_TAG=lastsourcerev
TARGET_REV_TAG=lasttargetrev
TREE_FILTER='sed -i "s/text/modified/" *.txt'

git fetch "$REMOTE"

# Nothing to rewrite if the source tag already points at the remote tip.
if [ "$(git rev-parse "$SOURCE_REV_TAG")" = "$(git rev-parse "$REMOTE_BRANCH")" ]
then
    echo "no new commits, nothing to do"
    exit 0
fi

git checkout "$LOCAL_BRANCH"
git reset --hard "$REMOTE_BRANCH"

# Rewrite only the new commits and graft them onto the previous target tip.
git filter-branch -f --tree-filter "$TREE_FILTER" \
    --tag-name-filter cat \
    --parent-filter "sed s/$(git rev-parse "$SOURCE_REV_TAG")/$(git rev-parse "$TARGET_REV_TAG")/g" \
    -- "$SOURCE_REV_TAG"..

# Record the new source and target tips for the next run.
git tag -f "$SOURCE_REV_TAG" "$REMOTE_BRANCH"
git tag -f "$TARGET_REV_TAG"
The only edge case is when no new commits are available. In that case git reset --hard would move the local branch to the remote branch, but no filter step would be applied because there are no revs to rewrite, which would leave the local branch pointing at unrewritten source commits. The script above handles this by checking whether the source rev tracking tag points at the same commit as the remote branch.