I need to merge several repositories into one, with various files moved around in the process.
Based on some research on SO about how to merge repositories, I ended up with the following sketch:
user=some_user
new_superproj=new_proj # new repository, will include old repositories
hosting=bitbucket.org # github etc
r1=repo1 # repo 1 to merge
r2=repo2
...
# clone to the new place. These are throw-away (!!!) directories
git clone git@${hosting}:${user}/${r1}.git
git clone git@${hosting}:${user}/${r2}.git
...
mkdir ${new_superproj} && cd ${new_superproj}
# dummy commit so we can merge
git init
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"
git rm ./deleteme.txt
git commit -m "Clean up initial file"
# repeat for all source repositories
repo=${r1}
pushd .
cd ../${repo}
# In the throw-away repository, move to the subfolder and rewrite log
git filter-branch --index-filter '
git ls-files -s |
sed "s,\t,&'"${repo}"'/," |
GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD
popd
# now bring data in to the new repository
git remote add -f ${repo} ../${repo}
git merge --allow-unrelated-histories ${repo}/master -m "Merging repo ${repo} in"
# remove remote to throw-away repo
git remote rm ${repo}
So far so good, unless we want to move files around while still preserving the log. Git is poor at tracking moves/renames, and the log-rewrite fragment above is not adapted to that: it rewrites paths in a uniform way, recursively for the whole directory.
The idea is that while the files are being moved, we know there are no other changes in the repository besides renames and moves. So how can I rewrite the following part, taken from the official git filter-branch documentation, to work in a canonical way, per file?
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
I have a hard time understanding the part after 'sed' and how it is applied by git filter-branch.
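To make that concrete, here is a small sketch of what I believe the substitution does to a single line of `git ls-files -s` output. The sample line (mode, all-zero hash, path) is made up for illustration, and a literal tab is used instead of `\t` so the demo does not depend on GNU sed:

```shell
# `git ls-files -s` prints one entry per line: "<mode> <object> <stage><TAB><path>".
# The line below is a made-up sample with a placeholder hash, not real output.
tab=$(printf '\t')
sample="100644 0000000000000000000000000000000000000000 0${tab}a/b/c/old_name.txt"

# Same substitution as in the documentation snippet: insert "newsubdir/" right
# after the tab (and after the optional opening quote of a quoted path).
prefix_paths() {
    sed "s-${tab}\"*-&newsubdir/-"
}

rewritten=$(printf '%s\n' "$sample" | prefix_paths)
printf '%s\n' "$rewritten"
```

In the real filter, the rewritten lines are piped to `git update-index --index-info` with `GIT_INDEX_FILE` pointing at a `.new` file, so a fresh index is built from them, and the final `mv` puts that new index in place of the original one for the commit being rewritten.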
I want to run a script (bash, python, etc.) that does the following:
for each file in the repository that gets moved/renamed
...
# in the loop, moved/renamed file found
old_file="..." # e.g. a/b/c/old_name.txt
new_file="..." # e.g. a/b/f/g/new_name.txt; at this point it is known that old_file and new_file are the same file
update_log_paths(old_file, new_file) # <--- this part is needed
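What I have in mind for that step is roughly the filter below, as a sketch only. `rename_index_path` is a hypothetical helper name of mine, not a git command. It assumes paths contain no tabs, newlines or backslashes, and it only rewrites the index listing, so it would take the place of the 'sed' stage in the pipeline above:

```shell
# Hypothetical helper: read `git ls-files -s` style lines on stdin and replace
# the path field when it equals the old path exactly; other entries pass through.
# Assumption: paths contain no tabs, newlines or backslashes.
rename_index_path() {
    awk -v old="$1" -v new="$2" '
        BEGIN { FS = OFS = "\t" }
        $2 == old { $2 = new }
        { print }
    '
}

# Demo on made-up index lines (placeholder hashes), no repository needed.
tab=$(printf '\t')
demo=$(printf '%s\n' \
    "100644 1111111111111111111111111111111111111111 0${tab}a/b/c/old_name.txt" \
    "100644 2222222222222222222222222222222222222222 0${tab}a/b/keep.txt" |
    rename_index_path a/b/c/old_name.txt a/b/f/g/new_name.txt)
printf '%s\n' "$demo"
```

As far as I understand, a function defined in the calling shell is not visible inside the `--index-filter` string, so in the real pipeline the awk command would have to be inlined there in place of 'sed'.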
Any ideas?