
I am merging a few repositories into one, with miscellaneous files moved around.

Based on some research on SO (a couple of related questions, plus "how to merge repositories"), I ended up with the following sketch:

user=some_user
new_superproj=new_proj # new repository, will include old repositories 
hosting=bitbucket.org # or github.com, etc.
r1=repo1 # repo 1 to merge
r2=repo2
...
# clone to the new place. These are throw-away (!!!) directories
git clone git@${hosting}:${user}/${r1}.git
git clone git@${hosting}:${user}/${r2}.git
...
mkdir ${new_superproj} && cd ${new_superproj}

# dummy commit so we can merge
git init
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"
git rm ./deleteme.txt
git commit -m "Clean up initial file"

# repeat for all source repositories
repo=${r1}

pushd .
cd ../${repo}

# In the throw-away repository, move to the subfolder and rewrite log
git filter-branch --index-filter '
    git ls-files -s |
    sed "s,\t,&'"${repo}"'/," |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD
popd

# now bring data in to the new repository
git remote add -f ${repo} ../${repo}
git merge --allow-unrelated-histories  ${repo}/master -m "Merging repo ${repo} in"
# remove remote to throw-away repo
git remote rm ${repo}

So far so good, unless we want to move files around while still preserving the log. Git is weak at handling moves/renames, and the log-rewrite fragment above is not suited to it: it rewrites every path the same way, by prefixing the whole tree with one directory.

The idea is that, while files are being moved, we know there are no changes in the repository other than renames and moves. So how can I rewrite the following part, taken from the official git filter-branch documentation, so that it works per file?

git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

I have a hard time understanding everything past 'sed' and how it is applied by git filter-branch.
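For what it's worth, the pipeline works like this: for every commit, filter-branch runs the index filter; "git ls-files -s" prints that commit's index, one "<mode> <sha1> <stage><TAB><path>" line per file; sed rewrites the path part; "git update-index --index-info" reads the edited lines into a fresh index file ($GIT_INDEX_FILE.new); and mv makes that file the index of the rewritten commit. The sed expression "s-\t\"*-&newsubdir/-" uses "-" as its delimiter, matches the tab plus any opening double quote (git quotes paths with unusual characters), and "&" puts the match back followed by "newsubdir/". The sketch below isolates only the sed stage so its effect can be seen on sample lines; prefix_path is a hypothetical name, and GNU sed is assumed because "\t" inside a regex is a GNU extension.

```shell
#!/bin/sh
# prefix_path (hypothetical name): the sed stage of the documented index filter.
# Input: `git ls-files -s` lines, "<mode> <sha1> <stage><TAB><path>".
# "\t\"*" matches the tab and an optional opening quote of a quoted path;
# "&" re-inserts that match, so the prefix lands right after it.
# Assumes GNU sed ("\t" in a regex is a GNU extension).
prefix_path() {
    sed "s-\t\"*-&newsubdir/-"
}
```

The first script's "s,\t,&${repo}/," does the same thing with "," as the delimiter and without the quote handling.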

I want to run a script (bash, Python, etc.) like this:

for each file in the repository that gets moved/renamed
    ...
    # in the loop, moved/renamed file found
    old_file="..." # e.g. a/b/c/old_name.txt
    new_file="..." # e.g. a/b/f/g/new_name.txt; at this point it is known that old_file and new_file are the same file
    update_log_paths(old_file, new_file) # <--- this part is needed
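One possible shape for the per-file step, offered as a sketch rather than a known-good recipe: keep the documented index filter, but swap the prefixing sed stage for a stage that renames only the listed paths, matching each path exactly (so "a.txt" does not also hit "a.txt.bak", and dots in names need no escaping). remap_index_paths and the "old<TAB>new" map-file format are hypothetical, and paths are assumed to be plain ones that git does not quote.

```shell
#!/bin/sh
# remap_index_paths MAPFILE (hypothetical helper)
# stdin : `git ls-files -s` lines, "<mode> <sha1> <stage><TAB><path>"
# MAPFILE: one "old_path<TAB>new_path" pair per line
# stdout: the same lines, with every path listed in MAPFILE renamed
remap_index_paths() {
    awk -v FS='\t' -v OFS='\t' -v mapfile="$1" '
        BEGIN {
            # Load the old -> new pairs once.
            while ((getline line < mapfile) > 0) {
                split(line, pair, "\t")
                map[pair[1]] = pair[2]
            }
            close(mapfile)
        }
        # Exact match on the path field; other lines pass through unchanged.
        ($2 in map) { $2 = map[$2] }
        { print }
    '
}
```

Inside filter-branch this would take the place of the sed stage, with the update-index and mv steps unchanged. The filter string is evaluated by filter-branch's own shell, which may not see functions defined in your interactive shell, so saving the function as a standalone script and passing the map file by absolute path is probably the safer setup.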

Any ideas?

Vetal

1 Answer


As it turned out, taking a hint from the SO question "Move file-by-file in git", it is as simple as (pseudocode):

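move_files               # move/rename files in the working tree (by hand or by script)
cd repo_root
git add .                # stage everything, so git detects moves rather than added/deleted pairs
repo_moves=collect_moves_data()
git reset HEAD && git checkout . && git clean -df .   # undo all moves: unstage, restore old paths, delete new ones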
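One way to fill in collect_moves_data() is to let git report the staged renames: after "git add .", the command "git diff --cached -M --diff-filter=R --name-status" prints one "R<score><TAB>old<TAB>new" line per detected rename (R100 when the content is unchanged). The sketch below reduces that output to "old<TAB>new" pairs; both function names are hypothetical, and it assumes plain paths that git does not quote (for unusual file names the -z output format would be the safer choice).

```shell
#!/bin/sh
# parse_renames (hypothetical helper): keep only rename lines from
# `git diff --name-status` and print them as "old<TAB>new".
parse_renames() {
    awk -v FS='\t' -v OFS='\t' '$1 ~ /^R[0-9]*$/ { print $2, $3 }'
}

# collect_moves_data (hypothetical helper): run it after `git add .`,
# while the moves are still staged and before they are undone.
collect_moves_data() {
    git diff --cached -M --diff-filter=R --name-status | parse_renames
}
```

For example, "collect_moves_data > moves.txt" would save the pairs before the reset/checkout/clean step throws the moves away.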

The biggest misunderstanding I found in related SO questions is that "git log --follow" (or other, "stronger" options) does not seem to work for many people:

git log --follow <file>

does not show any log for the new path until the moved (but otherwise unchanged) file has been committed.

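# repo_moves.txt (hypothetical file): one "old_path<TAB>new_path" pair per line
filter=""
while IFS="$(printf '\t')" read -r old_file new_file; do
    new_dir=$(dirname "$new_file")    # "." for files moved to the top level
    # Paths are single-quoted, so they must not contain single quotes.
    filter="${filter}if [ -e '${old_file}' ]; then
    mkdir -p '${new_dir}' && mv '${old_file}' '${new_file}'
fi
"
done < repo_moves.txt

# mv works on checked-out files, so this needs --tree-filter;
# --index-filter does not check out the tree.
git filter-branch -f --tree-filter "$filter" HEAD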
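A filter that calls mv on files only has something to move under --tree-filter, which checks out each commit into a temporary directory and evaluates the filter there; --index-filter does not check anything out. Because a tree filter is just shell code, it can be tried on a scratch directory before any history is rewritten. The sketch below assumes moves arrive as "old<TAB>new" lines on stdin; shquote and make_tree_filter are hypothetical helpers, and shquote escapes embedded single quotes so that paths with spaces or quotes still end up correctly quoted in the generated filter.

```shell
#!/bin/sh
# shquote (hypothetical helper): single-quote a string for sh, turning each
# embedded ' into '\'' so the path survives being pasted into the filter.
shquote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# make_tree_filter (hypothetical helper): read "old<TAB>new" pairs on stdin
# and print one shell line per move, for use with --tree-filter.
make_tree_filter() {
    tab=$(printf '\t')
    # "|| [ -n "$old" ]" also handles a last line without a trailing newline.
    while IFS="$tab" read -r old new || [ -n "$old" ]; do
        [ -n "$old" ] || continue                # skip blank lines
        dir=$(dirname "$new")                    # "." for top-level targets
        printf 'if [ -e %s ]; then mkdir -p %s && mv %s %s; fi\n' \
            "$(shquote "$old")" "$(shquote "$dir")" \
            "$(shquote "$old")" "$(shquote "$new")"
    done
}
```

The output would then be used as git filter-branch -f --tree-filter "$filter" HEAD. Each generated line is a no-op in commits where the old path does not exist. Tree filters are much slower than index filters on long histories; that is the price of being able to use ordinary file commands.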

If you need to go back:

git pull # with merge
git reset --hard <hash> # hash of your origin/master (origin/HEAD); it should be HEAD~2, but I'd check it manually and copy/paste the hash