
Following is the script I'm using to rewrite the history of a submodule that is referenced in multiple parent repositories. It runs in Git Bash in a Windows environment.

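```bash
#!/bin/bash

cd submodule_repo || exit 1

# Remove the unwanted file(s) from every commit on every ref; rewrite tags too.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch files_to_remove' \
  --tag-name-filter cat -- --all

# Delete the refs/original/* backup refs that filter-branch leaves behind.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Expire all reflogs and prune, so the removed objects are really discarded.
git reflog expire --expire=now --all
git gc --prune=now
```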

I need to map the SHAs of the old commits to the newly created SHAs, so that I can update the submodule references in the parent repositories. Is there a way to do it? I did look at "Repository with submodules after rewriting history of submodule", but it doesn't really help, because I'm deleting the original refs (refs/original/) to make sure the files I'm removing don't get repacked and kept around. I'm relatively new to Git, so any guidance would be really appreciated.

Edit:

Following the steps described in the comments on the accepted answer (by @torek) worked for me.

Ritik Kumar

1 Answer


There is no good way to do this (at least not with git filter-branch and the existing Git tools; the BFG leaves the desired map file around, but you would still have to build something on top of it).

When git filter-branch copies commits, it records each new commit's hash in a "map file" that pairs the old and new IDs, so that it can convert an original commit hash into the rewritten commit hash. (In the existing implementation the map is actually a directory, which performs very poorly in large filters, so this may change someday.) This is how it provides the map function that the git filter-branch documentation refers to here:

> A map function is available that takes an "original sha1 id" argument and outputs a "rewritten sha1 id" if the commit has been already rewritten, and "original sha1 id" otherwise; the map function can return several ids on separate lines if your commit filter emitted multiple commits.
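filter-branch keeps these pairs only in its temporary directory, so one workaround (the `$GIT_COMMIT` approach the asker mentions in the comments) is to record them yourself. A minimal sketch, assuming you can add a `--commit-filter` next to the existing index filter: `$GIT_COMMIT` holds the original commit ID, `git commit-tree "$@"` writes the rewritten commit, and the filter appends `<old> <new>` to a file before printing the new ID (the printed ID is what filter-branch itself records). It runs against a throwaway repository; `MAPFILE`, `secret.txt`, and the demo setup are hypothetical stand-ins.

```bash
#!/bin/bash
# Sketch: save an "<old> <new>" commit map while git filter-branch rewrites history.
# Demo only: throwaway repo; MAPFILE and secret.txt are hypothetical names.
set -euo pipefail

export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
export MAPFILE="$WORK/commit-map.txt"    # must be exported and absolute: the filter runs elsewhere
: > "$MAPFILE"

git init -q "$WORK/submodule_repo"
cd "$WORK/submodule_repo"
echo keep > keep.txt;     git add keep.txt;   git commit -q -m "add keep"
echo secret > secret.txt; git add secret.txt; git commit -q -m "add secret"

# The commit filter gets the tree and "-p <parent>" args in "$@" and the message on
# stdin. $GIT_COMMIT is the original commit; whatever is printed becomes the new ID.
git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
  --commit-filter 'new=$(git commit-tree "$@") &&
    echo "$GIT_COMMIT $new" >> "$MAPFILE" &&
    echo "$new"' \
  --tag-name-filter cat -- --all

cat "$MAPFILE"
```

In the real submodule repository, the demo setup goes away and only the `--commit-filter` argument is added to the existing `git filter-branch` call; commits whose content did not change will typically map to themselves.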

Unfortunately, when git filter-branch finishes and cleans up, it removes the mapping table, rather than turning it into a useful database. If you had the database with the mappings, you could use that with any external items ("gitlink" entries in other repositories, test frameworks, or any other place you might have them saved). Without the map, there is no good way to handle this. The superproject says, e.g., "use commit 1234567" but that commit no longer exists in the rewritten submodule repository. The new commit's ID would be in the map, but there is no map.
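To make the "use commit 1234567" point concrete: a superproject stores a submodule as a gitlink, a tree entry with mode 160000 that names exactly one submodule commit. The sketch below, using made-up commit IDs, a hypothetical `libs/sub` path and a hypothetical two-column map file, shows where that ID lives and how a map, if one existed, would translate it.

```bash
#!/bin/bash
# Sketch: where a superproject pins a submodule commit (gitlink, mode 160000),
# and a lookup in a hypothetical "<old> <new>" map. All IDs/paths are made up.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
old=1234567890123456789012345678901234567890   # pretend pre-rewrite submodule commit
new=abcdefabcdefabcdefabcdefabcdefabcdefabcd   # pretend rewritten submodule commit
printf '%s %s\n' "$old" "$new" > "$WORK/commit-map.txt"

git init -q "$WORK/parent"
cd "$WORK/parent"
mkdir -p libs/sub                               # unpopulated submodule directory
git update-index --add --cacheinfo 160000,"$old",libs/sub
git commit -q -m "pin submodule"

# ls-tree prints "<mode> <type> <id><TAB><path>"; field 3 is the pinned commit ID.
pinned=$(git ls-tree HEAD libs/sub | awk '{print $3}')
mapped=$(awk -v o="$pinned" '$1 == o {print $2}' "$WORK/commit-map.txt")
echo "libs/sub: $pinned -> ${mapped:-not in map}"
```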

torek
  • Thanks @torek. That's kind of disappointing :( There should be an option to keep or discard the map once filter-branch finishes; it would be really useful in a scenario like this. Is there any workaround I can use to create the map manually? – Ritik Kumar Jun 09 '17 at 05:31
  • The easiest would be to hack on the filter-branch code. It's a shell script, so it's very easy to modify. Just find the point where it removes the temporary directory, and have it do something useful with the map first to save it. – torek Jun 09 '17 at 14:42
  • It's been a while, but I'm back on this task now. I used the `$GIT_COMMIT` variable as mentioned [here](https://stackoverflow.com/questions/14782906/how-do-i-get-a-list-of-old-new-rewritten-commit-shas-from-git-filter-branch) and created the map as part of filter-branch. It seems to populate the map correctly: all my modified commits are registered in it along with their old commits. – Ritik Kumar Aug 07 '17 at 15:11
  • How can I use this mapping to rewrite the references to the submodule in the parent repository? Is there a good reference I can check out? I'm working on it on my own for now. – Ritik Kumar Aug 07 '17 at 15:16
  • Well, the mechanics are essentially another filter-branch: read every commit, check its submodule references, check whether they're mapped, if so apply the mapping, and write a new commit from the updated index. Each of these parts is hard on its own, but two are solved by the existing filter-branch code: all you need are the three in the middle ("check submodule references, remap them if mapped" if we boil them down to two steps). Just code those up and the problem is solved. :-) – torek Aug 07 '17 at 15:38
  • I'll try to write it up. Which filter should I choose: index-filter or commit-filter? – Ritik Kumar Aug 08 '17 at 14:03
  • index-filter: all your changes can be done in the index, though you may need `git ls-index`, `git update-index`, and even some `git cat-file` calls (to read the submodule information and find out whether a submodule reference is to *your particular* submodule(s)). The commit-filter is for changing the *number of commits*; you won't be doing that, just replacing some submodule hash IDs in existing commits. – torek Aug 08 '17 at 14:18
  • Do you mean `git ls-files` when you say `git ls-index`? I couldn't find any command by that name. I will work with this approach and see how it goes. – Ritik Kumar Aug 09 '17 at 08:19
  • @RitikKumar: oops, yes, sorry, `git ls-files`. (I believe it *should* be called `git ls-index` since that's mostly what it does...) – torek Aug 09 '17 at 14:21
  • It's been a long time, but thanks a tonne! It worked perfectly for my scenario :) Accepting this as the answer. I know Git a lot better now. – Ritik Kumar Sep 04 '17 at 08:01
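The index-filter approach outlined in the comments (list the index entries with `git ls-files -s`, remap submodule IDs found in the map, write them back with `git update-index`) can be sketched roughly as follows. This is a minimal, hypothetical version, not the asker's actual script: it assumes a map file with one `<old> <new>` pair per line, a single submodule at a known path `SUBPATH` (instead of consulting `.gitmodules` via `git cat-file`), and made-up commit IDs in a throwaway repository.

```bash
#!/bin/bash
# Sketch: rewrite a superproject so its gitlinks point at rewritten submodule
# commits, driven by an "<old> <new>" map. SUBPATH, MAPFILE and IDs are hypothetical.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
export MAPFILE="$WORK/commit-map.txt" SUBPATH=libs/sub
OLD1=1111111111111111111111111111111111111111 NEW1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OLD2=2222222222222222222222222222222222222222 NEW2=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
printf '%s %s\n' "$OLD1" "$NEW1" "$OLD2" "$NEW2" > "$MAPFILE"

# Demo superproject: two commits pinning the submodule at OLD1, then OLD2.
git init -q "$WORK/parent"
cd "$WORK/parent"
mkdir -p "$SUBPATH"   # empty dir = unpopulated submodule, so the tree stays "clean"
git update-index --add --cacheinfo 160000,"$OLD1","$SUBPATH"; git commit -q -m "pin v1"
git update-index --cacheinfo 160000,"$OLD2","$SUBPATH";       git commit -q -m "pin v2"

# For each gitlink (mode 160000) at $SUBPATH, swap in the mapped commit if the
# map has one. Using "if" keeps the filter's exit status 0 when there is no match.
git filter-branch -f --index-filter '
  git ls-files -s | while read -r mode sha stage path; do
    [ "$mode" = 160000 ] && [ "$path" = "$SUBPATH" ] || continue
    new=$(awk -v o="$sha" "\$1 == o { print \$2; exit }" "$MAPFILE")
    if [ -n "$new" ]; then
      git update-index --cacheinfo "$mode,$new,$path"
    fi
  done
' -- --all

git ls-tree HEAD "$SUBPATH"
```

For the real parent repositories, the demo setup would be dropped, `MAPFILE` pointed at the map saved while rewriting the submodule, and the filter run once per parent repository; with several submodules, the path check is where reading `.gitmodules` would come in.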