Say I've got project A with this history :

  • HEAD : Hotfix
  • HEAD~1 : Changelog
  • HEAD~2 : Dev
  • HEAD~3 : Feature
  • HEAD~4 : Typo

And I started project B on top of that history, getting to this :

  • HEAD : Changelog
  • HEAD~1 : Clean repo
  • HEAD~2 : Delete unnecessary files
  • HEAD~3 : Hotfix (project A)
  • HEAD~4 : Changelog (project A)
  • HEAD~5 : Dev (project A)
  • HEAD~6 : Feature (project A)
  • HEAD~7 : Typo (project A)

Now I've got a clean project B and I'd like to start my next apps with a fork of this one.

The problem is the "Delete unnecessary files" commit (HEAD~2): it deletes files. I would like to rewrite the history to omit that action, so that merging from the base repo into project A does not delete my files altogether.

Ideally, the deleted files should never appear in project B's history.

This might be a duplicate of "Delete commits from a branch in Git", "Completely remove file from all Git repository commit history" or "GIT How to clean history pack for old deleted files?", but it might not be, as the problem described here is pretty specific.

Please share your experience with the community!

Armel Larcier

1 Answer

So what you want to end up with is sort of like the following:

HEAD : Changelog
HEAD~1 : Clean repo
HEAD~3 : Hotfix (project A)    + part of "HEAD~2 : Delete unnecessary files" if applicable
HEAD~4 : Changelog (project A) + part of "HEAD~2 : Delete unnecessary files" if applicable
HEAD~5 : Dev (project A)       + part of "HEAD~2 : Delete unnecessary files" if applicable
HEAD~6 : Feature (project A)   + part of "HEAD~2 : Delete unnecessary files" if applicable
HEAD~7 : Typo (project A)      + part of "HEAD~2 : Delete unnecessary files" if applicable

right? Well, that is relatively straightforward. As the list above hints, the key is to split the "problematic" commit into smaller commits, each targeting the earlier commit where the "problem" was first introduced, and then to re-arrange the history by merging (1) those fixup commits into the commits that introduced the "problem" in the first place.

In this particular case the "problem" is just the existence of some files, but in the general case it might be anything. E.g. say you have started using a code analysis tool that has detected several memory leaks, which you have then fixed in a single commit. Ideally you would like to split up those fixes and distribute them back in time into the commits that introduced the memory leaks in the first place.

For the following, I assume that "HEAD~2 : Delete unnecessary files" is a pure delete-files commit (2). I also assume that "HEAD~3" is the head of a branch project_a and that "HEAD" is the head of a branch project_b. I assume commit "HEAD~7" has a tag root. I assume no files are currently modified. The example commands below are just from memory and not actually tested, so if there are any errors let me know in a comment.


The tool to use for this is interactive rebase, in two steps. The first step is to split "HEAD~2 : Delete unnecessary files" into as many commits as applicable, and the second step is to distribute those commits back in history as appropriate.

Step 1

git checkout -b project_b.rebase project_b
git rebase --interactive project_a
# For the "HEAD~2" commit change to "edit" and exit editor

Now we are ready to split up this commit. We start by discarding the commit itself but not its changes with reset:

git reset HEAD^

This command does not change files on disk, but moves the current HEAD (3) to the argument given, i.e. the parent of the current head, and resets the index to match, in effect discarding the latest commit while keeping its changes in the working tree.
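
To see the effect of that reset in isolation, here is a minimal sketch in a throwaway repository. The file names, file contents and identity are made up for illustration; only the two commit subjects are borrowed from the example above.

```shell
# Throwaway sandbox; nothing here touches a real repository.
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo one > keep.txt
echo two > junk.txt
git add keep.txt junk.txt
git commit -q -m "Hotfix"

git rm -q junk.txt
git commit -q -m "Delete unnecessary files"

# Mixed reset: HEAD and the index go back one commit, the working tree is untouched.
git reset -q HEAD^

head_subject=$(git log -1 --format=%s)   # subject of the commit HEAD now points at
deleted=$(git ls-files -d)               # tracked in the index, missing on disk
echo "HEAD is now: $head_subject"
echo "Deleted but not committed: $deleted"
```

After the reset the deletion is still present on disk but no longer committed, which is the state the splitting starts from.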

So now we have some files deleted. Let's iterate over those and find out which commits they are related to (4):

for file in $(git ls-files -d); do echo "$file"; git log --oneline -- "$file" | cat; echo; done

For this example I assume the following three files are deleted:

some-file1.txt
HEAD~5 : Dev (project A)

some-file2.txt
HEAD~3 : Hotfix (project A)
HEAD~6 : Feature (project A)

some-file3.txt
HEAD~3 : Hotfix (project A)

So let's make commits out of this. some-file1.txt is trivial so let's start with that.

git rm some-file1.txt   # Add the file as deleted to the index/cache/staging.
git commit -m "f  Dev"

The commit message of each of these commits is the prefix "f " plus the commit message of the commit it targets; you'll see why shortly.

Similarly for some-file3.txt

git rm some-file3.txt
git commit -m "f  Hotfix"

Now for some-file2.txt it is first created in commit "Feature" and then modified in "Hotfix". Since we do not want this file present in the history we need to target the commit where the file was created, and just note that modifying that commit will create a merge conflict later on. That's ok, we just need to resolve it when it arrives.

git rm some-file2.txt
git commit -m "f  Feature"

At this point all deleted files are checked in as separate commits, and we are done splitting. Time to finish the interactive rebase:

git rebase --continue

Step 2

So distribution and merge (1) time. Again an interactive rebase (5):

git rebase --interactive root

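The editor should now show something like this (obviously with different commit ids; the two project B commits that follow the fixups are left out here, and if the root tag sits on the "Typo" commit itself that line will not be listed):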

pick 100001 Typo
pick 100002 Feature
pick 100003 Dev
pick 100004 Changelog
pick 100005 Hotfix
pick 200001 f  Dev
pick 200002 f  Hotfix
pick 200003 f  Feature

Re-arrange the fixup commits so that each one follows the commit it targets and replace "pick" with "f" (short for fixup) like this (and notice how nicely the commit messages line up thanks to the "f " prefix):

pick 100001 Typo
pick 100002 Feature
f 200003 f  Feature
pick 100003 Dev
f 200001 f  Dev
pick 100004 Changelog
pick 100005 Hotfix
f 200002 f  Hotfix

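Now exit the editor and let the rebase proceed. As I mentioned above, there will be a conflict for some-file2.txt, which is now deleted right after the "Feature" commit but modified later in the "Hotfix" commit. When git gets as far as processing the "Hotfix" commit it will give up automatic processing and drop you off in a shell telling you to "clean up the mess". (Git may already stop while applying "f  Feature", because the version of the file that commit deletes differs from the one present at that point in history; the resolution is the same.)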

In this particular case it is rather simple, we just want the file gone:

git rm some-file2.txt
git rebase --continue

In the general case you would need to resolve conflicts normally before moving on.

With that, your history should look like the list I wrote at the very beginning of this answer. There should be no content difference between the current, rebased branch and the branch we started from:

git diff project_b.rebase project_b  # Should display nothing
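
As a side note, git can do the re-ordering of Step 2 itself if the fixup commits are created with `git commit --fixup=<commit>`, which names them "fixup! <subject>", and the rebase is run with `--autosquash`. The following is a minimal sketch in a throwaway repository, not the exact history from the question: the files and their contents are made up, and `GIT_SEQUENCE_EDITOR=true` is only there to accept the generated todo list without opening an editor.

```shell
set -e
sandbox=$(mktemp -d)   # throwaway repository, file names are hypothetical
cd "$sandbox"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo base > readme.txt
git add readme.txt
git commit -q -m "Typo"
git tag root

echo code > app.txt
echo data > some-file1.txt
git add app.txt some-file1.txt
git commit -q -m "Dev"
dev=$(git rev-parse HEAD)

echo more >> readme.txt
git commit -q -am "Changelog"

# Record the deletion as a fixup of "Dev"; git names the commit "fixup! Dev".
git rm -q some-file1.txt
git commit -q --fixup="$dev"

# --autosquash moves the fixup right after its target and marks it as "fixup".
GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash root

remaining=$(git rev-list --count root..HEAD)      # commits left after squashing
traces=$(git log --oneline -- some-file1.txt)     # history that mentions the file
echo "commits after root: $remaining"
echo "commits touching some-file1.txt: ${traces:-none}"
```

The "fixup! " prefix plays the same role as the "f " prefix used in this answer. Note that "Dev" also adds app.txt in this sketch, so the commit does not become empty once the deletion is folded into it.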

In this example you now have a new branch with the wanted result in addition to the old, initial one. If you really want to purge the old stuff, you could look into the answers you linked.


This might seem intimidating, but at its core it is relatively simple: split up the parts of a commit that should be moved back in the history, followed by a new rebase where those fixup commits are processed. There might just be several individual steps involved.

I might not do something like this every single day, but it is also not uncommon for me to do this a double digit number of times per day. Try it out a few times and make this a common tool you use. As I said in a presentation I held at work recently, "If you're not regularly rewriting git history you're doing git wrong".


1 I mean merge like the general word here, not "git merge".

2 If it also modifies the content of some files, those modifications should be extracted into their own separate commit, so that you end up with a "clean" remove-files commit. Strictly speaking such a "clean" commit is not absolutely required; it just makes the example in this answer simpler.

Generic example: say commit4 of your hello world project added functionality to print a message, but hello.c fails to include stdio.h. That header file is first included in commit8, which adds support for getting the message from argv and also changes other files (like the changelog). The main branch is now at commit12. Step one is git rebase --interactive commit8^, marking commit8 as "edit". This time we do not want to discard the whole commit, only pick out changes from one file, hello.c. The most flexible, do-not-generate-conflict way of doing it is

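git reset HEAD^ -- hello.c     # Unstage the hello.c changes; the file on disk is untouched.
git commit --amend --no-edit   # commit8 now carries everything except the hello.c changes.
git add -p     # Add only #include <stdio.h>
git commit -m "f  commit4"
git add hello.c
git commit -m "f  commit8"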

(Alternatively, in add -p you might add everything but the include statement and then do commit --amend, but that approach might result in conflicts if other changes are nearby, so I recommend the approach above instead.)

Now in step two (git rebase --interactive commit4^) you re-arrange

pick 100004 commit4
pick 100005 commit5
pick 100006 commit6
pick 100007 commit7
pick 100008 commit8
pick 200001 f  commit4
pick 200002 f  commit8
pick 100009 commit9
pick 100010 commit10
pick 100011 commit11
pick 100012 commit12

to

pick 100004 commit4
f 200001 f  commit4
pick 100005 commit5
pick 100006 commit6
pick 100007 commit7
pick 100008 commit8
f 200002 f  commit8
pick 100009 commit9
pick 100010 commit10
pick 100011 commit11
pick 100012 commit12

3 In this particular case that is the detached HEAD of the interactive rebase. In the normal case it moves the head of the current branch.

4 If some of the files have been renamed you might want to add a --follow argument to log. One important caveat with --follow is that git operates primarily on content and not on files as such, so git will consider all empty files identical and you might end up with some false positives (and similarly for the less likely case of multiple independent files with identical content).
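
A minimal sketch of the difference --follow makes, again in a throwaway repository with made-up file names and commit subjects:

```shell
set -e
sandbox=$(mktemp -d)   # throwaway repository, names are hypothetical
cd "$sandbox"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

printf 'line 1\nline 2\nline 3\n' > old-name.txt
git add old-name.txt
git commit -q -m "Feature"

git mv old-name.txt new-name.txt
git commit -q -m "Rename"

# Without --follow the history stops at the commit that introduced the new name.
plain=$(git log --format=%s -- new-name.txt)
# With --follow git continues through the rename to the original file.
followed=$(git log --format=%s --follow -- new-name.txt)
echo "without --follow: $plain"
echo "with --follow:    $followed"
```

Note that --follow only works with a single path at a time, which fits the one-file-per-iteration loop used in this answer.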

5 If the part of history you are going to modify contains merges you should add --rebase-merges.

hlovdal