
I want to rebase to remove a certain commit from my history. I know how to do that. However, when I do it, the commit timestamp of every rebased commit is set to the moment I completed the rebase. I want the commits to keep their original timestamps.

I saw the last answer here: https://stackoverflow.com/a/19522951/3995351 , however it didn't work.

The last important command just showed a new line with

>

So I am opening a new question.

TheWatcher
  • "didn't work", "an answer" ... please link to the answer and maybe explain a little more in depth. – Vogel612 Jun 11 '15 at 20:20
  • FWIW, the _author_ date should be preserved by default. It is only the _commit_ date that is changed. – David Deutsch Jun 11 '15 at 20:26
  • Did you also try other answers from that question?? give us more information – Vogel612 Jun 11 '15 at 20:27
  • @Vogel612 the problem is that the question was about the author date. What I want not to be changed is the commit date. The answer I am referring to is the only one fitting my needs – TheWatcher Jun 11 '15 at 20:31
  • @DavidDeutsch yeah and I want the commit date not to be changed – TheWatcher Jun 11 '15 at 20:31
  • Your problem (as you describe it) is solved in [this answer](http://stackoverflow.com/a/2976598/1803692) on that very question... did you try this in some sandbox repo? – Vogel612 Jun 11 '15 at 20:34
  • possible duplicate of [git rebase without changing commit timestamps](http://stackoverflow.com/questions/2973996/git-rebase-without-changing-commit-timestamps) – Vogel612 Jun 11 '15 at 20:34
  • @Vogel612 maybe a duplicate, but using the author date as the commit date is not a solution for me, I want the old commit dates to be kept – TheWatcher Jun 11 '15 at 20:36
  • "old commit dates" **are** the "author date"s, given you did not already do a rebase changing the commit timestamps, because you didn't use `--committer-date-is-author-date` – Vogel612 Jun 11 '15 at 20:37
  • @Vogel612 No, I have many many cherry picked things in the repository, so commit dates may be very different from author dates – TheWatcher Jun 11 '15 at 20:39
  • Check this article: http://axiac.ro/blog/2014/11/merging-git-repositories/ The part you need is explained in the sections about backing up and restoring the committer dates. The answer you are pointing at was a very helpful start but, unfortunately, the command explained there is incorrect (incomplete?) as you already found out. – axiac Jun 12 '15 at 06:14
  • @axiac just tried this, maybe I am too bad.. Not working for me :( Can you give me a step by step? :D I think I don't need all of the things done there – TheWatcher Jun 12 '15 at 18:37
  • I re-read your question now. Indeed, the procedure explained in that article does not work in your case. It worked for me because I didn't change the content of rebased commits. The situation is different when you want to remove a commit. The problem is the difficulty (or the impossibility) to find some piece of information that uniquely identifies each commit **before and after** the rebase. – axiac Jun 12 '15 at 19:35
  • @axiac maybe we can somehow automate the answer from David Deutsch. I really wouldn't like to have to do this one by one.. :D – TheWatcher Jun 12 '15 at 20:51
  • I wrote an answer that provides the entire workflow needed to remove a commit from the history and restore the original commit dates. It aims to make the procedure as safe as possible. I hope it can be applied without problems on your case. – axiac Jun 13 '15 at 14:46

3 Answers


The setup

Let's say this is the history around the commit you want to remove

... o - o - o - o ...       ... o
        ^   ^   ^               ^
        |   |   +- next         |
        |   +- bad              +-- master (HEAD)
      start

where:

  • bad is the commit you want to remove;
  • start is the parent of the commit you want to remove;
  • next is the next commit after bad; it is good, you want to keep it and all the timeline after it; it will replace bad after rebase.

Prerequisites

In order to be able to safely remove bad, it's important that no other branch that existed when bad was created was merged into the main timeline after bad. In other words, removing bad and its connections to its parent and child commits from the history graph must leave two disconnected pieces of the timeline.

It is probably possible to remove bad even if another existing branch was merged after bad. I didn't check this situation but I expect some impediments because of the merge commit.

The idea

Each git commit is identified by a hash that is computed from the commit's properties: its content (the tree), its parent commit(s), the message, and the name, email and date of both the author and the committer.

By default, a rebase sets the committer date of every commit it rewrites to the current time. It can also change the committer name and email, the commit message and the content.

In order to restore the original committer dates after a rebase we need to save them together with some information that can identify each commit after the rebase.

Because you want to remove a commit, the contents of the commits change during the rebase. Adding or removing files or commits changes the contents of all the commits that follow.

This leaves us without a single property that uniquely identifies a commit and does not change during the desired rebase. We can try to use a combination of two or more properties that do not change during the rebase.

The emails (author and committer) are of almost no use. If a single person worked on the project, they are the same for all commits and cannot be used. The properties that remain (they differ on most commits and are not affected by the rebase) are the author date and the commit message (its first line).

If the pair (author date, commit message) provides unique values for all the commits affected by the rebase then we can restore the commit dates afterwards without errors.

Verify if it can be done safely

There is a simple way to verify if the (author date, commit message) pairs are unique for the affected commits.

Run the following two commands:

$ git log --format="%aI %s" start...master | sort -u | wc -l
$ git log --oneline start...master | wc -l

If they display the same number then you are lucky: the pair (author date, commit message) can be used to uniquely identify the commits. Read on.

If the numbers are different (the first command will always produce a number smaller than or equal to the one produced by the second command) then you are out of luck.
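The comparison only works if identical keys are collapsed wherever they appear in the list, and uniq on its own only collapses adjacent duplicates, so the keys have to be sorted first. A minimal sketch with made-up keys (the dates and subjects below are hypothetical, not taken from any real repository) shows the difference and uses uniq -d to list the keys that would be ambiguous:

```shell
# Hypothetical "author-date subject" keys; the first and third are identical
# but not adjacent, as can happen with cherry-picked commits.
keys='2015-06-01T10:00:00+02:00 Fix typo
2015-06-02T11:00:00+02:00 Add feature
2015-06-01T10:00:00+02:00 Fix typo'

# uniq alone misses the non-adjacent duplicate.
plain_count=$(printf '%s\n' "$keys" | uniq | wc -l | tr -d ' ')

# sort -u collapses it, so the count drops below the number of commits.
sorted_count=$(printf '%s\n' "$keys" | sort -u | wc -l | tr -d ' ')

# uniq -d on sorted input prints each key that occurs more than once.
duplicates=$(printf '%s\n' "$keys" | sort | uniq -d)

echo "uniq: $plain_count, sort -u: $sorted_count"
echo "ambiguous key: $duplicates"
```

In a real repository the same sort | uniq -d can be appended to the git log --format="%aI %s" start...master command to see which commits share a key.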

Extract the information needed to fix the commit dates after the rebase

This command

$ git log --format="%H %cI %aI %s" start...master > /tmp/hashlist

extracts the commit hash, the committer date (the payload), the author date and the commit message (the key) for all the commits after start, up to and including master, and stores them in a file.
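One commenter later hit "invalid date format: %cI" during the final step, which suggests the file contained the placeholder itself rather than a date (presumably because the installed Git did not know %cI). A small check like the following sketch can catch that before the rebase. The function name hashlist_dates_ok, the shortened hashes and the sample lines are made up for illustration:

```shell
# Succeeds only if the second field of every line starts like an ISO 8601
# timestamp (YYYY-MM-DDThh:mm:ss).
hashlist_dates_ok() {
  ! cut -d' ' -f2 "$1" |
    grep -qvE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}'
}

# Two hypothetical files: one well-formed, one with literal placeholders.
good=$(mktemp)
bad=$(mktemp)
echo 'aaaaaaa 2015-06-03T12:00:00+02:00 2015-06-01T09:30:00+02:00 Add parser' > "$good"
echo 'bbbbbbb %cI %aI Add parser' > "$bad"

if hashlist_dates_ok "$good"; then good_result=ok; else good_result=broken; fi
if hashlist_dates_ok "$bad"; then bad_result=ok; else bad_result=broken; fi
echo "good file: $good_result, bad file: $bad_result"
rm -f "$good" "$bad"
```

On the real file the call would simply be hashlist_dates_ok /tmp/hashlist.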

Backup the current master

While it is a common misconception that git "rewrites history", in fact it just generates an alternative history line and decides it is the correct history. It does not change or remove the "rewritten" commits; they are still present for some time in its database and can be restored in case the operation fails.

We can proactively back up the current history line so that it is easy to restore if needed. All we have to do is create a new branch that points to master. This way, when git rebase moves master to the new timeline, the old one is still accessible through the new branch.

$ git branch old_master

The command above creates a branch named old_master that keeps the current timeline in focus until we complete all the changes and are satisfied with the new world order.

Do the rebase

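Removing the commit bad from the history is as simple as:

$ git rebase --rebase-merges --onto start bad

On older Git versions the corresponding option is --preserve-merges; it has since been deprecated and removed in favour of --rebase-merges.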
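To see the effect of this step in isolation, here is a small sketch that builds a throwaway repository with four commits and removes the second one. It assumes git is on the PATH. The identity, file names and dates are made up, and the history is linear, so no merge-related option is needed. Tags named after the commits in the diagram stand in for the real hashes.

```shell
# Throwaway repository; every name below (files, tags, identity) is made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Four commits (start, bad, next, last), each with a fixed author and
# committer date, and a tag so the old commits stay reachable afterwards.
day=0
for name in start bad next last; do
  day=$((day + 1))
  echo "$name" > "$name.txt"
  git add "$name.txt"
  GIT_AUTHOR_DATE="2015-06-0$day 10:00:00 +0000" \
  GIT_COMMITTER_DATE="2015-06-0$day 12:00:00 +0000" \
    git commit -q -m "$name"
  git tag "$name"
done
git branch -M master

# Replay everything after "bad" onto "start", which drops "bad".
git rebase -q --onto start bad

subjects=$(git log --format=%s master | xargs)
old_date=$(git log -1 --format=%cd --date=short next)        # original commit
new_date=$(git log -1 --format=%cd --date=short master~1)    # rebased copy
author_date=$(git log -1 --format=%ad --date=short master~1)

echo "history:        $subjects"
echo "author date:    $author_date"
echo "committer date: $old_date before, $new_date after"
```

The author date of the rebased copy of next is unchanged, while its committer date has moved to the day the rebase ran. That is the difference the following step repairs.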

Fix the commit dates

The following command "rewrites" the history and changes the committer date using the values we saved before:

$ git filter-branch --env-filter 'export GIT_COMMITTER_DATE=$(fgrep -m 1 "$(git log -1 --format="%aI %s" $GIT_COMMIT)" /tmp/hashlist | cut -d" " -f2)' -f start...master

How it works:

git walks the history between the commits labelled start and master and, for each commit, runs the command provided as the argument to --env-filter before rewriting the commit. It sets the environment variable GIT_COMMIT to the hash of the commit being rewritten.

Since we already did a rebase that modified the hashes of all the commits, we cannot use $GIT_COMMIT directly to look up the original committer date (because $GIT_COMMIT is a commit generated by git rebase, and we are not interested in its committer date).

The command we provide to --env-filter

export GIT_COMMITTER_DATE=$(fgrep -m 1 "$(git log -1 --format="%aI %s" $GIT_COMMIT)" /tmp/hashlist | cut -d" " -f2)

runs git log -1 --format="%aI %s" $GIT_COMMIT to generate the key pair (author date, commit message) discussed above. Its output is passed as argument to the command fgrep -m 1 "..." /tmp/hashlist | cut -d" " -f2 that finds the pair in the list of previously saved hashes (fgrep) and extracts the original commit date from the saved line (cut). Finally, the value of the commit date is stored in the environment variable GIT_COMMITTER_DATE that is used by git to rewrite the commit.
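To see the lookup in isolation, here is a minimal sketch that runs the same grep/cut pipeline against a hand-written list instead of a real repository. The hashes, dates and subjects are made up, and lookup_committer_date is a hypothetical helper name. grep -F is used as the equivalent of fgrep.

```shell
# A hand-written stand-in for /tmp/hashlist:
#   <hash> <committer date> <author date> <subject>
hashlist=$(mktemp)
cat > "$hashlist" <<'EOF'
1111111111111111111111111111111111111111 2015-06-03T12:00:00+02:00 2015-06-01T09:30:00+02:00 Add parser
2222222222222222222222222222222222222222 2015-06-04T18:45:00+02:00 2015-06-02T14:10:00+02:00 Fix parser bug
EOF

# $1 is the key "author-date subject"; prints the saved committer date,
# which is the second space-separated field of the first matching line.
lookup_committer_date() {
  grep -F -m 1 -- "$1" "$hashlist" | cut -d' ' -f2
}

restored=$(lookup_committer_date '2015-06-02T14:10:00+02:00 Fix parser bug')
missing=$(lookup_committer_date '2015-06-09T08:00:00+02:00 Not in the list')

echo "restored: $restored"
echo "missing:  '$missing'"
rm -f "$hashlist"
```

An unknown key yields an empty string, and as far as I can tell git then falls back to the current time for that commit, which matches what one commenter saw for commits that were not in the list. The list is addressed by an absolute path because filter-branch runs the filter from a temporary directory, which would also explain why a relative path failed for another commenter.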

Verification

Using the git log command again

$ git log --format="%cI %aI %s" start...master

you can verify that the rewritten history matches the original history. If you use a graphical git client you can check the results more easily by visual inspection. The branch old_master keeps the old history line visible in the client, so you can compare the dates of each commit on old_master with the corresponding commit on master.

If something didn't go well or you need to modify the procedure, you can easily start over with:
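The comparison can also be done mechanically. The sketch below assumes the two listings were saved to files and prints every line of the new history that has no exact counterpart in the old one, i.e. every commit whose committer date was not restored. The sample data is made up, and the git log commands in the comments show where real input would come from.

```shell
old=$(mktemp)
new=$(mktemp)
# In a real repository these would come from something like:
#   git log --format="%cI %aI %s" start...old_master > "$old"
#   git log --format="%cI %aI %s" start...master     > "$new"
cat > "$old" <<'EOF'
2015-06-05T12:00:00+02:00 2015-06-05T11:00:00+02:00 last
2015-06-04T12:00:00+02:00 2015-06-04T11:00:00+02:00 next
2015-06-03T12:00:00+02:00 2015-06-03T11:00:00+02:00 bad
EOF
cat > "$new" <<'EOF'
2015-06-05T12:00:00+02:00 2015-06-05T11:00:00+02:00 last
2015-06-13T16:00:00+02:00 2015-06-04T11:00:00+02:00 next
EOF

# -F fixed strings, -x whole-line match, -v invert, -f patterns from file:
# keep the lines of $new that do not appear verbatim in $old.
mismatches=$(grep -Fvx -f "$old" "$new" || true)

echo "not restored: $mismatches"
rm -f "$old" "$new"
```

An empty result means every commit on master carries the same committer date, author date and subject as a commit on old_master. The removed commit bad only appears in the old listing, so it is not reported.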

$ git reset --hard old_master

Cleanup

When you are satisfied with the result you can remove the backup branch and the file used to store the original commit dates:

$ git branch -D old_master
$ rm /tmp/hashlist

That's all!

axiac
  • Thank you very much! Looks promising :) will try that today. – TheWatcher Jun 14 '15 at 11:23
  • It gives me an error on the first command: git log --format "%aI %s" c318373c22214f91fb189619e093be298aa6c0e0...lollipop | uniq | wc -l fatal: ambiguous argument '%aI %s': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git [...] -- [...]' – TheWatcher Jun 14 '15 at 13:43
  • An equal sign (`=`) was missing. The correct command is `git log --format="%aI %s" ...`. Updated the answer. – axiac Jun 14 '15 at 17:38
  • Hey, all worked, but on the last real step it says: (1/219)fatal: invalid date format: %cI could not write rewritten commit – TheWatcher Jun 17 '15 at 19:16
  • Are you here? You know what could be the problem? – TheWatcher Jun 20 '15 at 15:15
  • What is the exact error message and what command generates it? Your previous comment is ambiguous: `%cI` is used with `git log` but the part *"could not write rewritten commit"* looks like an error produced by `git filter-branch` (and that command does not use `%cI`). – axiac Jun 20 '15 at 16:00
  • Great, thank you very much! One small thing I just stumbled upon: It seems you should always use `sort` before using `uniq`. Minimal example: `printf "hello\nworld\nhello\nworld" | uniq` returns four lines while `printf "hello\nworld\nhello\nworld" | sort | uniq` returns two (as expected). Or directly use `sort -u` (or `sort --unique`) which makes piping to `uniq` obsolete. – msa May 16 '19 at 16:42
  • I don't know why, but I keep getting `grep: ./commit-list: No such file or directory`, my hash list is named `commit-list` and located in the same directory (I originally placed it in the parent directory, but `../commit-list` had the same problem). – Zarepheth Dec 20 '19 at 00:03
  • Eventually I replaced the relative path with the absolute from the root `/c/.../commit-list` and it worked. :p – Zarepheth Dec 20 '19 at 00:06
  • I just tried this when rebasing on top of another branch with a merge that occurred after my starting commit. The `git filter-branch` command reset all the commit dates between my starting point and the merge to the current date and time -- perhaps because I didn't include that branch in the `hashlist` file. – Zarepheth Jan 07 '20 at 21:46

So, here is a tedious way to do it (depending on how many commits you need to rebase), but I tried it out and it works. When you do an interactive rebase, mark each commit with "e" so that you can edit it. This will cause git to pause after every commit. At each pause, you can specify which date to use and continue to the next commit with:

GIT_COMMITTER_DATE="Wed Feb 16 14:00 2011 +0100" git commit --amend   
git rebase --continue

Or if you want to keep the committer and author date the same:

GIT_COMMITTER_DATE="Wed Feb 16 14:00 2011 +0100" git commit --amend --date "Wed Feb 16 14:00 2011 +0100"
git rebase --continue

Add --no-edit after --amend if you don't want the editor to open to change the commit message.

This is, of course, a major pain in the rear, and you have to know all of the commit dates beforehand, but if you can't do it any other way, it should at least work.

David Deutsch
  • Hmm I have like 300 commits... So it should be very difficult to do this hand by hand but maybe we can automate this command set? – TheWatcher Jun 11 '15 at 20:50
  • Wow, 300 is definitely way too many for my method, unless you have a lot of time on your hands. The only other thing I can suggest is to take a look at the **last** answer to [this question](http://stackoverflow.com/a/19522951/4880675). – David Deutsch Jun 11 '15 at 20:58
  • Ah, sorry about that. – David Deutsch Jun 11 '15 at 21:01
  • maybe you could help me to resolve the issue with that answer that fails for me.. – TheWatcher Jun 11 '15 at 21:47

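The real answer comes from Reddit, of all places:

git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%cD" git commit --amend --no-edit' rebase -i

The format string is used for each line of the interactive rebase todo list. The %n adds, after every pick line, an exec line that amends the commit just picked with its original committer date (%cD), so the dates are restored while the rebase runs.
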
DharmaTurtle