61

I have a fairly large Git repository with 1000s of commits, originally imported from SVN. Before I make my repo public, I'd like to clean up a few hundred commit messages that don't make sense in my new repo, as well as to remove all that git-svn informational text that got added.

I know that I can use 'git rebase -i' and then 'git commit --amend' to edit each individual commit message, but with hundreds of messages to be edited, that's a huge pain in the you-know-what.

Is there any faster way to edit all of these commit messages? Ideally I'd have every commit message listed in a single file where I could edit them all in one place.

Thanks!

Walt D

7 Answers

61

This is an old question, but since nobody has mentioned git filter-branch, I'll add my two cents.

I recently had to mass-replace text in commit messages, swapping one block of text for another without changing the rest of each message. For instance, I had to replace Refs: #xxxxx with Refs: #22917.

I used git filter-branch like this

git filter-branch --msg-filter 'sed "s/Refs: #xxxxx/Refs: #22917/g"' master..my_branch
  • I used the --msg-filter option to edit only the commit message, but you can use other filters to change files, edit other commit information, etc.
  • I limited filter-branch to the commits that were not in master (master..my_branch), but you can apply it to your whole branch by omitting the commit range.

As suggested in the doc, try this on a copy of your branch. Hope that helps.
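
The same --msg-filter mechanism could also handle the git-svn text the question mentions. Below is a minimal sketch, assuming the trailer lines look like "git-svn-id: <url>@<revision> <uuid>". The function name, the script name strip_svn_id.py, and the script's own --filter argument are made up for illustration and are not part of Git.

```python
import re
import sys

# git-svn appends a trailer line of the form
#   git-svn-id: <url>@<revision> <repository-uuid>
GIT_SVN_ID = re.compile(r"^git-svn-id: .*\n?", re.MULTILINE)


def strip_git_svn_id(message: str) -> str:
    """Remove git-svn-id lines and any blank lines left at the end."""
    cleaned = GIT_SVN_ID.sub("", message)
    # Drop the blank line(s) that preceded the trailer; keep one final newline.
    return cleaned.rstrip() + "\n"


# Hypothetical convention: only act as a stdin/stdout filter when the
# script is started with "--filter", so importing it has no side effects.
if __name__ == "__main__" and sys.argv[1:] == ["--filter"]:
    sys.stdout.write(strip_git_svn_id(sys.stdin.read()))
```

If this were saved as strip_svn_id.py, it could be wired in with something like git filter-branch --msg-filter 'python3 /path/to/strip_svn_id.py --filter' master..my_branch, again on a copy of the branch first.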



Lex Lustor
  • 2
    The --msg-filter command receives the original commit message on STDIN and must write the new commit message to STDOUT, so you need to use "sed" instead of "sed -i":
    git filter-branch --msg-filter 'sed "s/Refs: #xxxxx/Refs: #22917/g"' master..my_branch 
    – Simon Desfarges Apr 03 '17 at 15:50
  • 2
    @SimonDesfarges You're absolutely right! It doesn't work; I'll update my answer right away. – Lex Lustor Apr 04 '17 at 11:57
  • 1
    If you do this in a copy of your branch as @LexLustor suggests, you might have a stray reference left over after the `git filter-branch` operation. If so, you can do this to get rid of the leftover reference: `git update-ref -d refs/original/refs/heads/your-temp-branch-name` – Christian Long Jul 24 '18 at 03:59
19

git-filter-repo (https://github.com/newren/git-filter-repo) is now recommended. I used it like this:

PS C:\repository> git filter-repo --commit-callback '
>> msg = commit.message.decode(\"utf-8\")
>> newmsg = msg.replace(\"old string\", \"new string\")
>> commit.message = newmsg.encode(\"utf-8\")
>> ' --force
New history written in 328.30 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 087f91945a blah blah
Enumerating objects: 346091, done.
Counting objects: 100% (346091/346091), done.
Delta compression using up to 8 threads
Compressing objects: 100% (82068/82068), done.
Writing objects: 100% (346091/346091), done.
Total 346091 (delta 259364), reused 346030 (delta 259303), pack-reused 0
Completely finished after 443.37 seconds.
PS C:\repository>

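You probably don't want to copy the PowerShell prompt text, so here is just the command. The \" escapes are what PowerShell needed to pass the quotes through; in a POSIX shell such as bash, write plain " inside the single quotes instead: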

git filter-repo --commit-callback '
msg = commit.message.decode(\"utf-8\")
newmsg = msg.replace(\"old string\", \"new string\")
commit.message = newmsg.encode(\"utf-8\")
' --force

If you want to hit all the branches don't use --refs HEAD. If you don't want to use --force you can run it on a clean git clone --no-checkout. This got me started: https://blog.kawzeg.com/2019/12/19/git-filter-repo.html
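
For anyone wondering why the callback decodes and re-encodes: git-filter-repo hands the callback the commit message as bytes, not str. Here is a small sketch of the same three steps outside of Git, so the replacement can be tried on sample messages first. FakeCommit and replace_in_message are made-up names for illustration, not part of git-filter-repo.

```python
def replace_in_message(message: bytes, old: str, new: str) -> bytes:
    """Mirror the callback body: decode, replace, re-encode."""
    text = message.decode("utf-8")
    return text.replace(old, new).encode("utf-8")


class FakeCommit:
    """Hypothetical stand-in for the commit object the callback receives."""

    def __init__(self, message: bytes) -> None:
        self.message = message


# Try the replacement on a sample message before rewriting real history.
commit = FakeCommit(b"Fix parser\n\nold string appears here\n")
commit.message = replace_in_message(commit.message, "old string", "new string")
print(commit.message)
```

Note that decode("utf-8") raises an error on a message that is not valid UTF-8, which can happen in old imported history; calling commit.message.replace(b"old string", b"new string") directly on the bytes avoids the round trip.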

Stephen
18

This is easy to do as follows:

  • Perform first import.
  • Export all commits into text:

    git format-patch -10000
    

    The number should be greater than the total number of commits. This will create lots of files named NNNNN-commit-description.patch.

  • Edit these files using some script. (Do not touch anything in them except the commit message at the top.)
  • Copy or move edited files to empty git repo or branch.
  • Import all edited commits back:

    git am *.patch
    

This works only with a single branch, but it works very well.
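
The "edit these files using some script" step could look roughly like the sketch below. It assumes the usual format-patch layout, where the headers and commit message come first and a line containing only "---" separates them from the diffstat and diff. The function names are made up for illustration.

```python
from pathlib import Path


def edit_patch_message(patch: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` only above the first '---' separator line."""
    # Everything after the separator (diffstat and diff) is left untouched.
    head, separator, rest = patch.partition(b"\n---\n")
    return head.replace(old, new) + separator + rest


def edit_patch_files(directory: str, old: bytes, new: bytes) -> None:
    """Rewrite every *.patch file in `directory` in place."""
    for path in sorted(Path(directory).glob("*.patch")):
        # Bytes in, bytes out, so line endings and encodings are preserved.
        path.write_bytes(edit_patch_message(path.read_bytes(), old, new))
```

The part above the separator also holds the From: and Date: headers, so the search text should be something that cannot appear there. A call such as edit_patch_files(".", b"Refs: #xxxxx", b"Refs: #22917") would then process the current directory.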

mvp
  • Seems promising, thank you! Unfortunately I'm getting a "Bad file number" error when running 'git am *.patch'. I'm on Windows 7, and a quick Google search seems to suggest that it's related to exceeding the maximum number of command line arguments, which makes sense given that there's a patch file for each commit. I'll try it on my Mac a little later. – Walt D Jan 15 '13 at 07:21
  • You can do it in small batches, just make sure that the sequence is increasing – mvp Jan 15 '13 at 07:30
  • `ls *.patch | xargs git am` automates that. – jthill Jan 15 '13 at 08:32
  • Sure, but on Windows you would have to install mingw or cygwin first. Much easier to start this on Linux or Mac. – mvp Jan 15 '13 at 08:34
  • xargs is one of the fundamentals, are you sure msysgit doesn't provide one? If that doesn't work there's always `git format-patch --stdout >whomping.big.patch` then edit and am that. Could you also detail how to graft the rest of the branches on there for him? – jthill Jan 15 '13 at 08:55
  • @jthill: of course mingw does provide xargs, but on a vanilla Windows box you would have to install mingw first - that's what I said. As for other branches - this solution did not promise anything about that upfront, sorry... For branch splits it is easily doable in a similar way. For merges - hmm, maybe so, but it is certainly not so trivial. – mvp Jan 15 '13 at 09:02
  • @mvp If you notice, I said msysgit not mingw. msysgit is the git-for-windows I know of; I believe a bare-bones subset of mingw comes with. – jthill Jan 15 '13 at 09:19
  • @jthill: ah, you're right. I have not used windows in ages, sorry for my ignorance :-) – mvp Jan 15 '13 at 09:21
  • @jthill: Thanks, xargs works great on Windows! Unfortunately now I'm getting an error that looks like "error: patch failed: path/to/source/file.cpp:629 error: path/to/source/file.cpp: patch does not apply". This happens to be the first commit on the first branch that will eventually get merged back into master. Does 'git am' not work with branches, even if the branches get merged back into master? – Walt D Jan 15 '13 at 17:35
  • @WaltD How many commits, please, and is there any way to automate identifying commit messages that don't need updating? A search string or set of them? This is going to take some doing. I don't know of anything built (yet) to automate this kind of wholesale revision; if anyone does, please be merciful and chime in... – jthill Jan 15 '13 at 21:19
  • 6
    use "git am --committer-date-is-author-date" to preserve the commit dates – oluc May 03 '13 at 08:53
  • I keep getting the problem `zsh: argument list too long: git` when I type `git am *.patch`. If I try to apply the patches individually, I get an error `error: file/name.txt does not exist in index`, where `file/name.txt` is different for each patch. – Whitecat Oct 04 '16 at 00:35
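
For the "argument list too long" problem raised in these comments, here is a rough sketch of the small-batches idea in Python, for systems without xargs. It assumes the zero-padded numeric prefix that format-patch gives the file names, so that sorting by name yields commit order. The function names and the batch size of 100 are arbitrary choices for illustration.

```python
import subprocess
from pathlib import Path


def batched(items, size):
    """Yield consecutive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_patches(patch_dir, batch_size=100, run=subprocess.run):
    """Run `git am` on the patches in `patch_dir`, one batch at a time."""
    # Sorting by name keeps the patches in their numbered sequence.
    patches = sorted(str(p) for p in Path(patch_dir).glob("*.patch"))
    for batch in batched(patches, batch_size):
        # check=True stops at the first failing batch instead of continuing.
        run(["git", "am", "--committer-date-is-author-date", *batch], check=True)
```

apply_patches("/path/to/patches") would be run from inside the target repository. It only works around the command-line length limit; it does not fix "patch does not apply" errors caused by branches and merges in the exported history.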
6

You can use git rebase -i and replace pick with reword (or just r). The rebase then stops at every such commit, giving you a chance to edit the message.

The only disadvantages are that you don't see all messages at once and that you can't go back when you spot an error.

maaartinus
  • 2
    This is not the way to go. At about 30 seconds per message, opening 1000 commit messages one at a time takes about 8.5 hours. – Whitecat Oct 01 '16 at 20:50
  • @Whitecat I misunderstood the question: I thought you wanted to do some manual cleanup, but you want something automated. Then what's the problem with `git filter-branch` mentioned in [another answer](http://stackoverflow.com/a/37941403/581205)? – maaartinus Oct 01 '16 at 22:35
  • I am having a problem with `git filter-patch` it is only showing patches and not the commits. I have over 6000 commit messages I want to change. But when I do `git filter-patch -100000 HEAD` I get only 500 patch files. – Whitecat Oct 01 '16 at 22:46
  • 1
    @Whitecat That's strange. I've just tried `git filter-branch --msg-filter 'echo "foo: " && cat'` and it rewrote nearly 3000 commits (all I have) in 20 seconds. IMHO perfect for what you need. If it doesn't work for you, add the details to the question - or better write another one. +++ I'm assuming, you mean filter-*branch*, not filter-*patch*. – maaartinus Oct 01 '16 at 22:59
  • What do you do if the changes have to be made with patterns (e.g., change text that matches `issue-\d{1,4}` to `Task`)? – Whitecat Oct 04 '16 at 00:31
  • 1
    @Whitecat That's easy, use your favorite Unix tool, e.g., `git filter-branch --msg-filter 'perl -pe "s/issue-\\d{1,4}/Task/"'`. I'm not sure about needed escaping, so try to add or remove backslashes. If things get complicated, write a shell or perl script. – maaartinus Oct 04 '16 at 10:29
4

A great and simple way to do this would be to use git filter-branch --msg-filter "" with a Python script as the filter.

The python script would look something like this:

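import re
import sys

# Match "Issue-" followed by 1 to 4 digits, case-insensitively.
pattern = re.compile(r"(?i)Issue-\d{1,4}")

# git filter-branch pipes the original commit message in on stdin
# (the commit id is also available in the GIT_COMMIT environment variable).
message = sys.stdin.read()

# Replace every match; sub() returns the message unchanged if nothing matches.
message = pattern.sub("Issue", message)

# Write the new message to stdout without appending an extra newline.
sys.stdout.write(message)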

The command line call you would make is git filter-branch -f --msg-filter "python /path/to/git-script.py"

Whitecat
1

I use a mix of these two solutions:

  1. Vscodium and the extension "Git rebase shortcut" to use git rebase -i the simple way

  2. Git history editor, which allows:

  • bulk editing of author name and email
  • rewording Git commit messages in a fancy interface
  • changing the commit date with a date picker

You paste your commits, then make your changes with the web interface, and it provides you a proper git filter-branch command to paste in your terminal.

Good luck!

roneo.org
0

As an alternative, consider skipping the import of the whole repository. I would simply check out, clean up, and commit the important points in the history.

Adam Dymitruk