Automate transformation (via script) during rebase of 100 git commits

Question

I've got a git branch B that's about 100 commits ahead of its ancestor A. For each of those 100 commits, I'd like to run a script that:

1. Runs an existing script edit.sh that makes about a thousand automated search-and-replace edits to one source file.
2. Appends the SHA of the original commit to the commit message of the rebased commit.

Other than those two automated steps, I don't want to make any changes during the rebase.

I could do this manually via git rebase -i, running the script and rewording the commit message 100 times, but what's a good way to automate the whole thing, without manual commands or editing for each of those 100 commits?

I do need to keep all 100 commits in the history, though, because my branch isn't a feature branch. Instead, our repo is a fork of another repo and my branch is catching our fork up to upstream. It's very helpful to be able to show each upstream commit in the history of my fork, even if the content is a little different. But first I need to modify those commits to match the changes in our fork!

Note that we want to keep a linear history in our forked repo, so we're rebasing instead of using merge commits. So far we've been doing this via git rebase --onto which has been working pretty well for all files except one file that's had so many changes in our fork that handling merge conflicts during its rebase is a huge pain.

I just merged a PR in the upstream repo that will make the problematic file much more similar to our fork. The changes in that PR were automated via this script. I verified that rebasing that file just got a lot easier. But in the meantime I've got ~100 commits to port (aka git rebase --onto) from upstream=>fork until fork catches up with that PR in upstream.

So what I'm trying to do here is to create a local branch with 100 new commits that are the same as upstream's last 100 commits, except in each commit I rewrite the problematic file in those commits using the same script I used to create the upstream PR. Then I can do an interactive rebase against that temporary local branch, and have a much easier time handling the merge conflicts.

More info (don't need to read this)

BTW, here's more context about why we're doing this:

I'm one of the maintainers of two open-source repos: a JavaScript repo tc39/proposal-temporal and a TypeScript port of that repo: js-temporal/temporal-polyfill.

The JS repo is the "main" repo for our team: changes originate there, and are subsequently ported over to the TS repo. Ideally it could be done in the reverse order so we could just automate the TS=>JS step, but the repo is a proposal for making a change to JavaScript itself. The ECMA TC39 standards committee that owns the JavaScript spec would frown upon the reference polyfill for a JavaScript feature being written in TypeScript. :-) So we're stuck maintaining two polyfills: a JavaScript one for the standards committee, and a TypeScript one for production use.

For the most part this process works OK: the TS repo has the JS repo set up as a remote, and we fetch commits from the JS repo and use git rebase --onto to replay commits onto the TS repo. (We actually use some cool tools written by one of the maintainers to make this rebasing easier.)

If we can get through porting these 100 commits, then maintaining this fork will be much easier in the future!

Only the tip of the branch really matters in this context; why do you need to do this? I'd personally just squash everything into one commit and run the script once. — Makoto, Apr 18 '23 at 19:27
[`git rebase --exec`](https://git-scm.com/docs/git-rebase#Documentation/git-rebase.txt--xltcmdgt), [`git filter-repo --commit-callback`](https://github.com/newren/git-filter-repo). — phd, Apr 18 '23 at 19:33
@Makoto - I do need to keep all 100 commits because it's not a feature branch. I added a paragraph to my question explaining why. — Justin Grant, Apr 18 '23 at 19:46
If you want to keep the original history in your history, then you do *not* want to do a rebase, because it throws away the original history. Can you not do merges? — j6t, Apr 18 '23 at 19:48
In step 1 you make some in-place edits. Do you then commit them? And that becomes the new commit? And in that commit’s message you reference the old commit message (step 2). For each commit you replace it with a new commit? — Guildenstern, Apr 18 '23 at 20:13
@j6t We want to keep a linear history in our forked repo. So far we've been doing this via `git rebase --onto` which has been working pretty well for our use case except this one problematic file. I added more context to my question to explain why we're doing this. — Justin Grant, Apr 18 '23 at 20:15
@Guildenstern Yes, that's the goal: map each old commit to a new commit that's the same as the old one except a few changes made via script. The commit message of the new commit should be the same as the old commit message, except append the SHA of the old commit to the end. — Justin Grant, Apr 18 '23 at 20:19
“I could do this manually via `git rebase -i`”—It’s called `--interactive` but you can still script it (like `exec` and the editor itself). I don’t see why there wouldn’t be a way to reduce this to one `rebase --interactive` command. But your use-case is so advanced so you probably already know all this. ;) — Guildenstern, Apr 18 '23 at 20:28
@Guildenstern - My assumption was that the exec command is run after each commit rather than immediately before each commit, which is I think what I want. Is this assumption correct? I think the behavior we want is to act like each commit has a merge conflict, so the rebase stops. Then we run a script that: changes one file, then `git add`s that file; runs `git rebase --continue`; and then I assume we'd need a separate script as a fake editor to append the SHA to the commit message. Then repeat until the rebase is complete. — Justin Grant, Apr 18 '23 at 20:53
The way I use interactive rebase when I’m amending commits (like you’re doing here?) is to stop on the commit and then do the fixups. I’m imagining that `exec` works the same: do something (like amend) at this stop. Then you can both change the contents of the commit and the commit message. But this is all just a theory (in my head). ;) — Guildenstern, Apr 18 '23 at 21:00

score 0 · Answer 1 · answered Jul 13 '23 at 18:53

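A possible approach would be to create a script that automates the rebase by leveraging the GIT_SEQUENCE_EDITOR environment variable. That variable names a command that git runs on the rebase todo-list (the list of commits that will be applied during the rebase) in place of your interactive editor, which allows us to change that list programmatically.

Here is an example of such a script:

#!/bin/bash
set -e

edit_script_path=/path/to/edit.sh

# That function is called once per replayed commit; $1 is the original (abbreviated) SHA
process_commit() {
    sha="$(git rev-parse "$1")"

    # Apply the edit.sh script
    chmod +x "$edit_script_path"
    "$edit_script_path"

    # Amend the replayed commit with the edits, keeping its message and
    # appending the original SHA as a final paragraph
    git commit -a --amend -m "$(git log -1 --format=%B)" -m "Original SHA: $sha"
}

# Callback mode: git runs "bash <this script> --process <sha>" from each exec line
if [ "$1" = "--process" ]; then
    process_commit "$2"
    exit
fi

# Absolute path to this script (assumed to contain no spaces or | characters)
self="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"

# Kick off the rebase; sed adds an exec line after every pick line of the todo-list
GIT_SEQUENCE_EDITOR="sed -i -E 's|^pick ([0-9a-f]+).*|&\nexec bash $self --process \1|'" \
    git rebase -i --onto target_branch source_branch^

That script defines a process_commit bash function which is called once for each commit during the rebase. That function runs your edit.sh script, amends the just-replayed commit with the resulting changes, and appends the original commit SHA to the commit message.

It starts the rebase with git rebase -i --onto target_branch source_branch^, which replays onto target_branch the commits of the current branch that come after source_branch^. Because GIT_SEQUENCE_EDITOR is set, no editor opens: the sed command inserts an exec line after every pick line in the todo-list, and git runs each exec line right after it has applied the corresponding commit.

Each exec line calls the script again as bash <script> --process <sha>, where <sha> is the abbreviated SHA copied from the pick line just above it. As far as I know, git rebase does not document an environment variable holding the commit being replayed (GIT_COMMIT is set by git filter-branch, not by rebase), so the todo-list is where the original SHA comes from. git rev-parse expands it to the full SHA, and git log -1 --format=%B reads the message of the replayed commit, which at that point is still identical to the original message.

Do replace /path/to/edit.sh with the actual path to your edit.sh script, target_branch with the branch to replay onto, and source_branch^ with the commit just before the first one you want replayed (ancestor A in your question). The sed invocation assumes GNU sed and the default, unabbreviated pick commands in the todo-list; on macOS/BSD sed, the -i flag and the \n in the replacement behave differently.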

That script should automate the entire rebase process for you. However, please note that it will still halt whenever a commit does not apply cleanly, and a later commit can conflict if it touches lines that edit.sh already rewrote in an earlier one.
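
For reference, the kind of todo-list rewriting that GIT_SEQUENCE_EDITOR makes possible can be sketched on its own, without running a rebase. The helper below is only an illustration: add_exec_lines and ./process.sh are hypothetical names, not part of git, and the sketch assumes the default todo-list format where each commit line reads pick <sha> <subject>. It uses plain awk so that it works the same on Linux and macOS.

```shell
#!/bin/sh
# Hypothetical helper: reads a rebase todo-list on stdin and writes it to stdout,
# adding after every "pick <sha> ..." line an exec line that passes <sha>
# to the command given as $1. All other lines (comments, blanks) pass through.
add_exec_lines() {
    awk -v cmd="$1" '
        { print }                                  # keep every original line
        $1 == "pick" { print "exec " cmd " " $2 }  # then call cmd with the SHA
    '
}

# Example: a two-commit todo-list, as git would hand it to the sequence editor
printf 'pick 1a2b3c4 First change\npick 5d6e7f8 Second change\n' |
    add_exec_lines ./process.sh
```

Each pick line in the output is followed by an exec line such as exec ./process.sh 1a2b3c4, so the command receives the SHA of the commit that was just applied. To use the helper as an actual sequence editor, it would need a small wrapper that rewrites the file git passes as its first argument, for example by writing the output to a temporary file and moving it over the original.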
