
For the first time, I tried to rewrite my git history using git filter-branch. I did this by writing a (Python) script (let's call it edit_file) that makes an edit to a file (let's call it target_file). Then, I ran this command:

git filter-branch --tree-filter "path/to/edit_file" HEAD

I got a nice stream of output which seemed to indicate that I'd gotten the effect that I wanted, but when I viewed target_file, I did not see any changes. When I run edit_file directly, target_file in my working copy successfully receives the edits that I intended.

It sounds like my changes exist within some deep, dark, and dank recess in git's twisted mind, and I just need a magical incantation to summon forth my changes. I have no idea whether this is correct, nor do I understand where to begin looking, because all the material that I've read (including the official git book) indicates that once git filter-branch finishes, the branch that I am working on should have the changes that edit_file would perform on every version of target_file...

Halp?

Sorry if this is a bit long-winded, but I don't know what details are necessary (because that is a core feature of being confused).


More details:

The reasons I say that it looked like filter-branch did what I wanted are:

  1. I could see the output of edit_file running on each commit, and it indicated success on all of them. The output of edit_file changes as it operates on different versions of target_file, and I was able to see the different output from edit_file as git filter-branch moved through history.

  2. At the end, I saw this:

    Ref 'refs/heads/my-branch' was rewritten
    

PS: Before doing git filter-branch, I ran

git checkout -b my-branch

to create a new branch named my-branch (and check it out), in case git filter-branch went horribly wrong.


After seeing that git filter-branch ... left target_file unchanged, I ran git checkout -b my-branch, but I guess that did nothing. I thought it might do something, because the last line from git filter-branch seems to be saying that the branch my-branch has been changed, but I honestly do not understand what that line means.

allyourcode
  • `git show` will tell you what this file looks like now in any given commit. – matt May 31 '20 at 03:35
  • @matt git show my_file produced zero output. I assume that by adding more flags, one can achieve the behavior you mentioned, but it is rather non-obvious to me what those would be. Can you please post a complete working command? – allyourcode Jun 01 '20 at 22:14
  • Well, I said “in any given commit”. You didn’t give a commit. What commit do you choose to look in? – matt Jun 01 '20 at 23:29
  • What I expected that to do was show me every commit, because that would allow me to see "what this file looks like now in any commit". So, the answer to your question is basically, "yes". – allyourcode Jun 02 '20 at 00:02
  • Yes, but I said any _given_ commit. Perhaps we don't speak the same language here. What I'm telling you is: if you wish to know the state of your file in a particular commit, `git show` has the power to tell you. So if the question is: did I transform my file as desired? `git show` will answer that question. You can name a range of commits, but again, to tell you the actual command, I need to know the names of your commits at the endpoints of the range. – matt Jun 02 '20 at 00:39
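A concrete form of the command discussed in these comments, as a minimal sketch: `git show <commit>:<path>` prints a file as it is stored in one commit, with the path given relative to the repository root. The helper names `show_file_command` and `file_at_commit` are hypothetical, and so are the branch and path in the usage note. The first function only builds the argument list; the second assumes it is run from inside the repository.

```python
import subprocess


def show_file_command(commit, path):
    """Build the git command that prints `path` as stored in `commit`."""
    # "<commit>:<path>" names the blob recorded for that path in that commit.
    return ["git", "show", f"{commit}:{path}"]


def file_at_commit(commit, path):
    """Run the command in the current repository and return the file text."""
    result = subprocess.run(
        show_file_command(commit, path),
        check=True,           # raise if git reports an error (bad commit or path)
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `file_at_commit("my-branch", "path/to/target_file")` would return the file as it stands at the tip of my-branch, and `file_at_commit("my-branch~3", "path/to/target_file")` the version three commits earlier.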

2 Answers


It sounds like my changes exist within some deep, dark, and dank recess in git's twisted mind, and I just need a magical incantation to summon forth my changes.

That is why you don't use git filter-branch anymore (it is obsolete, along with BFG)

You use git filter-repo with

VonC
  • Ok. I will try to remember that for next time, but I already used filter-branch, so my repo is in some weird state that I cannot understand. Telling me I should have done something different doesn't really help me with that (unless someone can let me borrow their time machine). Is there anything that I need to undo before trying some other method of re-writing history? – allyourcode May 31 '20 at 00:54
  • @allyourcode you could reset to what is saved in refs/original, and try again: https://stackoverflow.com/a/27975288/6309 – VonC May 31 '20 at 01:00
  • Ah. I now see why I did not try filter-repo earlier: my git version is too low, and sudo apt-get install git --upgrade does nothing because I already have the latest available on my platform. Also, kind of annoying to have to install random third-party software. Rhetorical question: If filter-repo is so great, why doesn't it come with git? – allyourcode May 31 '20 at 02:13
  • Yes, trying to get filter-branch is leading to this: https://xkcd.com/349/ . Third-party solution = you're gonna have a bad time. – allyourcode May 31 '20 at 02:23
  • @matt Yes, as mentioned in my question I took precautions. It seems like they were sufficient, but I'm not sure how to tell, because git is confusing AF sometimes. Also, it looks like filter-branch already has a safety mechanism built in. – allyourcode May 31 '20 at 05:26
  • @allyourcode No problem: you can update Git easily enough. (https://stackoverflow.com/a/41357503/6309) – VonC May 31 '20 at 19:19
  • I tried that (before my previous reply). add-apt-repository failed for me. That is why I mentioned the xkcd strip. – allyourcode Jun 01 '20 at 11:03
  • @allyourcode What OS/distro are you using? With which version of Git? – VonC Jun 01 '20 at 12:09
  • Ok. I found a way to get a new version of git (don't ask). Now, git-filter-repo is telling me this: Aborting: Refusing to overwrite repo history since this does not look like a fresh clone. I'm not sure what this means, but it sounds bad. It is suggesting force, but like I said, I don't know what's going on, so I am rather afraid to use the --force. – allyourcode Jun 01 '20 at 12:45
  • Ok. I misunderstood what --blob-callback does (otoh, that is a pretty opaque name). I may be making progress now... self.head.bash(forest) – allyourcode Jun 01 '20 at 13:03
  • @allyourcode I reference more complete blob callback examples in https://stackoverflow.com/a/62123812/6309 – VonC Jun 01 '20 at 13:04
  • Thanks. I had already figured it out. So similar to my experience with git filter-branch, it appeared that I have finally found a magic incantation that would make git-filter-repo chooch, but all my commits after the last time that target_file changed have been dropped, which seems utterly insane, since all I did was assign blob.data, not call any function with a dangerous sounding name like "we_never_really_needed_this_commit_anyway_so_lets_just_throw_it_into_a_black_hole_for_funzies". Fortunately, I was operating on a clone, so I can just nuke this abortion of an experiment. – allyourcode Jun 01 '20 at 14:04
  • Here is what I did to achieve the aforementioned wacky failure of inexplicably dropped commits (with some redactions): https://pastebin.com/HUp9sapC . Looks very much like what you posted in that other SO question, @VonC. btw, thanks for sticking with me. Please, know that my rage is NOT directed at you (but this is giving me genuine rage). – allyourcode Jun 01 '20 at 14:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215104/discussion-between-vonc-and-allyourcode). – VonC Jun 01 '20 at 14:11

So... It looks like git filter-branch did not do anything except leave weird back up poop in the .git dir. git gc does not clean that up for whatever reason (maybe there should be a git clean-poop command as well as gc?). Not sure what will, other than

cd ..  # Assuming you are at the root of your repo
git clone --no-local original fresh-copy
cd fresh-copy

Yes, even though we are making a local copy, --no-local is needed, because This. Is. GIIIIIIIIIT! This is probably what you should do before attempting filter-branch or filter-repo. Not sure why the documentation does not advise this, but anyway. Don't do what I did, and just skimp out by creating a new branch. Consider git filter-X to be a nuclear weapon. You don't just need a bunker to protect you; you need a disposable parallel universe.
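For anyone who would rather recover in place than re-clone, here is a rough sketch of the other route mentioned in the comments. git filter-branch keeps the pre-rewrite tip of each rewritten ref under refs/original/, and as long as those refs exist the old commits stay reachable, so git gc keeps them. The function names below are hypothetical and the code only builds the command lists. The commands themselves are destructive (`git reset --hard` discards uncommitted work), so treat this as an assumption-laden outline to try on a throwaway copy first.

```python
def backup_ref(branch):
    """Name of the backup ref that filter-branch writes for a local branch."""
    return f"refs/original/refs/heads/{branch}"


def restore_command(branch):
    """Command that moves the checked-out branch back to its backup."""
    # Assumes `branch` is the branch currently checked out.
    return ["git", "reset", "--hard", backup_ref(branch)]


def cleanup_commands(branch):
    """Commands that drop the backup so gc can actually prune old objects."""
    return [
        ["git", "update-ref", "-d", backup_ref(branch)],      # delete the backup ref
        ["git", "reflog", "expire", "--expire=now", "--all"],  # forget reflog entries
        ["git", "gc", "--prune=now"],                          # prune unreachable objects
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)` from the repository root. Use the restore command to undo the rewrite, or the cleanup commands to keep the rewrite and discard the backup, not both.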

After much gnashing of teeth, I was finally able to get VonC's suggestion of using git-filter-repo to work. If, like me, your platform does not have a new enough version of git to work with git-filter-repo (requires >= 2.22), you may be able to do something like

sudo add-apt-repository ppa:git-core/ppa
# followed by the usual
sudo apt-get update
# song and dance routine...
sudo apt-get install git --upgrade

as suggested on the git *nix download page to upgrade to the "latest and greatest". That did not work for me (so take care about blindly copying and pasting the above suggestion), but apparently, I have a very deranged system, so you may have better luck than I did. Anyway...

Once you obtain a new enough version of git, the only thing you should need is the git-filter-repo script itself (amazingly, it just consists of the one main file). So just download that straight from github, and stick it anywhere on your PATH. Remember to chmod +x the shit out of it first tho.

You will most likely NOT want to use the --path flag, because that will NOT just target the one file that you want to edit. Instead, --path will nuke all other files.

With that in mind, all you need to do is something like this:

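git-filter-repo --blob-callback 'import sys
# Make the directory that holds my_module importable.
sys.path.append("dir/where/your/edit_file/py/file/lives")
import my_module

# blob.data is bytes: decode it, edit the text, then encode the result.
# decode() raises on blobs that are not valid UTF-8 (binary files, for example).
new = my_module.modify(blob.data.decode())
new_bytes = new.encode()
assert isinstance(new_bytes, bytes), ""
blob.data = new_bytes
'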

Yes, that is all one command. blob.data contains the contents of whatever file git-filter-repo is examining. Furthermore, notice that it is a bytes object, not str. Let me re-emphasize a very important point: this operation goes over EVERY file (in every commit). So, your my_module.modify function had better be very selective if you only intend to modify one file. (What git-filter-repo really needs is a way for your script to detect the path of the blob, not just give you the contents of a file. But hopefully, you can recognize your file by its contents, not just its path.) If you mess up, it's not a big deal, because you can just nuke the fresh-copy dir and start over.
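To make "very selective" concrete, here is a minimal sketch of what a my_module could look like. Everything specific in it is an assumption for illustration: the `MARKER` string used to recognize the target file, the `old_name` to `new_name` edit, and the `modify_bytes` wrapper are all hypothetical. The idea is simply to return every other blob untouched, including blobs that are not valid UTF-8.

```python
# my_module.py (hypothetical contents)

# Assumed: some text that appears only in the file you want to edit.
MARKER = "# target-file-marker"


def modify(text):
    """Return the edited text for the target file, anything else unchanged."""
    if MARKER not in text:
        return text  # not the target file: leave it alone
    # Placeholder edit; substitute whatever edit_file really does.
    return text.replace("old_name", "new_name")


def modify_bytes(data):
    """Bytes-in, bytes-out wrapper that skips blobs that are not UTF-8 text."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return data  # binary content: return it exactly as it was
    return modify(text).encode("utf-8")
```

With a wrapper like this, the body of the callback could shrink to `blob.data = my_module.modify_bytes(blob.data)` after the import lines.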

Refrain from gouging your eyes out after this ordeal. Your eyes are way too valuable to be destroyed over something as stupid as git. Feel free to have a good cry in the shower tho. Hey, at least you got it working finally, and you weren't eaten by sharks.

Oh, and another thing: the git clone you did in the first step does not copy over any submodules, because that would make sense and be too easy. Therefore, you must also do this in fresh-copy (even though git clone DOES copy over the .gitmodules file):

git submodule init
git submodule update

PS: You may get some therapeutic value from one of the various git man page generators available on the Interwebs. They really do read like git man pages, even though they are literally randomized gibberish.

allyourcode
  • Thank you for the feedback. Not sure why it was downvoted (besides maybe the "rant" tone of the post, which traditionally is frowned upon; I understand your frustration though). – VonC Jun 01 '20 at 15:04
  • Thanks. If ppl think git is their best friend and take offense that I am frustrated that it is incredibly unusable, then let them downvote my technically accurate, but slightly inflected answer. – allyourcode Jun 01 '20 at 15:10
  • I am glad you made it work in the end: that will help others! – VonC Jun 01 '20 at 15:12