6

I'm rewriting the history of a fairly big repo using git filter-branch --tree-filter and it's taking a few hours. I see that git is using a temporary directory to store its intermediate work as it goes along. Does that mean it's possible to resume a rewrite if it gets interrupted? If so, how?

Edit

The operation I'm doing is moving a couple of directories. These are currently in subdirectories, but I now need them to be in the root.

e.g.

dir1
- dir2
- dir3
- dir4

becomes

dir1
- dir2
dir3
dir4

Of course my directory structure is a lot more complex than that, but that's the gist of what I'm trying to do.

Roberto Tyley
alnorth29
  • 1
    Out of curiosity (not really core to your question), can you describe what operation you're doing with --tree-filter? Removing or modifying files? – Roberto Tyley Apr 23 '13 at 15:20

2 Answers

14

git filter-branch doesn't itself support a suspend/resume pattern of use - although it writes temporary data out to a .git-rewrite folder, there's no actual support for resuming based on the contents of this directory. If you run git filter-branch on a repository that's had a previously aborted filter-branch operation, it'll either ask you to delete that temp folder, or, with the --force option, do it itself.
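As a rough illustration of that leftover-directory behaviour, here is a minimal sketch that checks whether an earlier run left `.git-rewrite` behind. It assumes the default temporary directory name and that you run it from the directory where filter-branch was started; `has_stale_rewrite_dir` is a hypothetical helper name, not part of git.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a previous filter-branch run left its
# default temporary directory behind in the given directory (default: ".").
has_stale_rewrite_dir() {
  [ -d "${1:-.}/.git-rewrite" ]
}

if has_stale_rewrite_dir .; then
  echo "stale .git-rewrite found: remove it, or rerun filter-branch with --force"
else
  echo "no leftover .git-rewrite directory"
fi
```

Either way the rewrite starts again from the first commit; the check only tells you whether there is debris to clear first.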

The underlying problem is that git-filter-branch is slow running on big repos - if the process was much faster, there'd be no motivation to attempt a resume. So you've got a few options:

Make git-filter-branch go a bit faster...

  • use a RAM-disk - git-filter-branch is very IO-intensive, and will run faster with your repository sitting in RAM.
  • use --index-filter rather than --tree-filter - it's similar to tree filter but doesn't check out the file-tree, which makes it faster, but does require you to rewrite your file alterations in terms of git index commands.
  • use cloud computing and hire a machine with fast RAM and a high clock speed (don't bother with multiple cores unless your own commands are multi-threaded, as git-filter-branch itself is single-threaded)
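To make the `--index-filter` option a bit more concrete, here is a hedged sketch for the directory move described in the question. It follows the general pattern from the git filter-branch documentation (rewrite the output of `git ls-files -s` and feed it back through `git update-index --index-info`). The names `dir1`, `dir3` and `dir4` come from the question's example; `promote_dirs` is a hypothetical helper, and the sketch assumes paths contain no characters that git would quote in its output.

```shell
#!/bin/sh
# Hypothetical helper: reads `git ls-files -s` output on stdin and rewrites
# dir1/dir3/... and dir1/dir4/... to dir3/... and dir4/...
# Each input line has the form: <mode> <object> <stage>TAB<path>
promote_dirs() {
  awk 'BEGIN { FS = OFS = "\t" }
       $2 ~ /^dir1\/(dir3|dir4)\// { sub(/^dir1\//, "", $2) }
       { print }'
}

# Quick check on a fake index line (the object id is a placeholder):
printf '100644 0000000000000000000000000000000000000000 0\tdir1/dir3/a.txt\n' | promote_dirs

# Sketch of how the same awk program might be wired into a rewrite (not run here):
# git filter-branch --index-filter '
#   git ls-files -s | <the awk command above> |
#     GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
#   mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' -- --all
```

Because only the index is rewritten, no files are checked out per commit, which is where the speed-up over `--tree-filter` comes from. Try it on a throwaway clone first.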

...or use The BFG (way faster)

The BFG Repo-Cleaner is a simpler, faster alternative to git-filter-branch - on large repos it's 50-150x faster. That turns your job that takes several hours into one that takes just a few minutes.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley
  • Thanks for the pointers. Running in a Linux VM with a RAM disk has significantly sped things up. I wasn't able to use BFG Repo-Cleaner as the operation I'm doing is moving a couple of directories so that they're in the root directory rather than a subdirectory. As far as I can tell this makes `--index-filter` tricky too as `git mv` doesn't work on index alone. – alnorth29 Apr 24 '13 at 09:40
  • Very glad that sped things up. Incidentally, moving/deleting directories is a feature I'm looking at adding to the BFG - I'll add a comment when that's complete. Thanks for providing the usage example! – Roberto Tyley Apr 24 '13 at 10:04
  • @alnorth29 apologies, a further question - what was the _justification_ for the subdirectory move? Why was it necessary? – Roberto Tyley May 01 '13 at 09:08
  • It was to get round a limitation with one of Microsoft's command line build tools. Visual Studio can compile ASP.NET web sites that are nested one inside the other, but the command line build tool cannot. We're setting up a continuous integration server and needed a working command line build solution. Thanks for the help, the rewrite's done and everything seems to be well. – alnorth29 May 01 '13 at 13:35
  • Thanks for replying @alnorth29 - I can see that would mean you'd need to rearrange directories. Just fixing the directories in your latest commit must have been an option... so I would guess going to the extra effort of changing your history as well was to enable you to ensure **old** builds passed on your CI server? – Roberto Tyley May 01 '13 at 15:07
  • 1
    Partly that, but we've also got 8 branches on the go that we need to be able to merge between. Doing the rearranging separately on each branch and then trying to merge from one to another would have been a nightmare. – alnorth29 May 02 '13 at 09:54
  • BFG is great! I used it to delete a folder and it performed at the "speed of light" when compared to git filter-branch. However, I still can't figure out how to use BFG in a use case like the one in the original question. My use case is exactly the one handled by this script: https://gist.github.com/emiller/6769886 The thing is the mentioned script took me 48 hours in one repo that is not my biggest one. I have an even bigger repository that I need to move the contents of dir2 to the root of the repository, retaining the history. Any ideas on how to achieve that using BFG? – jfoliveira Mar 11 '15 at 09:58
  • 1
    Another thanks for the BFG, here. Running git-filter-branch (with a naive --tree-filter) took nearly 3 weeks to get 97% of the way through our excessively large repo before the linux VM it was running on suffered file system corruption and it failed (!). Doing the same operation with the BFG took under 20 minutes, and also worked across all branches! – Dave Knight Nov 14 '17 at 09:05
7

Roberto mentioned this in his answer, but I want to give a benchmark for it: If your git filter-branch operation is taking too long to complete, consider an AWS high memory instance.

I once had to filter-branch and merge together 35 different repositories, each with two years of dozens-of-commits-per-day history. My script failed to complete in 25 hours on my laptop. It completed in 45 minutes on an m2.4xlarge instance in Amazon.

Total cost?

$1.64 -- less than I spend on a 20oz soda.

BFG sounds like a great tool and I'd encourage anyone who routinely rewrites history to try it out. But if you just need something to work and have easy access to AWS, filter-branch is trivially easy.

In 2016 this is even cheaper. Just mosey on over to the Spot Advisor and find yourself something of the "cluster compute for $0.30 / hour" variety.

Christopher