
Let's say I've written a zillion Puppet modules and I have their entire history within a single git repo, but now I really wish I had made one repo per module. Divvying up the file structure is easy because each module is wholly contained within its own directory, organized like GIT_ROOT/modules/NAME.

Is there a way that I can divide this repo up and not lose the history for each module? Ideally each repo would only have history relevant to the module it represents. I tried cloning the entire thing and running git rm -rf on everything that's irrelevant, but that retains the irrelevant history.

I plan to glue them back together with git submodules, FWIW.

JFlo
  • look at this answer https://stackoverflow.com/questions/38618885/error-rpc-failed-curl-transfer-closed-with-outstanding-read-data-remaining/62079208#62079208 – NikhilP Jun 03 '20 at 10:46

1 Answer


I've been playing with this a little bit, and it looks like the easiest way to split this up is using git subtree split:

split

Extract a new, synthetic project history from the history of the <prefix> subtree. The new history includes only the commits (including merges) that affected <prefix>, and each of those commits now has the contents of <prefix> at the root of the project instead of in a subdirectory. Thus, the newly created history is suitable for export as a separate git repository.

So, for example, if I start with the openstack-puppet-modules repository, which includes a bunch of individual puppet modules, I could split them up first into individual branches like this (I am only using eight modules here to keep things short):

for x in apache aviator ceilometer certmonger cinder common \
    concat firewall; do
  git subtree split -P $x -b split-$x
done

Once this is finished running, I have:

$ git branch | grep split-
split-apache
split-aviator
split-ceilometer
split-certmonger
split-cinder
split-common
split-concat
split-firewall
split-pacemaker
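
For the layout in the question, where each module lives under modules/NAME, the same loop can be driven by the directory listing instead of a hand-written list. The following is only a sketch on a throwaway repository: the repository name, module names, file contents, and commit messages are all made up for illustration, and it assumes git subtree is installed.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; every name and path here is illustrative.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Two fake modules with interleaved commits.
mkdir -p modules/apache modules/cinder
echo "a1" > modules/apache/init.pp
git add modules/apache
git commit -q -m "apache: initial"
echo "c1" > modules/cinder/init.pp
git add modules/cinder
git commit -q -m "cinder: initial"
echo "a2" >> modules/apache/init.pp
git commit -q -am "apache: update"

# One split branch per directory found under modules/.
for dir in modules/*/; do
  name=$(basename "$dir")
  git subtree split -P "modules/$name" -b "split-$name" >/dev/null 2>&1
done

# Inspect the result: commit counts per branch and the files at the branch root.
apache_commits=$(git rev-list --count split-apache)
cinder_commits=$(git rev-list --count split-cinder)
apache_files=$(git ls-tree -r --name-only split-apache)
echo "split-apache: $apache_commits commits, files: $apache_files"
echo "split-cinder: $cinder_commits commits"
```

In this demo, split-apache ends up with only the two apache commits and split-cinder with the single cinder commit, and init.pp sits at the root of the branch rather than under modules/apache.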

Each of those branches contains only the history for the specific directory. If I want to transform these into separate repositories, I could do this:

for x in apache aviator ceilometer certmonger cinder common \
    concat firewall; do
  git init --bare ../repos/repo-$x
  git push ../repos/repo-$x split-$x:master
done

Now I have a collection of repositories:

$ ls ../repos/
repo-apache  repo-aviator  repo-ceilometer  repo-certmonger
repo-cinder  repo-common  repo-concat  repo-firewall  work-cinder
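
One way to confirm that a new repository really received the split history is to compare its master with the split branch it was pushed from. This is a minimal sketch on a throwaway repository, not part of the original session: the mono repository, the apache module content, and the commit messages are invented for the example.

```shell
#!/bin/sh
set -eu

# Throwaway setup; repository, module, and path names are illustrative.
work=$(mktemp -d)
cd "$work"
git init -q mono
cd mono
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir apache
echo "class apache {}" > apache/init.pp
git add apache
git commit -q -m "apache: initial"
echo "readme" > README
git add README
git commit -q -m "top-level readme"

# Split the module history into its own branch.
git subtree split -P apache -b split-apache >/dev/null 2>&1

# Push the split branch into a fresh bare repository, as in the loop above.
mkdir -p ../repos
git init -q --bare ../repos/repo-apache
git push -q ../repos/repo-apache split-apache:master

# The bare repository's master should be the same commit as the split branch,
# and its log should contain only the commit that touched apache/.
split_tip=$(git rev-parse split-apache)
repo_tip=$(git --git-dir=../repos/repo-apache rev-parse master)
repo_log=$(git --git-dir=../repos/repo-apache log --format=%s master)
echo "split branch: $split_tip"
echo "repo master:  $repo_tip"
echo "repo log:     $repo_log"
```

The unrelated "top-level readme" commit does not appear in the new repository's log, because it never touched the apache directory.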

And I think we've achieved your goal.

larsks
  • Thank You! That looks perfect for the job. Bonus points for the excellent example. – JFlo Feb 06 '15 at 16:43
  • This has worked very well for the big breakup. Now I want to nuke each of those Puppet modules from the original Git repository so that I can bring them back in as Git submodules. I read up on http://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch and it would seem the following is perfect: `git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch modules/$NAME" HEAD`. Indeed that removes it from the worktree, but the per-module history remains? Is there a good way to nuke that too and retain the history for the leftovers? – JFlo Feb 08 '15 at 17:19