How to split a large Git repository into multiple small independent repos without cloning the large repo

Question

I have a huge Git repo which has multiple active feature branches and contains many folders. Below is the structure of my repo.

BigRepo/
    .git
    F1
    F2
    F3

I want to split each of the above folders F1/F2/F3 into a separate, individual Git repo and retain all the live branches and history related to it. Below is what I expect:

F1/
  .git
F2/
  .git
F3/
  .git

I have already looked through the link below:

Detach (move) subdirectory into separate Git repository

I do not want to clone the big repo. Also, when the new repo is created, I want to see all the live branches and commit history.

Is there a way this can be achieved by moving/copying the folders and creating a repo on the fly?

Appreciate the help.

You really need to clone the repo. It's the only way not to lose the commit history, ref heads and so on - ErniBrown, Feb 15 '18 at 12:01
@ErniBrown There are 2 reasons I do not want to clone: 1. It creates a repo for only one branch. 2. We have many huge repos, and I want to get rid of the cloning overhead - B.T Anand, Feb 15 '18 at 12:07
@ErniBrown I was wondering: what if I moved the .git folder into each of the subfolders to retain history? It's not recommended, but I wanted to know if it would work - B.T Anand, Feb 15 '18 at 12:11
What if you copy-paste your project 3 times and split your repo using `git filter-branch` (https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/#platform-linux) for each copied folder? - helenej, Feb 15 '18 at 12:56
Look at your .git directory size. It is probably at least half of the full repository, so I don't see a great advantage in not cloning everything... Another reason not to do so is that this way you will have 3 repos, all with a huge history, exactly the same until the split, and then 3 commits where you just removed the unused files... In the end you will have three big repos. Doing as suggested by helenej, you will end up with 3 repos whose sizes add up to about the size of the old repo, which is certainly a better solution. - ErniBrown, Feb 15 '18 at 13:04
By the way, if your repo is too big, maybe you have to rethink its design: check whether you included binaries, split it, use submodules, use the LFS extension and so on. - ErniBrown, Feb 15 '18 at 13:04
@helenej Will it ensure that all the active branches still exist after the filter-branch? - B.T Anand, Feb 15 '18 at 13:15
@B.T Anand the drawback is that you'll have to run the `filter-branch` command for all branches, but you can automate it with a script - helenej, Feb 15 '18 at 13:25

score 4 · Accepted Answer · answered Feb 15 '18 at 13:50

First things first:

Is there a way this can be achieved by moving/copying the folders and creating a repo on the fly?

No. Git is not CVS. The structure of Git fundamentally doesn't support what you're asking for. Commits - the objects that trace out the history - are made up of snapshots of the entire project. Nowhere is there an object or collection of objects that represents "exactly the history of the F1 subdirectory". There is enough information to produce such objects, but to do so you need a clone (or direct access to the origin).
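To see why a folder cannot simply be lifted out with its history, it can help to look at what a commit actually stores. The following is a minimal sketch on a throwaway demo repo (the folder names F1/F2, file names, and the demo identity are made up for illustration): each commit names exactly one root tree, and that tree covers every folder, including ones the commit did not touch.

```shell
#!/bin/sh
set -e

# Throwaway demo repository (hypothetical layout, not the asker's real repo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name Demo

mkdir F1 F2
echo one > F1/a.txt
echo two > F2/b.txt
git add .
git commit -q -m "add F1 and F2"

# Second commit changes only F1.
echo more >> F1/a.txt
git commit -q -am "change only F1"

# The commit object names a single tree: the root of the whole project.
git cat-file -p HEAD

# That root tree still lists F2, even though this commit only changed F1.
git ls-tree --name-only HEAD
```

There is no per-folder history object to copy; a tool has to walk these whole-project snapshots and derive new commits from them.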

So then to the comments, when you state reasons not to clone:

There are 2 reasons I do not want to clone: 1. It creates a repo for only one branch. 2. We have many huge repos, and I want to get rid of the cloning overhead

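Reason 1 is incorrect. When you clone, you copy all branches by default: they arrive as remote-tracking refs (origin/...), and only the default branch is checked out as a local branch. Even if you've set up a configuration that doesn't default to copying everything, you can still copy everything.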
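A quick way to check this is to clone a small demo repo and list what arrived. This sketch assumes a throwaway repo with made-up branch names (main, feature-a, feature-b); it compares a plain clone with a `--mirror` clone.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Build a tiny source repo with three branches (names are illustrative).
git init -q "$work/big"
cd "$work/big"
git symbolic-ref HEAD refs/heads/main   # fix the initial branch name for the demo
git config user.email demo@example.com
git config user.name Demo
echo a > file.txt
git add file.txt
git commit -q -m "initial"
git branch feature-a
git branch feature-b

# A plain clone brings every branch over as a remote-tracking ref.
git clone -q "$work/big" "$work/plain"
git -C "$work/plain" branch -r

# A mirror clone copies every ref under its original name.
git clone -q --mirror "$work/big" "$work/mirror.git"
git -C "$work/mirror.git" for-each-ref --format='%(refname:short)' refs/heads
```

In the plain clone the other branches show up under `origin/`; in the mirror clone they are ordinary branches, which is usually the more convenient starting point for a history rewrite.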

Reason 2 is a nice thought, but you can't do the level of manipulation required to split the repo (including history) with anything less than a full copy of the repo. So if you can log onto the server and the repo is accessible on the filesystem, you can do the work there; but otherwise, you have to clone it. Do it once, split the repo up, and you'll never have to do it again.

Lastly

Also, when the new repo is created, I want to see all the live branches and commit history

After properly cloning the repo, you can use git filter-branch --subdirectory-filter F1 -- --all to rewrite the history of every branch and produce your first "new" repo.

Then you clone that.

Then you go back and restore all the branches to their previous state using the backup refs that filter-branch stores under refs/original/.

Then you repeat for each other directory you want to break out.
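The steps above can be sketched as a loop. This is only an illustration under stated assumptions, not a definitive recipe: it builds a throwaway BigRepo with folders F1/F2/F3 and a made-up feature-a branch, works in a bare `--mirror` clone so that every branch is a real branch, clones each result over `file://` so that only the rewritten history is transferred, and restores the original refs from refs/original/ before the next pass. `FILTER_BRANCH_SQUELCH_WARNING` only silences the warning that newer Git versions print for filter-branch.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and delay

work=$(mktemp -d)
big="$work/BigRepo"

# --- Demo setup: a stand-in for the big repo (hypothetical content) ---
git init -q "$big"
cd "$big"
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo
for d in F1 F2 F3; do mkdir "$d"; echo "$d" > "$d/readme.txt"; done
git add .
git commit -q -m "initial layout"
git checkout -q -b feature-a
echo work > F1/new.txt
git add .
git commit -q -m "feature work in F1"
git checkout -q main

# --- One full copy to work in; every branch becomes refs/heads/* here ---
scratch="$work/scratch.git"
git clone -q --mirror "$big" "$scratch"
cd "$scratch"

for d in F1 F2 F3; do
    # Rewrite all refs so that $d becomes the project root.
    git filter-branch --subdirectory-filter "$d" -- --all >/dev/null 2>&1

    # Clone the rewritten branches into the new repo for this folder.
    git clone -q --bare "file://$scratch" "$work/$d.git"

    # Put every ref back where it was, using filter-branch's backups.
    backups=$(git for-each-ref --format='%(refname) %(objectname)' refs/original)
    printf '%s\n' "$backups" | while read -r backup sha; do
        [ -n "$backup" ] || continue
        git update-ref "${backup#refs/original/}" "$sha"
        git update-ref -d "$backup"
    done
done
```

After the loop, `$work/F1.git`, `$work/F2.git` and `$work/F3.git` each hold all branches with the folder's files hoisted to the top level, and the scratch copy is back in its original state.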

jthill · Answer 2 · 2018-02-15T16:41:38.037

This is fairly easy to do efficiently. What you're doing is adding new commit references into parts of existing history.

The baseline for doing major refname surgery like this safely is a shared-objects clone, which uses the base repo's object db by reference. Heavy-duty filter-branch work runs best on a tmpfs:

# Run from inside the big repo; ${scratch=...} also assigns the temp dir to $scratch
git clone -s --mirror . "${scratch=$(mktemp -d)}"
cd "$scratch"

The clone is essentially free:

$ time git clone -s --mirror . ${scratch=`mktemp -d`}
Cloning into bare repository '/tmp/jthill/tmp.MmqbLpe038'...
done.

real    0m0.073s
user    0m0.049s
sys     0m0.029s
$ du -sh $scratch
240K    /tmp/jthill/tmp.MmqbLpe038
$ git -C $scratch remote -v
origin  /home/jthill/src/linux/. (fetch)
origin  /home/jthill/src/linux/. (push)
$ git for-each-ref |wc
   2296    6888  148695
$ git -C $scratch for-each-ref |wc
   2296    6888  148695

so you can now do any refname surgery you want with complete freedom and push the results anywhere you want.

$ cd $scratch
$ git filter-branch --subdirectory-filter Documentation --tag-name-filter cat -- --all -- Documentation

extracted and hoisted the 34337 (of 835327) Documentation commits in about 15 minutes on my little box.

edit: rewriting the tags is not optimized. It re-searches the complete history for each one, so with thousands of tags and decades of history it takes a few seconds per tag :-(
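Put together on a small scale, the shared-objects approach might look like the sketch below. The demo repo, the folder names F1/F2, the tag v1, and the target path F1.git are all made up for illustration. The push uses explicit refspecs for branches and tags so that filter-branch's backup refs under refs/original/ stay behind in the scratch clone.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and delay

work=$(mktemp -d)
big="$work/BigRepo"

# Demo stand-in for the existing local repo (hypothetical content).
git init -q "$big"
cd "$big"
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name Demo
mkdir F1 F2
echo one > F1/readme.txt
echo two > F2/readme.txt
git add .
git commit -q -m "initial layout"
git tag v1

# Shared-objects mirror clone: refs are copied, objects are borrowed.
scratch="$work/scratch.git"
git clone -q -s --mirror "$big" "$scratch"
cat "$scratch/objects/info/alternates"   # path to the base repo's object db

# Rewrite every branch and tag so that F1 becomes the root.
cd "$scratch"
git filter-branch --subdirectory-filter F1 --tag-name-filter cat -- --all >/dev/null 2>&1

# Push only branches and tags to the new, independent repo.
git init -q --bare "$work/F1.git"
git push -q "$work/F1.git" 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
```

The base repo is never modified: the rewritten objects are created in the scratch clone, and the new repo receives its own copy of everything it needs during the push.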
