
We have a large C++ repository, about 80 GB in size with nearly 200,000 files, containing multiple components.

The libraries (archives) are shared by many components, which are tightly coupled.

Because of this, all git operations, as well as compiling or building a particular component, take too long.

Please suggest how to divide this single repo into multiple repos.

user2463892

2 Answers


First, 200,000 source files are likely to take far less than 80 GB of space (unless each file holds about 400 KB of source!)

Update 2015: git-lfs (Git Large File Storage) can actually manage that kind of volume.
See "Efficient storage of binary files in a git repository".


Original answer (2013)

That means:

  • any generated binary needs to be excluded from the git repo
  • any large binary needs to be stored elsewhere (either in a Nexus-like artifact repository, or in any other storage space, for example with git-annex)
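
As a rough sketch of the first point, the script below stops tracking generated files without deleting them from disk. The `build/` directory, the file extensions, and the demo identity are assumptions made for illustration; adapt the patterns to whatever your build actually produces.

```shell
#!/bin/sh
# Sketch: exclude generated binaries from a git repo (names are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Simulate a repo where a build output was committed by mistake.
mkdir -p src build
echo 'int main() { return 0; }' > src/main.cpp
echo 'binary' > build/app.exe
git add .
git commit -q -m "sources plus an accidentally committed binary"

# Ignore generated files from now on.
cat > .gitignore <<'EOF'
build/
*.o
*.a
*.lib
*.exe
EOF

# Remove the binary from the index only; the file stays in the working tree.
git rm -r -q --cached build
git add .gitignore
git commit -q -m "stop tracking generated binaries"

# List what git still tracks.
tracked=$(git ls-files)
echo "$tracked"
```

Note that this only affects future commits: binaries that were already committed remain in the history, so the repository does not shrink until that history is rewritten or the sources are imported into a fresh repo.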

Second, git operations are only this slow when we are talking about one huge repo.
git is designed to manage multiple small repos (even the Linux kernel git repo is nowhere near the size and number of files you mention)

So you need:

  • to split the huge git repo around:

    • functional components (a component being a coherent group of files representing a major feature of your program: the GUI, a dispatcher, a launcher, anything that implements one of your program's main functional blocks)
    • technical components (all those common technical libraries, reused by multiple other components, providing features not visible to the end users, only used by the developers)
  • to speed up the compilation process, especially when doing unit or small integration tests, by using binary dependencies: instead of getting all the sources and recompiling everything, you could set up each project to use the binaries/exes produced by the other projects, so that a specific project can compile and run on its own.
    How far you can go depends on how tightly coupled your libraries are with the other components.


The OP user2463892 adds in the comments:

I heard something about git submodules, which are supposed to help in dividing or splitting a large code base.
I am not familiar with them. Can anyone help me with a few questions about this?

1) How do git submodules work? Will they divide the huge code base into multiple repos? Can this solve the problem of git slowness?

A submodule is a git repo declared within another repo (which becomes a "parent" repo).

The parent repo keeps a fixed, known reference to a submodule repo (a specific commit of that submodule) as a special entry, which means:
when you clone a parent repo, you don't clone by default all the submodules declared in it.

And that could be interesting in your case, as you don't need to clone all the sources in order to make the kind of incremental compilation you mention.
Plus, multiple repos means smaller repos, with commands like checkout, log, diff and status going faster.
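
A minimal sketch of that behavior, using throwaway local repos: the names `libcommon`, `parent`, and `libs/common` are hypothetical, and the `-c protocol.file.allow=always` setting is only there because recent git versions refuse local-path submodules by default. With a real remote URL it should not be needed.

```shell
#!/bin/sh
# Sketch: a parent repo with one submodule, cloned without the submodule content.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A small library repo that will become the submodule.
git init -q libcommon
(
  cd libcommon
  echo 'void common();' > common.h
  git add .
  git commit -q -m "libcommon: initial"
)

# The parent repo records libcommon as a submodule pinned to one commit.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$work/libcommon" libs/common
git commit -q -m "add libcommon as a submodule"
cd ..

# A plain clone does not fetch submodule contents: the directory is empty.
git clone -q parent clone
before=$(ls -A clone/libs/common | wc -l)

# Fetch only the submodule you need, when you need it.
(
  cd clone
  git -c protocol.file.allow=always submodule --quiet update --init libs/common
)
ls clone/libs/common
```

In a repo with many submodules, each developer can initialize just the paths relevant to the component being built, which keeps the working tree and everyday commands small.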

2) Assume we divided the main repo into multiple repos by using these submodules... will this solve the problem which we faced (dependency between repos)?

Example: Assume we divide the main core repo into Super, RepoA, RepoB, RepoC etc...
Then will it be possible to compile all these repos together?
Can RepoA access the library from other repos (Super, RepoB, RepoC etc) and vice versa?

The mutual dependencies will still be there, but you would be able to:

  • check out only the repos you need for a given step
  • store the compiled libraries outside of those repos, for repoB or repoC to use.

The goal is to switch from a source-only dependency to a (generated) binary dependency, where repoB can be compiled against the binaries produced by repoA's compilation step.

VonC
  • Thanks for your quick reply... Yes, we have 2,00,000 source files (including .cpp, .h, .lib_def, .exe_def, .tst_def, .sdl, .oml etc...). As you mentioned, we tried to split the components using the steps below – user2463892 Jun 09 '13 at 13:13
  • 1) Precompiled all libraries in the current repo 2) Moved the code related to functional components and used the precompiled libraries when compiling/building these functional components 3) Produced pre-built binaries from these components to use in the common code. – user2463892 Jun 09 '13 at 13:14
  • 4) But with this mechanism we have encountered some problems... Our code base is tightly coupled and has bi-directional dependencies (the main repo libraries are used in the split repos, and the libraries from these split repos are used in the main repo). Because of this, during our regression testing phase, whenever there is a change in the common code - – user2463892 Jun 09 '13 at 13:15
  • first we need to build the pre-compiled libraries in the common code, then use these changed libraries to compile the split component libraries, and then use those libraries to re-compile all the common code components to get a build for testing. 5) This requires a lot of co-ordination and a fixed sequence of steps, like first compiling the common gimp code and then compiling the split components after determining the dependencies and order. 6) Due to these problems we decided to move these split components back to the main repo. – user2463892 Jun 09 '13 at 13:15
  • With this I have one question: I heard something about git submodules, which are supposed to help in dividing or splitting a large code base. I am not familiar with them. Can anyone help me with a few questions about this? 1) How do git submodules work? Will they divide the huge code base into multiple repos? Can this solve the problem of git slowness? 2) Assume we divided the main repo into multiple repos by using these submodules... will this solve the problem which we faced (dependency between repos)? – user2463892 Jun 09 '13 at 13:16
  • Example: Assume we divide the main core repo into Super, RepoA, RepoB, RepoC etc... Then will it be possible to compile all these repos together? Can RepoA access the library from other repos (Super, RepoB, RepoC etc) and vice versa? – user2463892 Jun 09 '13 at 13:16
  • @user2463892 I have edited the answer to address one of your points. And I would not advise subtree here. – VonC Jun 16 '13 at 07:28

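You can extract a folder into its own repository (for example, to host it on GitHub) using the following command, which rewrites the history so that the folder becomes the repository root.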

git filter-branch --prune-empty --subdirectory-filter foldername master

This assumes you have already identified which components to extract, and that you will sort out the build processes once the repositories are created.
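
To see what the command does before running it on the real repo, here is a small end-to-end sketch on a throwaway repo. The `monorepo`, `gui`, and `core` names are hypothetical, and `FILTER_BRANCH_SQUELCH_WARNING` is set only to skip the warning pause that newer git versions print for filter-branch.

```shell
#!/bin/sh
# Sketch: extract the gui/ folder of a small demo repo into its own history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Build a tiny "monorepo" with two components and three commits.
git init -q monorepo
cd monorepo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
mkdir -p gui core
echo 'gui v1' > gui/window.cpp
echo 'core v1' > core/engine.cpp
git add .
git commit -q -m "initial import"
echo 'gui v2' > gui/window.cpp
git commit -q -am "gui: update window"
echo 'core v2' > core/engine.cpp
git commit -q -am "core: update engine"
cd ..

# Work on a throwaway clone so the original history stays untouched.
git clone -q monorepo gui-only
cd gui-only
git filter-branch --prune-empty --subdirectory-filter gui master >/dev/null 2>&1

# The folder content is now at the repo root; core-only commits are gone.
files=$(git ls-files)
commits=$(git rev-list --count master)
echo "files: $files"
echo "commits: $commits"
```

Running the filter on a clone rather than on the original repo keeps a safe copy of the full history in case the chosen folder boundaries turn out to be wrong.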

bloudraak