How to reduce the memory footprint for multiple submodules of the same source?

Question

I have a big project containing multiple Git submodules. A main project uses a collection of different libraries, and some libraries use other libraries themselves. All libraries are usable standalone, so they all need submodules containing the test infrastructure (VUnit and UVVM).

The tree of Git submodules looks like this:

ProjectA
 o- libA
     o- UVVM
     o- VUnit
 o- libB
     o- UVVM
     o- VUnit
 o- libC
     o- libA
         o- UVVM
         o- VUnit
     o- UVVM
     o- VUnit
 o- libD
     o- UVVM
     o- VUnit
 o- UVVM
 o- VUnit

I have some knowledge of Git's internal database and linking structure. The BLOBs of the submodules are stored inside the main repository's .git directory, in a directory called modules. Each database usually has the same symbolic name as the submodule's directory.

A submodule points to its database's main directory with a .git file containing the relative path. In return, the config file of the submodule's database points to the submodule's working tree.
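Both pointers can be inspected directly. The following is a minimal sketch, assuming a POSIX shell and the paths from the tree above; the helper names `submodule_gitdir` and `database_worktree` are made up for illustration and are not part of Git.

```shell
# Hypothetical helpers (names are illustrative, not part of Git).

# Print the database path recorded in a submodule's .git file.
# The file holds one line of the form "gitdir: <path>".
submodule_gitdir() {
    sed -n 's/^gitdir: //p' "$1/.git"
}

# Print the working tree recorded in a database's config file.
# Prints nothing if core.worktree is not set there.
database_worktree() {
    git config -f "$1/config" core.worktree
}

# Example calls, run from the top of ProjectA:
#   submodule_gitdir libA/UVVM
#   database_worktree .git/modules/libA/modules/UVVM
```

Each database's config holds a single core.worktree value, which is why one database cannot simply list several working trees this way.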

So it would be possible that all UVVM submodules point to the same database, but how is it possible that a database points to multiple working trees?

I found the Git extension to work with multiple working dirs, but does it also work with submodules like in my case?

I'm also open to other suggestions.

Edit 1:

This is the generated internal structure in the .git directory. It creates one full object store per submodule for UVVM and VUnit, over and over again.

.git/
  modules/
    libA/
      modules/
        UVVM/
        VUnit/
    libB/
      modules/
        UVVM/
        VUnit/
    libC/
      modules/
        libA/
          modules/
            UVVM/
            VUnit/
        UVVM/
        VUnit/
    libD/
      modules/
        UVVM/
        VUnit/
    UVVM/
    VUnit/

The memory footprint on the server is very low, because all submodules point to the same repository. But the memory footprint on the client side is very high.
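To see how much space the repeated databases take on a client, something like the following could be used. It is only a sketch, assuming a POSIX shell with `find` and `du`; `list_module_copies` is a made-up helper name, and it treats any directory of the given name that contains a HEAD file as a submodule database.

```shell
# Hypothetical helper: list every database called NAME below
# TOP/.git/modules together with its size in KiB.
#   $1 = path to the superproject (e.g. ProjectA)
#   $2 = submodule name (e.g. UVVM)
list_module_copies() {
    # -prune stops find from descending into a matching directory;
    # the HEAD check skips directories that are not Git databases.
    find "$1/.git/modules" -type d -name "$2" -prune \
        -exec sh -c 'test -f "$1/HEAD"' _ {} \; \
        -exec du -sk {} \;
}

# Example calls, run from the directory that contains ProjectA:
#   list_module_copies ProjectA UVVM
#   list_module_copies ProjectA VUnit
```

Each output line is one full copy of the same upstream repository, so the number of lines shows how often the objects are duplicated.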

Would it suffice to use hard links or deduplication at the file system level? – C-Otto, Jun 16 '17 at 09:17
I'm not sure how Git does file operations. E.g. SVN performed delete and create operations instead of in-place content replacements, and such operations destroy hard links. I'm also looking for a solution that works with `git clone` for new users. We are working in a group of 8 developers, some on Windows, some on Linux. (By the way, Windows does support hard links on NTFS :) ) – Paebbels, Jun 16 '17 at 09:19
I think it would help to specify clearly that you're looking for a solution that helps users of the repository. I assumed you were only looking for a server solution (where bare repositories on a proper filesystem would be appropriate). As far as I know, Git uses big blob files which rarely change. However, you'd still have duplication for the checked-out files. Re-creating hard links frequently might be necessary, so maybe this is not the way to go. – C-Otto, Jun 16 '17 at 09:25
I'm looking for a solution on the user side. The server side has no duplications, because the "duplicated" code is in submodules. – Paebbels, Jun 16 '17 at 10:11

score 1 · Answer 1 · answered Jun 21 '17 at 03:18

If you take .git/modules/libA/modules/UVVM as the single source repository:

Delete ProjectA/libB/UVVM (working tree).
Delete ProjectA/.git/modules/libB/modules/UVVM (source repository).
cd ProjectA/libA/UVVM
Create a LibB branch in the ProjectA/libA/UVVM submodule (repository); it is for ProjectA/libB/UVVM.
Run `git worktree add ../../libB/UVVM LibB`.

Now the working trees libA/UVVM and libB/UVVM share the same source repository, .git/modules/libA/modules/UVVM.

Repeat the same steps for the remaining UVVM submodules, and do the equivalent for VUnit.
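The steps for libB could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe for every setup: it assumes a POSIX shell, a Git version that provides `git worktree`, and the paths from the question; `share_uvvm_database` is a made-up function name.

```shell
# Hypothetical wrapper around the steps for libB.
#   $1 = path to the ProjectA checkout
share_uvvm_database() {
    project=$1

    # Remove libB's own UVVM working tree and its private database.
    # Destructive: commit or stash anything you still need first.
    rm -rf "$project/libB/UVVM" \
           "$project/.git/modules/libB/modules/UVVM"

    # Create the LibB branch in libA/UVVM and attach a second
    # working tree at libB/UVVM that checks it out.
    (
        cd "$project/libA/UVVM" &&
        git branch LibB &&
        git worktree add ../../libB/UVVM LibB
    )
}

# Example call, run from the directory that contains ProjectA:
#   share_uvvm_database ProjectA
```

A separate branch per working tree is used because `git worktree add` by default refuses to check out a branch that is already checked out in another working tree.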

Yue Lin Ho

How does this solution work on multiple computers? How does it replicate the created setup to other workstations / to other developers e.g. on a `git clone` operation? – Paebbels Jun 21 '17 at 06:36
Every cloned repository needs this process again. – Yue Lin Ho Jun 21 '17 at 09:40