14

When I run `git submodule update --init` for the first time on a project that has a lot of submodules, it usually takes a long time, because most of the submodules are hosted on slow public servers.

Is there a way to initialize submodules asynchronously?

Leksat
  • What do you mean by asynchronously? Would this do: `git submodule update --init &`? – rodrigo Jun 26 '13 at 09:33
  • 1
    I mean a way where each submodule is initialized in a separate process. – Leksat Jun 26 '13 at 09:59
  • 1
    That is, parallel rather than sequential? – Bleeding Fingers Jun 26 '13 at 10:54
  • Yes. Parallel cloning of submodules. – Leksat Jun 26 '13 at 12:46
  • I've written a small nodejs program to do just that: https://gist.github.com/djfm/10857700 – djfm Apr 16 '14 at 12:14
  • With Git 2.8 (Q1 2016), you will be able to fetch submodules in parallel (!) with `git fetch --recurse-submodules -j2`. See "[How to speed up / parallelize downloads of git submodules using git clone --recursive?](http://stackoverflow.com/a/34762036/6309)" – VonC Jan 13 '16 at 08:53
  • Possible duplicate of [How to speed up / parallelize downloads of git submodules using git clone --recursive?](https://stackoverflow.com/questions/26023395/how-to-speed-up-parallelize-downloads-of-git-submodules-using-git-clone-recu) – Ciro Santilli OurBigBook.com Nov 15 '17 at 20:28

4 Answers

5

Linux:

cat .gitmodules | grep -Po '".*"' | sed 's/.\(.\+\).$/\1/' | while sleep 0.1 && read line; do git submodule update --init "$line" & done

Mac:

cat .gitmodules | grep -o '".*"' | cut -d '"' -f 2 | while sleep 0.1 && read line; do git submodule update --init "$line" & done
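
A related sketch (my addition, untested; similar in spirit to the xargs variant dmi_ gives in the comments below): let git itself parse `.gitmodules` and bound the parallelism with `xargs -P`, which also blocks until all clones finish:

# List the configured submodule paths via git's own config parser
# (assumes submodule paths contain no spaces),
# then update up to 4 of them in parallel; xargs only returns
# once all child processes have finished.
git config --file .gitmodules --get-regexp '\.path$' \
  | awk '{ print $2 }' \
  | xargs -n 1 -P 4 git submodule update --init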
Karmazzin
  • This works for me! But for some reason it skips some submodules. I believe it's because of the many background processes. So, in the end, I have to run `git submodule update --init` one more time. – Leksat Jun 26 '13 at 14:13
  • Use the **wait** operator, for example: http://stackoverflow.com/questions/9258387/bash-ampersand-operator – Karmazzin Jun 26 '13 at 18:04
  • Previously there were errors `error: could not lock config file .git/config: File exists` because parallel processes were writing to the config. Now I've added `sleep 0.1` to the `while` loop and it works perfectly. – Leksat Feb 26 '15 at 12:57
  • This works great! One small issue I have is that the shell exits before the clone is complete. To remedy this, I modified this to use xargs: `cat .gitmodules | grep -Po '".*"' | sed 's/.\(.\+\).$/\1/' | while sleep 0.1 && read line; do echo $line; done | xargs -L1 -P4 git submodule update --init;` This prevents the shell from exiting until all the clones are complete, and limits the number of parallel processes to 4. – dmi_ Aug 17 '16 at 16:07
5

As of Git 2.8 you can do this:

git submodule update --init --jobs 4

where 4 is the number of submodules to download in parallel.
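
To make this the default (a small sketch; the `submodule.fetchJobs` setting was likewise introduced in Git 2.8):

# Persist the parallelism setting in the repository's config
git config submodule.fetchJobs 4

# Subsequent plain invocations pick up the setting automatically
git submodule update --init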

Ben Moss
2

Update January 2016:

With Git 2.8 (Q1 2016), you will be able to fetch submodules in parallel (!) with git fetch --recurse-submodules -j2.
See "How to speed up / parallelize downloads of git submodules using git clone --recursive?"


Original answer (mid-2013)

You could try:

  • to first initialize all submodules:

    git submodule init

Then, use the foreach syntax:

git submodule foreach git submodule update --recursive -- $path &

If the '&' applies to the whole line (instead of just the `git submodule update --recursive -- $path` part), then you could call a script which would perform the update in the background:

git submodule foreach git_submodule_update
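
Alternatively (my own sketch, not part of the original answer), quoting the whole command keeps the '&' inside the shell that foreach spawns, so each update is backgrounded per submodule:

# $path is set by `git submodule foreach` for each submodule
# (newer Git versions expose it as $sm_path instead)
git submodule foreach 'git submodule update --recursive -- "$path" &'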
VonC
1

This can also be done in Python. In Python 3 (because we're in 2015...), we can use something like this:

#!/usr/bin/env python3

import os
import re
import subprocess
import sys
from functools import partial
from multiprocessing import Pool

def list_submodules(path):
    # Extract the submodule paths declared in .gitmodules
    # (raw string avoids invalid-escape warnings in the regex)
    with open(os.path.join(path, ".gitmodules"), 'r') as gitmodules:
        return re.findall(r"path = ([\w\-_/]+)", gitmodules.read())


def update_submodule(name, path):
    # Initialize/update a single submodule of the repository at `path`
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    return subprocess.call(cmd, shell=False)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(2)
    root_path = sys.argv[1]

    # One worker per CPU by default; each worker updates one submodule at a time
    p = Pool()
    p.map(partial(update_submodule, path=root_path), list_submodules(root_path))

This may be safer than the one-liner given by @Karmazzin (which just keeps spawning processes without any control over how many are spawned), but it follows the same logic: read `.gitmodules`, then spawn multiple processes running the proper git command, here using a process pool (whose maximum size can be set too). The path to the cloned repository needs to be passed as an argument. This was tested extensively on a repository with around 700 submodules.
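
Saved as update_submodules.py (a hypothetical file name), the script would be invoked as:

python3 update_submodules.py /path/to/repo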

Note that in the case of a submodule initialization, each process will try to write to .git/config, and locking issues may happen:

error: could not lock config file .git/config: File exists

Failed to register url for submodule path '...'

This can be caught with subprocess.check_output and a try/except subprocess.CalledProcessError block, which is cleaner than the sleep added to @Karmazzin's method. An updated method could look like:

def update_submodule(name, path):
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    while True:
        try:
            # Capture stderr so it can be inspected on failure
            subprocess.check_output(cmd, stderr=subprocess.PIPE, shell=False)
            return
        except subprocess.CalledProcessError as e:
            # Another worker held the lock on .git/config: just retry
            if b"could not lock config file .git/config: File exists" in e.stderr:
                continue
            else:
                raise e

With this, I managed to run the init/update of 700 submodules during a Travis build without the need to limit the size of the process pool. I often see a few locks caught that way (~3 max).

BenC