Another approach to this issue is to parallelize checkout (starting with Git 2.32, Q2 2021).
As explained in this patch series (still in progress at the time):
This series adds parallel workers to the checkout machinery.
The cache entries are distributed among helper processes, which are responsible for reading, filtering and writing the blobs to the working tree.
This should benefit all commands that call unpack_trees() or check_updates(), such as: checkout, clone, sparse-checkout, checkout-index, etc.
Local:

|            | Clone             | Checkout I        | Checkout II       |
|------------|-------------------|-------------------|-------------------|
| Sequential | 8.180 s ± 0.021 s | 6.936 s ± 0.030 s | 2.585 s ± 0.005 s |
| 10 workers | 3.406 s ± 0.187 s | 2.164 s ± 0.033 s | 1.050 s ± 0.021 s |
| Speedup    | 2.40 ± 0.13       | 3.21 ± 0.05       | 2.46 ± 0.05       |
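As a rough picture of that division of labor, here is a toy POSIX sketch of distributing paths among forked helper processes. It is not Git's code: the real implementation spawns `checkout--worker` subprocesses and sends them work items over pipes, and its distribution strategy differs; this only illustrates the general idea.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 4

int main(void)
{
    const char *paths[] = { "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7" };
    int n = sizeof(paths) / sizeof(*paths);

    for (int w = 0; w < WORKERS; w++) {
        if (fork() == 0) {
            /* Helper process w handles every WORKERS-th entry;
             * creating the file stands in for "read, filter, write". */
            for (int i = w; i < n; i += WORKERS) {
                FILE *f = fopen(paths[i], "w");
                if (f)
                    fclose(f);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ; /* the main process waits for all helpers to finish */
    return 0;
}
```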
For example, with Git 2.32 (Q2 2021), there are preparatory API changes for parallel checkout.
See commit ae22751, commit 30419e7, commit 584a0d1, commit 49cfd90, commit d052cc0 (23 Mar 2021) by Matheus Tavares (matheustavares).
See commit f59d15b, commit 3e9e82c, commit 55b4ad0, commit 38e9584 (16 Dec 2020) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit c47679d, 02 Apr 2021)
convert: add [async_]convert_to_working_tree_ca() variants
Signed-off-by: Jeff Hostetler
Signed-off-by: Matheus Tavares
Separate the attribute gathering from the actual conversion by adding _ca() variants of the conversion functions.
These variants receive a precomputed 'struct conv_attrs', and thus do not rely on an index state.
They will be used in a future patch adding parallel checkout support, for two reasons:

- We will already load the conversion attributes in checkout_entry(), before conversion, to decide whether a path is eligible for parallel checkout. Therefore, it would be wasteful to load them again later, for the actual conversion.
- The parallel workers will be responsible for reading, converting and writing blobs to the working tree. They won't have access to the main process' index state, so they cannot load the attributes. Instead, they will receive the preloaded ones and call the _ca() variant of the conversion functions. Furthermore, the attributes machinery is optimized to handle paths in sequential order, so it's better to leave it to the main process anyway.
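To illustrate the shape of this API change, here is a compilable toy sketch. The convert_to_working_tree_ca() name comes from the patch above, but the signature, gather_attrs(), and the attribute fields are all made up for the example; Git's real API differs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical attribute set for one path (fields made up). */
struct conv_attrs {
    int crlf;            /* whether to do CRLF conversion */
    const char *filter;  /* smudge filter name, or NULL */
};

/* Stand-in for the attribute lookup that needs the index state;
 * in Git this happens in the main process, e.g. at checkout_entry(). */
static struct conv_attrs gather_attrs(const char *path)
{
    struct conv_attrs ca = { strstr(path, ".txt") != NULL, NULL };
    return ca;
}

/* The "_ca" shape: conversion relies only on the precomputed
 * attributes, so a worker without the index state can call it. */
static void convert_to_working_tree_ca(const struct conv_attrs *ca,
                                       const char *path)
{
    printf("%s: crlf=%d filter=%s\n", path, ca->crlf,
           ca->filter ? ca->filter : "(none)");
}

int main(void)
{
    const char *path = "docs/readme.txt";
    struct conv_attrs ca = gather_attrs(path); /* gathered once */
    convert_to_working_tree_ca(&ca, path);     /* index-free conversion */
    return 0;
}
```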
And:
With Git 2.32 (Q2 2021), the checkout machinery has been taught to perform the actual write-out of the files in parallel when able.
See commit 68e66f2 (19 Apr 2021), and commit 1c4d6f4, commit 7531e4b, commit e9e8adf, commit 04155bd (18 Apr 2021) by Matheus Tavares (matheustavares).
(Merged by Junio C Hamano -- gitster -- in commit a1cac26, 30 Apr 2021)
Co-authored-by: Jeff Hostetler
Signed-off-by: Matheus Tavares
Make parallel checkout configurable by introducing two new settings: checkout.workers and checkout.thresholdForParallelism.
The first defines the number of workers (where one means sequential checkout), and the second defines the minimum number of entries to attempt parallel checkout.
To decide the default value for checkout.workers, the parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II).
The four tables below show the mean run times and standard deviations for 5 runs in: a local file system on SSD, a local file system on HDD, a Linux NFS server, and Amazon EFS (all on Linux).
Each parallel checkout test was executed with the number of workers that brings the best overall results in that environment.
Local SSD:

|             | Sequential        | 10 workers        | Speedup     |
|-------------|-------------------|-------------------|-------------|
| Clone       | 8.805 s ± 0.043 s | 3.564 s ± 0.041 s | 2.47 ± 0.03 |
| Checkout I  | 9.678 s ± 0.057 s | 4.486 s ± 0.050 s | 2.16 ± 0.03 |
| Checkout II | 5.034 s ± 0.072 s | 3.021 s ± 0.038 s | 1.67 ± 0.03 |
Local HDD:

|             | Sequential         | 10 workers         | Speedup     |
|-------------|--------------------|--------------------|-------------|
| Clone       | 32.288 s ± 0.580 s | 30.724 s ± 0.522 s | 1.05 ± 0.03 |
| Checkout I  | 54.172 s ± 7.119 s | 54.429 s ± 6.738 s | 1.00 ± 0.18 |
| Checkout II | 40.465 s ± 2.402 s | 38.682 s ± 1.365 s | 1.05 ± 0.07 |
Linux NFS server (v4.1, on EBS, single availability zone):

|             | Sequential          | 32 workers         | Speedup     |
|-------------|---------------------|--------------------|-------------|
| Clone       | 240.368 s ± 6.347 s | 57.349 s ± 0.870 s | 4.19 ± 0.13 |
| Checkout I  | 242.862 s ± 2.215 s | 58.700 s ± 0.904 s | 4.14 ± 0.07 |
| Checkout II | 65.751 s ± 1.577 s  | 23.820 s ± 0.407 s | 2.76 ± 0.08 |
EFS (v4.1, replicated over multiple availability zones):

|             | Sequential           | 32 workers          | Speedup     |
|-------------|----------------------|---------------------|-------------|
| Clone       | 922.321 s ± 2.274 s  | 210.453 s ± 3.412 s | 4.38 ± 0.07 |
| Checkout I  | 1011.300 s ± 7.346 s | 297.828 s ± 0.964 s | 3.40 ± 0.03 |
| Checkout II | 294.104 s ± 1.836 s  | 126.017 s ± 1.190 s | 2.33 ± 0.03 |
The above benchmarks show that parallel checkout is most effective on repositories located on an SSD or over a distributed file system.
For local file systems on spinning disks, and/or older machines, parallelism does not always bring good performance.
For this reason, the default value for checkout.workers is one, a.k.a. sequential checkout.
To decide the default value for checkout.thresholdForParallelism, another benchmark was executed in the "Local SSD" setup, where parallel checkout proved to be beneficial.
This time, we compared the runtime of a git checkout -f (man), with and without parallelism, after randomly removing an increasing number of files from the Linux working tree.
The "sequential fallback" column below corresponds to the executions where checkout.workers was 10 but checkout.thresholdForParallelism was equal to the number of to-be-updated files plus one (so that we end up writing sequentially).
Each test case was sampled 15 times, and each sample had a randomly different set of files removed.
Here are the results:
|            | Sequential fallback | 10 workers         | Speedup     |
|------------|---------------------|--------------------|-------------|
| 10 files   | 772.3 ms ± 12.6 ms  | 769.0 ms ± 13.6 ms | 1.00 ± 0.02 |
| 20 files   | 780.5 ms ± 15.8 ms  | 775.2 ms ± 9.2 ms  | 1.01 ± 0.02 |
| 50 files   | 806.2 ms ± 13.8 ms  | 767.4 ms ± 8.5 ms  | 1.05 ± 0.02 |
| 100 files  | 833.7 ms ± 21.4 ms  | 750.5 ms ± 16.8 ms | 1.11 ± 0.04 |
| 200 files  | 897.6 ms ± 30.9 ms  | 730.5 ms ± 14.7 ms | 1.23 ± 0.05 |
| 500 files  | 1035.4 ms ± 48.0 ms | 677.1 ms ± 22.3 ms | 1.53 ± 0.09 |
| 1000 files | 1244.6 ms ± 35.6 ms | 654.0 ms ± 38.3 ms | 1.90 ± 0.12 |
| 2000 files | 1488.8 ms ± 53.4 ms | 658.8 ms ± 23.8 ms | 2.26 ± 0.12 |
From the above numbers, 100 files seems to be a reasonable default value for the threshold setting.
Note: Up to 1000 files, we observe a drop in the execution time of the parallel code with an increase in the number of files.
This is a rather odd behavior, but it was observed in multiple repetitions.
Above 1000 files, the execution time increases according to the number of files, as one would expect.
About the test environments: Local SSD tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux.
Local HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running Debian.
NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge instance, with 4 vCPUs.
The Linux NFS server was running on an m6g.large instance with 2 vCPUs and a 1 TB EBS GP2 volume.
Before each timing, the linux repository was removed (or checked out back to its previous state), and sync && sysctl vm.drop_caches=3 was executed.
git config now includes in its man page:
checkout.workers

The number of parallel workers to use when updating the working tree. The default is one, i.e. sequential execution. If set to a value less than one, Git will use as many workers as the number of logical cores available. This setting and checkout.thresholdForParallelism affect all commands that perform checkout. E.g. checkout, clone, reset, sparse-checkout, etc.

Note: parallel checkout usually delivers better performance for repositories located on SSDs or over NFS. For repositories on spinning disks and/or machines with a small number of cores, the default sequential checkout often performs better. The size and compression level of a repository might also influence how well the parallel version performs.
checkout.thresholdForParallelism

When running parallel checkout with a small number of files, the cost of subprocess spawning and inter-process communication might outweigh the parallelization gains. This setting allows you to define the minimum number of files for which parallel checkout should be attempted. The default is 100.
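For illustration, here is how the two settings might look in a repository's (or the global) config file; the values below are examples, not recommendations:

```ini
[checkout]
	# Any value < 1 means: one worker per available logical core.
	workers = 0
	# Only parallelize when at least this many files must be written.
	thresholdForParallelism = 100
```

The same can be set from the command line, e.g. `git config checkout.workers 0`.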
And, still with Git 2.32 (Q2 2021), the final part of "parallel checkout":
See commit 87094fc, commit d590422, commit 2fa3cba, commit 6a7bc9d, commit d0e5d35, commit 70b052b, commit 6053950, commit 9616882 (04 May 2021) by Matheus Tavares (matheustavares).
(Merged by Junio C Hamano -- gitster -- in commit a737e1f, 16 May 2021)
checkout-index: add parallel checkout support
Signed-off-by: Matheus Tavares
Allow checkout-index to use the parallel checkout framework, honoring the checkout.workers configuration.
There are two code paths in checkout-index which call checkout_entry(), and thus can make use of parallel checkout: checkout_file(), which is used to write paths explicitly given at the command line; and checkout_all(), which is used to write all paths in the index, when the --all option is given.
In both operation modes, checkout-index doesn't abort immediately on a checkout_entry() failure.
Instead, it tries to check out all remaining paths before exiting with a non-zero exit code.
To keep this behavior when parallel checkout is being used, we must allow run_parallel_checkout() to try writing the queued entries before we exit, even if we already got an error code from a previous checkout_entry() call.
However, checkout_all() doesn't return on errors; it calls exit() with code 128. We could make it call run_parallel_checkout() before exiting, but the code is easier to follow if we unify the exit path for both checkout-index modes at cmd_checkout_index(), and let this function take care of the interactions with the parallel checkout API.
So let's do that.
With this change, we also have to consider whether we want to keep using 128 as the error code for git checkout-index --all (man), while we use 1 for git checkout-index (man) <path> (even when the actual error is the same).
Since there is not much value in having code 128 only for --all, and there is no mention of it in the docs (so it's unlikely that changing it will break any existing script), let's make both modes exit with code 1 on checkout_entry() errors.
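A minimal sketch of that unified exit path is below. The helpers are hypothetical stand-ins (the real Git functions have different signatures and do actual work); the point is only that the parallel queue is flushed once, in a single place, even after earlier errors, before the exit code is chosen.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for the helpers discussed above (hypothetical). */
static bool checkout_file(const char *path)
{
    printf("queue %s\n", path);
    return true;                 /* true = queued/written OK */
}
static bool checkout_all(void) { return checkout_file("<every index entry>"); }
static int run_parallel_checkout(void) { return 0; /* toy: 0 = all OK */ }

/* Unified exit path: record errors but keep going, flush the parallel
 * queue once, then pick the exit code in a single place. */
static int cmd_checkout_index(int argc, const char **argv, bool all)
{
    int err = 0;
    if (all)
        err |= !checkout_all();
    else
        for (int i = 0; i < argc; i++)
            err |= !checkout_file(argv[i]);
    /* Flush queued entries even if an earlier checkout failed,
     * so the remaining paths are still attempted. */
    err |= !!run_parallel_checkout();
    return err ? 1 : 0;          /* code 1 for both modes now */
}

int main(void)
{
    const char *paths[] = { "a.txt", "b.txt" };
    return cmd_checkout_index(2, paths, false);
}
```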
Before Git 2.33 (Q3 2021), the parallel checkout codepath did not initialize the object ID field used to talk to the worker processes in a futureproof way.
See commit 3d20ed2 (17 May 2021) by Matheus Tavares (matheustavares).
(Merged by Junio C Hamano -- gitster -- in commit bb6a63a, 10 Jun 2021)
parallel-checkout: send the new object_id algo field to the workers
Signed-off-by: Matheus Tavares
An object_id storing a SHA-1 name has some unused bytes at the end of the hash array.
Since these bytes are not used, they are usually not initialized to any value either.
However, at parallel_checkout.c:send_one_item(), the object_id of a cache entry is copied into a buffer which is later sent to a checkout worker through a pipe write().
This makes Valgrind complain about passing uninitialized bytes to a syscall.
However, since cf09832 ("hash: add an algo member to struct object_id", 2021-04-26, Git v2.32.0-rc0 -- merge listed in batch #15), using hashcpy() is no longer sufficient here, as it won't copy the new algo field from the object_id.
Let's add and use a new function which meets both our requirements of copying all the important object_id data while still avoiding the uninitialized bytes, by padding the end of the hash array in the destination object_id.
With this change, we also no longer need the destination buffer from send_one_item() to be initialized with zeros, so let's switch from xcalloc() to xmalloc() to make this clear.
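Conceptually, the padded copy works like the sketch below. This is illustrative only: the struct layout, the toy hash_size() mapping, and the helper's exact signature are assumptions, not Git's actual code.

```c
#include <string.h>

#define GIT_MAX_RAWSZ 32                  /* large enough for SHA-256 */

struct object_id {
    unsigned char hash[GIT_MAX_RAWSZ];
    int algo;                             /* the field added in v2.32 */
};

/* Toy mapping from algorithm to raw hash size (assumption). */
static size_t hash_size(int algo)
{
    return algo == 2 ? 32 : 20;           /* 2 = SHA-256, else SHA-1 */
}

/* Copy the meaningful object_id data and zero-pad the unused tail of
 * the destination's hash array, so no uninitialized bytes remain even
 * if the struct is later written wholesale to a pipe. */
static void oidcpy_with_padding(struct object_id *dst,
                                const struct object_id *src)
{
    size_t sz = hash_size(src->algo);
    memcpy(dst->hash, src->hash, sz);
    memset(dst->hash + sz, 0, GIT_MAX_RAWSZ - sz);
    dst->algo = src->algo;
}

int main(void)
{
    struct object_id a = { .algo = 1 }, b;
    memset(a.hash, 0xab, 20);             /* pretend SHA-1 payload */
    oidcpy_with_padding(&b, &a);
    return b.hash[20] == 0 ? 0 : 1;       /* padded tail is zeroed */
}
```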