I've set gc.pruneExpire = never in my repo configuration. When I run git remote update, I see this process stack:

git remote update
`-git fetch --multiple --all
  `-git maintenance run --auto --no-quiet
    `-git gc --auto --no-quiet
      `-git prune --expire never

... and git prune runs for several minutes. I'm wondering what it does, as pruning is effectively switched off in this repo.

GIT_TRACE2 log snippet:

16:06:15.068020 run-command.c:738                 child_start[3] git prune --expire never
16:06:15.069992 common-main.c:48                  version 2.33.0
16:06:15.070006 common-main.c:49                  start /usr/libexec/git/git prune --expire never
16:06:15.070180 git.c:456                         cmd_name prune (remote/fetch/maintenance/gc/prune)
16:07:10.501223 git.c:714                         exit elapsed:55.431555 code:0
16:07:10.501263 trace2/tr2_tgt_normal.c:123       atexit elapsed:55.431631 code:0
16:07:10.684134 run-command.c:993                 child_exit[3] pid:4658 code:0 elapsed:55.616096

You can see that it ran for >50s.

Update 2021-09-10: GIT_TRACE2_PERF output:

10:08:10.599900 common-main.c:48             | d3 | main                     | version      |     |           |           |              | 2.33.0
10:08:10.599927 common-main.c:49             | d3 | main                     | start        |     |  0.000411 |           |              | /usr/libexec/git/git gc --auto --no-quiet
10:08:10.600125 git.c:456                    | d3 | main                     | cmd_name     |     |           |           |              | gc (remote/fetch/maintenance/gc)
...
10:08:31.766606 run-command.c:738            | d3 | main                     | child_start  |     | 21.167084 |           |              | [ch3] class:? argv:[git prune --expire never]
10:08:31.768137 common-main.c:48             | d4 | main                     | version      |     |           |           |              | 2.33.0
10:08:31.768154 common-main.c:49             | d4 | main                     | start        |     |  0.000395 |           |              | /usr/libexec/git/git prune --expire never
10:08:31.768353 git.c:456                    | d4 | main                     | cmd_name     |     |           |           |              | prune (remote/fetch/maintenance/gc/prune)
10:08:31.768473 progress.c:268               | d4 | main                     | region_enter | r0  |  0.000714 |           | progress     | label:Checking connectivity
10:08:31.768488 read-cache.c:2370            | d4 | main                     | region_enter | r0  |  0.000732 |           | index        | ..label:do_read_index ./index
10:08:31.768503 read-cache.c:2375            | d4 | main                     | region_leave | r0  |  0.000747 |  0.000015 | index        | ..label:do_read_index ./index
10:09:26.402642 progress.c:328               | d4 | main                     | data         | r0  | 54.634883 | 54.634169 | progress     | ..total_objects:0
10:09:26.402710 progress.c:336               | d4 | main                     | region_leave | r0  | 54.634953 | 54.634239 | progress     | label:Checking connectivity
10:09:26.578281 git.c:714                    | d4 | main                     | exit         |     | 54.810499 |           |              | code:0
10:09:26.578331 trace2/tr2_tgt_perf.c:213    | d4 | main                     | atexit       |     | 54.810558 |           |              | code:0
10:09:26.705890 run-command.c:993            | d3 | main                     | child_exit   |     | 76.106345 | 54.939261 |              | [ch3] pid:12022 code:0

We see that the "Checking connectivity" region takes about 54 seconds before git prune reports total_objects:0. The trace gives no insight into why git gc started git prune in the first place.
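
For what it's worth, a perf log like the one above can be boiled down to its longest regions with a small awk filter. This is only a sketch: `slowest_regions` is a made-up helper name, and it assumes the `|`-separated column layout shown above (event name in field 4, time spent in the region in field 7, label in field 9) and a log written to a file, so without the terminal line wrapping.

```shell
# slowest_regions FILE [N]: list the N longest regions (default 10) found in
# a GIT_TRACE2_PERF log, longest first, as "<seconds><TAB><label>".
# Assumes the '|'-separated layout shown above: field 4 is the event name,
# field 7 the time spent in the region, field 9 the label.
slowest_regions() {
    awk -F'|' '
        $4 ~ /region_leave/ {
            label = $9
            gsub(/^[ .]+| +$/, "", label)   # drop padding and nesting dots
            printf "%.6f\t%s\n", $7, label
        }
    ' "$1" | sort -rn | head -n "${2:-10}"
}
```

Called as `slowest_regions /path/to/perf.log 5`, it prints one line per region, which makes it easier to spot that nearly all of the time goes into a single region.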

uncleremus
  • `git prune` is looking for unreachable objects to prune (provided they meet the time given). This is inherently slow, although *several minutes* indicates a very large repo and/or a very busy machine. It's not clear to me why `git gc` is running `git prune` at all here though, giving it an explicit "never". It would also be possible to make `git prune` handle `never` as a special case, but I'm not sure I like that idea as much as having `git gc` be smarter about *not running it* in the first place. :-) – torek Aug 27 '21 at 08:41
  • It's just a Linux kernel repo with a couple of remotes including `-stable` and `-next`. – uncleremus Aug 27 '21 at 15:39
  • "Just" a Linux kernel? While massive players like Microsoft laugh at the tiny Linux-kernel repository size, that's actually quite large, as Git repositories go. – torek Aug 27 '21 at 15:58
  • @torek And laugh they are: https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/ – VonC Aug 27 '21 at 15:59

1 Answer

Try setting GIT_TRACE2=1 before launching the git remote update command, to see more of the sequence of commands. (GIT_TRACE2_PERF and GIT_TRACE2_EVENT are also available.)
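
For instance, something along these lines would capture all three trace formats in one run. It is a sketch only: the `trace_git` helper and the log file names are made up for illustration, while the three environment variables are Git's own (each takes either 1 for stderr or an absolute file path).

```shell
# trace_git DIR ARGS...: run "git ARGS..." with all three Trace2 targets
# written to files under DIR (Trace2 file targets must be absolute paths).
# The helper name and the log file names are illustrative only.
trace_git() {
    dir=$1
    shift
    GIT_TRACE2="$dir/normal.log" \
    GIT_TRACE2_PERF="$dir/perf.log" \
    GIT_TRACE2_EVENT="$dir/event.json" \
        git "$@"
}

# Example, from inside the repository:
#   trace_git "$(mktemp -d)" remote update
```

Writing to files rather than stderr keeps the traces of the child processes (fetch, maintenance, gc, prune) from interleaving with the normal command output.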

Seeing git maintenance in that stack (I detail it here) means you are running at least Git 2.30: check whether the issue persists with the latest 2.33.0.2.


With Git 2.34 (Q4 2021), the "git maintenance" scheduler learned to use systemd timers as a possible backend.

That allows better control over when that kind of process (git gc before, git maintenance now) takes place.

See commit b681b19, commit eba1ba9, commit cb7db5b (04 Sep 2021) by Lénaïc Huard (L3n41c).
(Merged by Junio C Hamano -- gitster -- in commit ed8794e, 20 Sep 2021)

maintenance: add support for systemd timers on Linux

Signed-off-by: Lénaïc Huard
Acked-by: Derrick Stolee

The existing mechanism for scheduling background maintenance is done through cron.
On Linux systems managed by systemd, systemd provides an alternative to schedule recurring tasks: systemd timers.

The main motivations to implement systemd timers in addition to cron are:
* cron is optional and Linux systems running systemd might not have it installed.
* The execution of crontab -l can tell us if cron is installed but not if the daemon is actually running.
* With systemd, each service is run in its own cgroup and its logs are tagged by the service inside journald.
  With cron, all scheduled tasks are running in the cron daemon cgroup and all the logs of the user-scheduled tasks are pretended to belong to the system cron service.
  Concretely, a user that doesn't have access to the system logs won't have access to the log of their own tasks scheduled by cron whereas they will have access to the log of their own tasks scheduled by systemd timer.
  Although cron attempts to send email, that email may go unseen by the user because these days, local mailboxes are not heavily used anymore.

In order to schedule git maintenance, we need two unit template files:
* ~/.config/systemd/user/git-maintenance@.service to define the command to be started by systemd and
* ~/.config/systemd/user/git-maintenance@.timer to define the schedule at which the command should be run.

Those units are templates that are parameterized by the frequency.

Based on those templates, 3 timers are started:
* git-maintenance@hourly.timer
* git-maintenance@daily.timer
* git-maintenance@weekly.timer

The command launched by those three timers are the same as with the other scheduling methods:

/path/to/git for-each-repo --exec-path=/path/to --config=maintenance.repo maintenance run --schedule=%i

with the full path for git to ensure that the version of git launched for the scheduled maintenance is the same as the one used to run maintenance start.

The timer unit contains Persistent=true so that, if the computer is powered down when a maintenance task should run, the task will be run when the computer is back powered on.

git maintenance now includes in its man page:

--scheduler=auto|crontab|systemd-timer|launchctl|schtasks

Possible values for <scheduler> are auto, crontab (POSIX), systemd-timer (Linux), launchctl (macOS), and schtasks (Windows).

When auto is specified, the appropriate platform-specific scheduler is used; on Linux, systemd-timer is used if available, otherwise crontab. Default is auto.

git maintenance now includes in its man page:

BACKGROUND MAINTENANCE ON LINUX SYSTEMD SYSTEMS

While Linux supports cron, depending on the distribution, cron may be an optional package not necessarily installed. On modern Linux distributions, systemd timers are superseding it.

If user systemd timers are available, they will be used as a replacement of cron.

In this case, git maintenance start will create user systemd timer units and start the timers. The current list of user-scheduled tasks can be found by running systemctl --user list-timers. The timers written by git maintenance start are similar to this:

$ systemctl --user list-timers
NEXT                         LEFT          LAST                         PASSED     UNIT                         ACTIVATES
Thu 2021-04-29 19:00:00 CEST 42min left    Thu 2021-04-29 18:00:11 CEST 17min ago  git-maintenance@hourly.timer git-maintenance@hourly.service
Fri 2021-04-30 00:00:00 CEST 5h 42min left Thu 2021-04-29 00:00:11 CEST 18h ago    git-maintenance@daily.timer  git-maintenance@daily.service
Mon 2021-05-03 00:00:00 CEST 3 days left   Mon 2021-04-26 00:00:11 CEST 3 days ago git-maintenance@weekly.timer git-maintenance@weekly.service

One timer is registered for each --schedule=<frequency> option.

The definition of the systemd units can be inspected in the following files:

~/.config/systemd/user/git-maintenance@.timer
~/.config/systemd/user/git-maintenance@.service
~/.config/systemd/user/timers.target.wants/git-maintenance@hourly.timer
~/.config/systemd/user/timers.target.wants/git-maintenance@daily.timer
~/.config/systemd/user/timers.target.wants/git-maintenance@weekly.timer

git maintenance start will overwrite these files and start the timer again with systemctl --user, so any customization should be done by creating a drop-in file, i.e. a .conf suffixed file in the ~/.config/systemd/user/git-maintenance@.service.d directory.

git maintenance stop will stop the user systemd timers and delete the above mentioned files.

For more details, see systemd.timer(5).
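
As an illustration of the drop-in customization mentioned in that man page extract, the following sketch writes one such file. Everything specific in it is an assumption made for the example: the `write_dropin` helper, the `override.conf` file name, and the choice of `Nice=` and `IOSchedulingClass=` (standard systemd service settings, picked here to keep a slow gc/prune from hogging the machine).

```shell
# write_dropin [UNIT_DIR]: create a drop-in for the git-maintenance@.service
# template. UNIT_DIR defaults to the user unit directory named above; the
# file name "override.conf" and the two [Service] settings are illustrative.
write_dropin() {
    unit_dir=${1:-"$HOME/.config/systemd/user"}
    dropin_dir="$unit_dir/git-maintenance@.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# Run scheduled maintenance at the lowest CPU and I/O priority.
Nice=19
IOSchedulingClass=idle
EOF
}

# After writing the file, make systemd pick it up:
#   write_dropin && systemctl --user daemon-reload
```

Because the settings live in a separate .conf file, a later git maintenance start can overwrite the generated unit files without losing them.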

VonC
  • I'm currently at 2.32.0 (openSUSE Tumbleweed). – uncleremus Aug 27 '21 at 15:40
  • @uncleremus OK. I would be curious to know if this persists with 2.33. – VonC Aug 27 '21 at 15:41
  • Yes, it still happens with 2.33. I have a `GIT_TRACE2=1` log. Adding a snippet to the question above, but it doesn't provide much insight. – uncleremus Sep 01 '21 at 14:13
  • @uncleremus Not much to see indeed. Would `GIT_TRACE2_PERF` or `GIT_TRACE2_EVENT` yield anything more? – VonC Sep 01 '21 at 15:17
  • The systemd timers suggestion wouldn't help me. Indeed I want `git gc` to run synchronously on the repository at hand. Asynchronous maintenance would just be a workaround for the fact that `gc` / `prune` is so slow. – uncleremus Dec 11 '21 at 17:07
  • @uncleremus Yet said asynchronous maintenance can be set for one repository. – VonC Dec 11 '21 at 21:13