
If you have a master (by convention) repo, and people use their own local repos, and someone breaks the "don't go back in time and rebase your changes instead of merging the branch in" rule or other (conventional) rules, the git log isn't going to show you the full history of the remote repo - just the history as it pertains to the current state of the repo, right? Is there some way of seeing every last change that was made to the remote repo, to see who's not following the rules? I appreciate that the commits are gone and can't be recovered (hence the problem), but just a text file containing the dates, SHAs etc. would be handy. Does this exist? Can it be configured that way somehow?

To be clear: I'm interested in seeing everything which has been done to the remote repo, to track malicious or poorly trained usage of it.

  • You won’t be able to see that information if it exists within someone else’s remote repo. – evolutionxbox Jul 02 '18 at 10:27
  • I was wondering that. The remote repo is a shared one: the only one whose state should be purely the result of pushes and pulls. Is there some sort of "post-any-git-operation" trigger which can just log every command issued to it to a text file on a server somewhere? –  Jul 02 '18 at 10:31
  • Are the developers working simultaneously in the same repo? – evolutionxbox Jul 02 '18 at 10:41
  • @evolutionxbox This is hypothetical but it's based on a true story, and yes, multiple developers in multiple sites all with full access to the remote repo. –  Jul 02 '18 at 10:42
  • You basically want an audit trail for the repo, is this a good way to phrase it? – alexis Jul 02 '18 at 10:47
  • @alexis Yes, because currently it seems the only option would be to take backups of the whole repo hourly/daily so you have some chance of catching this sort of thing. –  Jul 02 '18 at 10:49
  • The commits are not actually gone until a `gc` (garbage collection) run happens on the remote host. The old commits will still be present for some time, but just not referenced by the current branches. They might also still be present in your local clone if you'd done a fetch of the branch while those commits were still referenced by the branch. – Jonathan Wakely Jul 02 '18 at 11:02
  • @JonathanWakely I want something resistant to gc and not relying on luck or being there "just in time" etc. I want to know everything that's happened. –  Jul 02 '18 at 11:03
  • I don't think Git supports that natively; you'd have to add some custom logging via receive hooks on the remote master repo. Why not just [configure the repo to disallow pushes that change history](https://stackoverflow.com/a/1754553/981959)? Then everybody has to follow the rules. – Jonathan Wakely Jul 02 '18 at 11:10
  • Exactly: I am not familiar with git's internals, but I would look into setting up some kind of logging (possibly to a remote server, if you have reason to fear tampering) through hooks. At least if your question is about how to track the full history *in the future*, rather than about doing it retrospectively now. – alexis Jul 02 '18 at 12:01
  • @alexis Exactly, yes. I'm learning git, and I've heard from friends who have problems. I believe people are probably removing/rewriting history (possibly through ignorance of what they're doing and that it's wrong, rather than through malice), which results in problems I can't diagnose. I just thought a log that is not affected by the very actions it is meant to help diagnose would be useful. I shall look into hooking up some logging, thanks. –  Jul 02 '18 at 12:13
  • Jotted down some pointers as a starter answer. – alexis Jul 02 '18 at 13:07

2 Answers


When you say a "master (by convention) repo", I assume you mean (1) there's a single "source of truth" repository for the project (which is conventionally called origin, not master); (2) users/devs clone the origin, work in their clones, and push changes to the origin.

In that usage, interactions with the origin boil down to very few - and in fact writes to the origin should always just be pushes. Things like rebase don't really occur at the origin; it is simply updated to see the result of those operations.

If you want any level of security or audit around the central repo, then this is the correct way to set it up. But it means you can't directly "see" many of the things you're talking about. You won't know (and, honestly, shouldn't care) what commands were used to get a user's local repo into a particular state. You just know what's pushed - and any centrally-enforced rules need to be described in terms of what's pushed.

That means the useful tools are config options on origin, and the pre-receive hook (and possibly the update hook) on origin (unless you're in a hosted environment that provides a different security model).

One thing you can do is globally refuse to accept history rewrites. Then if someone rebases a branch after it's been pushed, origin simply won't take the push. (The user could still work on a branch, then rebase it before the first time they share it. There is nothing you can do about that, and really no good reason to care.) On the origin you'd set `receive.denyNonFastForwards` to true (for example with `git config receive.denyNonFastForwards true`, run in the origin repo).

You can enforce pretty much whatever rule you want with a hook, if you can work out the necessary script. Maybe you want to enforce commit topology rules (e.g. "no non-merges in master" à la gitflow), or require signed commits (see below), or whatever.
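As a rough sketch of what such a hook might look like (an illustration under my own assumptions, not a tested policy): git feeds `pre-receive` one line per updated ref on stdin, in the form `<old-sha> <new-sha> <refname>`, and `git merge-base --is-ancestor` can tell you whether an update is a fast-forward. The function names and the policy itself (reject rewrites and deletions under `refs/heads/`) are made up for the example.

```python
#!/usr/bin/env python3
"""Sketch of a pre-receive hook that rejects branch rewrites and deletions."""
import os
import subprocess
import sys


def is_null(sha):
    # Git uses an all-zero object name to mean "no commit" (ref created or deleted).
    return set(sha) == {"0"}


def git_is_ancestor(old, new):
    # Exit status 0 means `old` is reachable from `new`, i.e. a fast-forward.
    # Any other status (including errors) is treated as "not a fast-forward".
    result = subprocess.run(["git", "merge-base", "--is-ancestor", old, new])
    return result.returncode == 0


def classify(old, new, is_ancestor=git_is_ancestor):
    if is_null(old):
        return "create"
    if is_null(new):
        return "delete"
    return "fast-forward" if is_ancestor(old, new) else "rewrite"


def check(lines, is_ancestor=git_is_ancestor):
    """Return (refname, kind) for every update that violates the example policy."""
    violations = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        kind = classify(old, new, is_ancestor)
        if ref.startswith("refs/heads/") and kind in ("delete", "rewrite"):
            violations.append((ref, kind))
    return violations


# Only act when installed as hooks/pre-receive, so the file can be imported safely.
if os.path.basename(sys.argv[0]) == "pre-receive":
    bad = check(sys.stdin)
    for ref, kind in bad:
        print(f"rejected {kind} of {ref}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```

A non-zero exit from `pre-receive` refuses the whole push. Passing the ancestor check in as a parameter is just a convenience so the classification logic can be exercised without a real repository.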

If the rules are user-specific, or if you want to log potential violations instead of (or in addition to) blocking them, then authentication is a concern. Securing access to the repo - and authenticating who is accessing the repo - is not something git really addresses. There are several server environments for hosting git repos - like GitHub, GitLab, TFS. Those types of server provide security options. You could also set your repo up so that the only way to reach it is through authenticated means (properly authenticated HTTP, or SSH).

Accepting commits only if they're signed (or only if they're reachable from a signed tag) is also an option that tells you something about who did what, but maybe not what you want to know. (Just because I wrote and signed a commit, that doesn't tell you anything about who moved a ref to point to that commit and tried to push the result.)

If you can work out authentication, but can't script out detection of every rule - or maybe aren't sure what the rules need to be, but would know "bad behavior" when you see it - then simply logging the authenticated identity with the push's ref list would tell you probably everything you need to know to figure things out "after the fact".
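A minimal sketch of that kind of logging, assuming it runs as a `pre-receive` hook on origin (so it records attempted pushes, including ones a later check refuses). The log path, the `AUDIT_LOG` variable and the record layout are hypothetical, and which environment variable carries the authenticated identity depends entirely on your access layer: `GL_USER` under Gitolite and `REMOTE_USER` behind an HTTP server are the assumptions made here.

```python
#!/usr/bin/env python3
"""Sketch of a pre-receive hook that appends one audit record per pushed ref."""
import json
import os
import sys
from datetime import datetime, timezone

# Hypothetical location; ideally somewhere the pushing users cannot write directly.
AUDIT_LOG = os.environ.get("AUDIT_LOG", "push-audit.log")


def pusher(env):
    # Assumption: the access layer exposes the authenticated user in one of
    # these variables. Adjust the list to match your own setup.
    for name in ("GL_USER", "REMOTE_USER", "USER"):
        if env.get(name):
            return env[name]
    return "unknown"


def audit_records(lines, user, now):
    """Turn pre-receive stdin lines (<old> <new> <ref>) into log records."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        records.append({"time": now, "user": user, "ref": ref, "old": old, "new": new})
    return records


# Only act when installed as hooks/pre-receive, so the file can be imported safely.
if os.path.basename(sys.argv[0]) == "pre-receive":
    now = datetime.now(timezone.utc).isoformat()
    with open(AUDIT_LOG, "a", encoding="utf-8") as log:
        for record in audit_records(sys.stdin, pusher(os.environ), now):
            log.write(json.dumps(record) + "\n")
    sys.exit(0)  # log only; this sketch never blocks the push
```

Because each record keeps the old SHA as well as the new one, a rewritten or deleted branch tip can still be named after the fact, for as long as the old commits have not been garbage-collected on origin.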

Mark Adelsberger
  • Yes, but it's the pre-receive and update hooks, not the post-receive hook. :-) There is also the rather fancy Gitolite control system. – torek Jul 02 '18 at 16:18

Sounds like what you need is to set up an audit trail. It doesn't seem like there's built-in support for this in git, but you can keep a closer eye on the repo with the help of a couple of hooks.

alexis
  • `pre-rebase` and `post-commit` are client hooks, which will not be suitable for an audit trail (especially since OP is expressing distrust of the users to the extent of worrying about *malicious* misuse). – Mark Adelsberger Jul 02 '18 at 13:53
  • Hmm, so there's such a thing as a remote rebase? Scratch that then! Thanks @Mark. So then it should be one of the hooks from [git-receive-pack](https://git-scm.com/docs/git-receive-pack), e.g. `pre-receive`, `post-receive`, `update`, etc.? – alexis Jul 02 '18 at 14:36
  • I'm not sure what you mean about a "remote rebase"; rebase is done locally (and then the resulting state can be `push`ed, though often this will require `-f`), which is exactly why `pre-rebase` has to be a client hook. To the other part of your comment - yes, `pre-receive` is often what you'd use to enforce a central policy about "acceptable" commits and ref updates. – Mark Adelsberger Jul 02 '18 at 14:42
  • If rebasing is done on a user's pc and the rebased changesets pushed to the server, then its effects should be visible on the remote server that the OP is concerned about; so why would this be a concern? The problem would be with rebasing done directly on the server, I would think, and this would get logged (if the hooks are not disabled first). I guess I'm missing something, but I don't want to add to the noise if it's not relevant to answering the OP's question. – alexis Jul 02 '18 at 14:47
  • If users can directly rebase on the server, then any concern about security is out the window. I'm not following your logic about why a push of a post-rebase state wouldn't be an issue; if it rewrites history, then the original history is lost (if the push is accepted). – Mark Adelsberger Jul 02 '18 at 14:53
  • That's what I meant by "remote rebase": I was not aware that you can push the _deletion_ of changesets along with the new (rebased) changesets. (I'm more familiar with the details of rebasing semantics in pre-"evolution" Mercurial, where only changesets can be propagated; rebasing and pushing results in two copies on the remote.) – alexis Jul 03 '18 at 07:46
  • Even if users can directly rebase on the server, all is not lost: The log can be written to a different server, for example, or copied periodically. Even if a malefactor takes the trouble to disable logging before rebasing, the history *up to that point* will be recorded and provide enough clues. – alexis Jul 03 '18 at 07:47
  • "...that you can push the deletion of changesets..." Rebasing is *NOT* the deletion of changesets, locally or on the remote; this is a common misconception. As for the assertions about security; not going to argue about it, but I strongly disagree with your assessment. – Mark Adelsberger Jul 03 '18 at 12:38
  • I defer to your assessment on the security issue, then. About rebasing, my thin `git` background is showing. I've understood rebasing as replacing a changeset with a similar one with a different parent, and hence different hash. I'll go read up on it, sorry for spreading confusion! (On Mercurial, as I mentioned, this was once really an add-delete pair, and only the add part could be pushed.) – alexis Jul 03 '18 at 12:47
  • I don't know about mercurial's approach. In `git` rebase usually creates new commits and then moves a ref. That certainly can remove commits from *that ref's history*, and it may leave commits unreachable - which means they might eventually be garbage-collected. But it does not delete anything. And everything it does, can be pushed. – Mark Adelsberger Jul 03 '18 at 13:23