2

A Git commit object just stores the author, date, message, parent commit hashs and the root dir hash.

So, if I want to find out which childs of this commit exist (of course considering only a single repository), how does Git do that? Basically this. But how does Git do that internally? How efficient is that implemented? What is its time complexity? Does it go from every branch head and traverse all possible paths from there and that way, collects all possible childs? Or does it do some clever caching or other method?

I can imagine that the traversing can become really slow if you want to know the childs of a very old commit. It's basically linear to the size of the whole repository. A user which doesn't know about how commits are stored in Git would probably even assume constant time for such request, so a big difference.


I really want to know this, and not something different. It doesn't matter why I want to know this. I just want to know it. That's why I'm not always explaining how I did came up with this question. It really would not add anything to the question.

Community
  • 1
  • 1
Albert
  • 65,406
  • 61
  • 242
  • 386
  • 2
    If by "child" you mean the commit that comes *after* a commit, git doesn't store that anywhere. Once a commit is made, it's stored under the hash of its contents, so you can't modify it, and obviously no children commits exist for a commit at the time you're creating it. – Gary Fixler Mar 28 '14 at 08:38
  • 2
    Yup, not efficient. Fortunately, we almost never need to know children - it's simply not in Git workflow. – Amadan Mar 28 '14 at 08:39
  • -1 for not explaining why on Earth would you need to do that. – Jan Hudec Mar 28 '14 at 08:47
  • Git developers don't do anything for which they don't have a reasonable use case. Asking for something without having realistic use case is considered almost a sin. – Jan Hudec Mar 28 '14 at 08:53
  • @JanHudec: Why does that make the question bad? And you can easily check via Google that it is not that uncommon that someone wants to know the childs of a commit. – Albert Mar 28 '14 at 09:21
  • @Albert: Questions without motivation are always bad, because appropriate answer might, and often does, depend on the context. – Jan Hudec Mar 28 '14 at 10:28
  • @JanHudec: I'm sure that they don't do in this case, that's why I have not added that info (by intention), otherwise I would have. I know what context makes sense, and if I don't, I say that and add possible relevant context. Also, if I leave the context out, I especially want to have the question answered without context. I'm not sure why that is not obvious or why that makes a question bad. – Albert Mar 28 '14 at 11:23
  • @Amadan: Thanks, that is basically the answer to my question. – Albert Mar 28 '14 at 11:28
  • 2
    @GaryFixler: Of course a commit cannot be modified. But Git could save that information elsewhere, in some cache or so. Apparently, it does not. That is what I wanted to know here. – Albert Mar 28 '14 at 11:30

3 Answers3

2

Git has a command that can display all the branches of a repository, and its remote repositories, that are child of a commit:

git branch --all --contains <commithash>

That gives only the children that are heads of a branch, though. I must admit that I do not really understand the extent of your question.

Edit: From what I see in the code of Git (file commit.c in the source tree of Git), for each reference, the function is_descendant_of is called, and that function browse the graph recursively to find out if the reference contains the commit in its ancestors.

lrineau
  • 6,036
  • 3
  • 34
  • 47
  • The question was not how I would do that (the Git command), but how Git does it (internally) and how efficiently it is done (time complexity). – Albert Mar 28 '14 at 09:47
1

In a typical git repository, for any sufficiently old commit:

  • it's children are spread across the world and never met in a single repository and
  • some of them don't exist any more (even in single repository they die in time).

There is simply no way to find them and no reason to do so. Most of them are not interesting.

The interesting children are the ones that were merged into some interesting branch and can be found by starting from there.

Jan Hudec
  • 73,652
  • 13
  • 125
  • 172
  • Well, of course, if you take different forked repos into account. But obviously it only makes sense to consider one single repo for the question of childs. – Albert Mar 28 '14 at 09:38
0

The reason git traversal path is from children to parents is because a commit object cannot depend on its children.

If a reference was stored to each child in the commit object, the SHA of the commit and its children would change each time a child was added, removed, renamed ( which would happen a lot here ). This would break down the whole simple store of objects, because commits would not have stable SHAs.

So with this constraint that only the parents can be stored, and there is no alternative cache because of this constraint. The queries you ask for ( I presume something like next 5 commits after SHA ) simply are not realistic, but can be formulated in another way by specifying the last child's SHA and the traversal path to the parent.

Emil Davtyan
  • 13,808
  • 5
  • 44
  • 66
  • Of course it sufficient. That was not the question. The question was whether it e.g. also stores the childs. If it does not, and also nothing else in a separate cache, as it seems from your answers, then calculating them is not efficient. That is what I wanted to know. – Albert Mar 28 '14 at 09:36
  • @Albert There is no separate cache for children, I added explanation for this. – Emil Davtyan Mar 28 '14 at 10:20
  • Sure, it's obvious that Git cannot store the childs in the commit itself. But Git could have a separate cache/store for that. Just like it's storing other mutable objects (e.g. the branch head refs). – Albert Mar 28 '14 at 11:26
  • @Albert Sorry, I'm never sure what is obvious. Yes you are right a cache can be built, though I'm not sure if it would be comparable to the storage of branches and tags especially in the use cases where it would be useful. – Emil Davtyan Mar 28 '14 at 14:24
  • Yes, probably some different structure for such cache would make more sense. Anyway, that was what I wanted to know -- whether Git has such cache or does something other clever or whether it does the naive canonical way, i.e. just traversing via the parent refs. – Albert Mar 28 '14 at 14:31