
Git philosophy question: are "ancestry" and "history" (a.k.a. chronology) distinct things?...

Behold, I have the following git repository:

      A---B---C develop
     /
D---E---F---G master

And I want master's head to contain exactly what develop has at its head:

      A---B---C develop
     /         \ 
D---E---F---G---C master

Except, I don't want a merge. I want it to be something like this:

      A---B---C   develop
     /         \ 
D---E---F---G===C--- master

where the head of master would NOT have multiple parents; it would just become what the head of develop is. The === in my diagram implies a chronological connection, but not ancestry.

For clarification, I don't want to use git merge -X theirs or git merge -s ours, because those still produce true merges... I just don't want a merge: I want a "paste". However, I also need to be able to look at the git log on master and see commit C, then commit G, then commit F, etc., even though there are no paternal ties between C and G. Also, my constraints forbid me from erasing master or develop, or messing with their histories. Is there a simple way to do this?

JCollier
  • I think you need what's described here: https://stackoverflow.com/a/2763118/547270 – scrutari Mar 02 '18 at 22:13
  • Does renaming the old master and renaming develop to master not do what you want? – tripleee Mar 02 '18 at 22:16
  • @tripleee Correct, that is not what I want. I am dealing with a complex system, and that would probably mess with about a thousand scripts. – JCollier Mar 02 '18 at 22:18
  • Your diagram and your description are completely at odds. The diagram you've posted shows that master **no longer contains F or G**. You cannot have that diagram and still have F and G appear in `git log` for master. – user229044 Mar 02 '18 at 22:19
  • What I want is for G to come after C in the master `git log`, but without G being a parent of C. Are you saying this is impossible? I also used `===` in the third diagram to indicate a chronological connection without ancestry. – JCollier Mar 02 '18 at 22:20
  • No, just that your diagram is incorrect. If you want `G` to appear in the history of master, you should have a line connecting master to `G`. – user229044 Mar 02 '18 at 22:23
  • @scrutari that is a sweet solution, but unfortunately, it's still a merge. – JCollier Mar 02 '18 at 22:36
  • @JCollier why insist on keeping the commits F and G as ancestors when you want the head of C to be the head of master? You can retain commits F (and G) with a tag if you really want them, but what you're trying to do (avoiding ancestry) is like saying you want to make a wheel that rolls and is not also round - Git simply doesn't work like that. – BenKoshy Mar 02 '18 at 23:35
  • You could reset master to be at the head of C. that seems the easiest. and then add a tag at commit G and you'll be fine. – BenKoshy Mar 02 '18 at 23:37
  • As others have said, this doesn't make sense in Git. I suspect you have an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem): you're asking about your solution, but you're not telling us the problem you're trying to solve. How about you back up and tell us about the problem you are having that you're trying to solve? – Schwern Mar 03 '18 at 03:35
  • Let's try this a different way: what is it about having G in the ancestry that you don't want? Please be concrete. – jthill Mar 03 '18 at 04:12
  • This is my problem: my boss told me to do it precisely the way I described. I imagine it wasn't his choice either, to be honest. Whoever it is up the ladder, they want the master branch to mostly contain all of the stable releases to be pasted in, inter-spaced with a few strange commits. If it was my choice I would yield to a better solution :) – JCollier Mar 04 '18 at 04:21

3 Answers


You can achieve something like this, but it's not how Git works, and you shouldn't do it.

You can use git reset to move master around and drag in the current state of develop. First hard-reset master from its current commit, G, to the desired commit C; then soft-reset master back to its original commit, G, which keeps the content of C in the index and work tree. Committing at that point creates a new commit (NOT C) whose content is exactly what develop has.

git checkout master
git reset --hard develop
git reset --soft <commit id of G>   # or: git reset --soft HEAD@{1}
# HEAD is back at G, but the index and work tree still hold C's content
git commit -m 'a message for C'

At this point, master and develop will contain identical content, but with different histories of commits. The commit graph will look like this:

      A---B---C   develop
     /          
D---E---F---G---C' master

Where C' and C contain identical content, but divergent histories.
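One way to try this recipe without touching a real repository is the throwaway script below. It is a sketch, assuming a POSIX shell with git on the PATH: it rebuilds the question's graph in a temporary directory using a hypothetical helper mk (one file and one commit per letter), sets a dummy identity through the GIT_AUTHOR_* / GIT_COMMITTER_* variables, and then runs the reset/commit steps above. All names and messages are made up for the demo.

```shell
#!/bin/sh
# Sketch: rebuild the question's graph in a scratch repo, then apply the
# reset-based recipe. Identity, file names and messages are invented.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name to master

# mk NAME: hypothetical helper that commits a file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk D; mk E
git branch develop               # develop forks at E
mk F; mk G                       # master: D-E-F-G
git checkout -q develop
mk A; mk B; mk C                 # develop: D-E-A-B-C

# The recipe:
git checkout -q master
git reset -q --hard develop      # master (and HEAD) now at C
git reset -q --soft HEAD@{1}     # HEAD back at G; index still holds C's tree
git commit -q -m "a message for C"

git log --oneline --graph --all
```

Afterwards, git diff develop master should print nothing, and git log master should list the new commit, then G, F, E, D; commit C itself is not part of master's history.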

You can instead merge first, and then perform the steps exactly as above, and create a new commit that contains both master and develop as ancestors but with the content of C:

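git checkout master
git merge develop            # if the merge stops on conflicts, resolve them and commit first
git reset --hard develop
git reset --soft HEAD@{1}    # back to the merge commit M, keeping C's content staged
git commit -m 'a commit message for C'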

This will give you something like the following, where M is a merge commit:

      A---B---C       develop
     /         \ 
D---E---F---G---M---C' master

But again, this isn't really how Git is meant to work, and there is very little reason to do this if you intend to discard all changes from one branch or the other.

user229044

Edit: let me start with the philosophy question (which I missed earlier).

Git philosophy question: are "ancestry" and "history" (a.k.a. chronology) distinct things?...

Not really. Well, maybe: "ancestry" and "history" are the same thing, or at least, deeply connected. Chronology ... is often irrelevant. Git stores commits, and the commits are, or at least form, the history.

Each commit has some set of parents: usually just one parent, but merge commits have two or more, and at least one commit in the repository—called the root commit—has no parents. In general, you make a new commit by selecting an existing commit, selecting or creating a tree (snapshot) to go into the new commit, and writing out a new commit with:

  • you as the author and committer, and "now" as the time stamps of the new commit;
  • your tree as the snapshot;
  • an arbitrary commit message; and
  • your current commit's ID—and perhaps more commit IDs—as the parent(s) of the new commit.

Once made, a commit is frozen in time forever: it literally cannot be changed, as its hash ID—its "true name" in terms of how Git finds it in the database—is a cryptographic checksum of its contents. Change the contents in any way, even just one bit, and you get a different checksum, which means a new and different commit (and meanwhile the old commit remains in the repository).

The special case of a root commit occurs when you make a commit without starting with a current commit. This obviously has to be the case for the first commit in a new, empty repository: there are no previous commits, so the first commit cannot have a parent. After that, all normal commits have some existing commit as their parent, though you can create new root commits.

The result is that commits string together, pointing "backwards in time", regardless of any time-stamp stored inside the commit. We can draw a small three-commit repository like this, with single uppercase letters standing in for the actual commit hashes:

A <-B <-C

Commit C is the latest (even if its timestamp says it was made in the 1970s); commit B is C's parent; and root commit A is B's parent. These facts are embedded in each commit, which is read-only and incorruptible.

The way Git finds C is through a branch name, like master. C has some hash ID—the cryptographic checksum of C's contents—and Git places that hash ID in a key-value store, with the key being the name master. So given master, Git finds C, which Git uses to find B, and so on. To place a new commit onto the master branch, we would now write some new stuff, run git add and git commit, and get a new commit D with a new snapshot, whose parent is commit C. Whatever the hash ID of D is, Git would write that into master, and now master will point to new commit D.
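To see these linkages directly, here is a small sketch in a throwaway repository (dummy identity, made-up file and messages; assumes a POSIX shell with git on the PATH). It makes commits A, B and C, prints the raw commit object that the name master selects, and re-hashes those raw bytes to show that the hash ID really is a checksum of the commit's contents.

```shell
#!/bin/sh
# Sketch: how a branch name, a commit object and its parent link fit together.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

for c in A B C; do
  echo "$c" > file.txt
  git add file.txt
  git commit -q -m "$c"
done

git rev-parse master      # the hash ID stored under the name master (commit C)
git cat-file -p master    # C's tree, its "parent" line (B's hash ID), author, committer, message
git rev-list master       # follow parent links backwards: C, B, then root commit A

# Re-hash C's raw contents: the result is C's own hash ID again.
git cat-file commit master | git hash-object -t commit --stdin
```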

This is how the history and the ancestry are so deeply intertwingled. We can, at any time, given all the starting-point names (branches, tags, and any other names), find all the commits in the repository through these linkages. Drawing the linkages produces the commit graph.

The git log command does not necessarily show you commits in the order they occur in this graph. In particular, if git log has two or more commits in its "commits to show" queue, it must pick one to show first. The one it picks is based on the sorting criteria you specify on the git log command line. If it has only one commit to show, though, it shows that one. Having shown that one commit, git log normally adds that commit's parents (all of them) to the queue, unless they have already been shown.

The default for git log is to show you the HEAD commit, by putting that (single) commit into the queue. There's now just one commit in the queue, so Git removes it and shows it. If that's an ordinary, single-parent commit, you will see it, and then git log will put its (single) parent into the now-empty queue. Git will then show you the parent: you will see commits in their graph order, regardless of any commit time-stamps.

Since merge commits have, by definition, at least two parents, the act of showing a merge commit normally drops two as-yet-unseen commits into the queue of commits to show. It's at this point that time stamps may (and by default, do) enter the picture. If you don't specify any particular sorting criterion, the one Git uses by default is: show the commit with the higher commit time stamp first. So now chronology, in the sense of the time stamps stored in commits (which may have nothing to do with when the commits were actually made), has some effect.

(Note that you can adjust one or both time stamps on any new commit at the time you make it by changing your computer's clock, or, more easily and reproducibly, by setting the GIT_AUTHOR_DATE and/or GIT_COMMITTER_DATE environment variables when running git commit. Some commands also take options for this; for instance, git commit accepts --date to override the author date.)
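As a small illustration of graph order winning over time stamps, the sketch below (throwaway repository, dummy identity, a hypothetical helper commit_at, invented dates) gives the newest commit, C, a 1975 time stamp by setting GIT_AUTHOR_DATE and GIT_COMMITTER_DATE. Because this chain has no merges, the queue never holds more than one commit, so git log still lists C first.

```shell
#!/bin/sh
# Sketch: a backdated child commit still appears before its parents in git log.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# commit_at MESSAGE DATE: hypothetical helper that commits with both dates forced
commit_at() {
  echo "$1" > file.txt
  git add file.txt
  GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" git commit -q -m "$1"
}

commit_at A "2018-03-01 12:00:00 +0000"
commit_at B "2018-03-02 12:00:00 +0000"
commit_at C "1975-01-01 12:00:00 +0000"   # newest commit, oldest time stamp

# Single-parent chain: log follows parent links, so C, B, A despite the dates.
git log --format='%s %cd' --date=short
```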

On to the rest of the issue

As several people have told you, you can't get what you want, in Git.

Moreover, this drawing is nonsensical:

      A---B---C develop
     /         \
D---E---F---G---C master

as it contains two different commits that are both labeled C. You cannot change the existing commit, and any new commit you make will, by definition, have a new and different hash ID.

So is this drawing, for the same reason:

      A---B---C   develop
     /         \ 
D---E---F---G===C--- master

If, however, we replace the extra C with a completely different commit—one with a different hash ID—then we can get the former graph, which now looks like:

      A---B---C   <-- develop
     /         \
D---E---F---G---H    <-- master

That is, the name master selects commit H, whose history includes all the commits, while the name develop selects existing commit C, whose history goes back through B to A and then on back to E and so on. Or, if you prefer, we can get the graph:

      A---B---C   <-- develop
     /
D---E---F---G---H    <-- master

The snapshot associated with commit H can be whatever you want it to be. (To make such a snapshot entirely manually, you can work within your work-tree, run git add to copy updated work-tree files over their copies currently in the index, and then run git write-tree to write out the index as the tree for H's snapshot, but you don't seem to need this.)

If you want commit H to have the same snapshot as commit C, so that git diff develop master produces no output at all, that's particularly easy: we can use Git's so-called plumbing commands to create commit H from the tree for commit C. There are three steps, in the end. First, write the log message into a file:

vim /tmp/msg
# write appropriate log message to file /tmp/msg

Next, create new commit H and save its hash ID somewhere. (I use a variable here. You can then inspect it with git log --graph $h for instance, if you want, before you actually commit to using it anywhere.)

h=$(git commit-tree -p master -p develop -F /tmp/msg develop^{tree})

This creates the commit with two parents, master and develop (i.e., G and C), in that order. If you want just one parent, leave out one -p and its argument; to get the second graph above, in which G is H's only parent, leave out -p develop.

Finally, if all looks good and you're on master and wish to make H become the tip commit, reset or fast-forward-merge to H:

git checkout master      # if necessary
git merge --ff-only $h

This updates the current branch name—i.e., master—to point to commit $h (i.e., new commit H), and updates the index and work-tree correspondingly, so that the contents of commit H now occupy the usual staging and working spaces.

(Incidentally, if you do make final commit H with two parents, and make the first parent be commit G and the second commit C but use commit C's content, that would be the equivalent of git merge -s theirs, if it existed. It doesn't, but you can synthesize it any number of ways, including the one above. See also VonC's answer here and some of the related links.)
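Putting the three steps together, a non-interactive sketch might look like this. It is only an illustration, assuming a POSIX shell with git on the PATH: it rebuilds the question's graph in a throwaway repository with a hypothetical helper mk, replaces the vim step with printf (writing the message file outside the work tree), and quotes develop^{tree} defensively. The identity and messages are invented.

```shell
#!/bin/sh
# Sketch: build commit H with git commit-tree and fast-forward master to it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# mk NAME: hypothetical helper that commits a file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
mk D; mk E; git branch develop; mk F; mk G   # master: D-E-F-G
git checkout -q develop; mk A; mk B; mk C    # develop: D-E-A-B-C

# Step 1: the log message (printf instead of an editor).
msg=$(mktemp)
printf 'Take develop as of C\n' > "$msg"

# Step 2: new commit H with parents G (master) and C (develop), and C's tree.
h=$(git commit-tree -p master -p develop -F "$msg" 'develop^{tree}')
git log --graph --oneline "$h"               # inspect H before using it

# Step 3: make H the tip of master.
git checkout -q master
git merge -q --ff-only "$h"
```

If you would rather have G as H's only parent, dropping -p develop from the commit-tree line still allows the fast-forward, since G remains H's parent.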

torek

I don't want a merge: I want a "paste". However, I also need to be able to look at the git log on master and see commit C, then commit G, then commit F, etc., even though there are no paternal ties between C and G.

This is self-contradictory. You've severed the ancestry links. You've entirely abandoned the history.

git checkout -B master develop

is how you abandon an existing master history, making master now refer to the current develop history.

If you want to see the old abandoned history in master's new history you can then

git merge -s ours master@{1}

to record the ancestry without taking any of the content, but you've either recorded the ancestry or you haven't. You can't sever your ancestry links and have them too.
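A throwaway-repository sketch of those two commands (dummy identity, invented messages, a hypothetical helper mk that rebuilds the question's graph; assumes a POSIX shell with git on the PATH) shows the outcome: master's first parent is C, its second parent is the old tip G, and its content is exactly develop's. The --no-edit flag is added only so the merge never waits for an editor.

```shell
#!/bin/sh
# Sketch: abandon master's tip, then record it again as a second parent.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

# mk NAME: hypothetical helper that commits a file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
mk D; mk E; git branch develop; mk F; mk G   # master: D-E-F-G
git checkout -q develop; mk A; mk B; mk C    # develop: D-E-A-B-C

git checkout -q -B master develop            # master now names C; old tip G is master@{1}
git merge -q -s ours --no-edit master@{1}    # merge commit: parents C and G, content of C
git log --graph --oneline master
```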

jthill
  • If the idea that "ancestry and history are separate entities" is a delusion in my mind and they are actually the same thing, then it would help me to clear that up. Anyway, since I currently believe that ancestry and history are two separate things, I don't think my question is contradictory. I realize that this is essentially against the ideology of git, but I am working in a large corporate environment where I am compelled to use exotic solutions so that the system doesn't break. – JCollier Mar 02 '18 at 22:50
  • Why are you using words like "delusion" and "ideology" when using a tool? It works how it works. Ancestry links are how history is structured. – jthill Mar 02 '18 at 23:41
  • @JCollier It's not "ideology". The diagrams you drew are literally how Git works under the hood. Those nodes and connections are real. The technical term is a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph). And [the commit IDs are checksums](https://stackoverflow.com/questions/49075782/checking-integrity-of-a-git-repo/49075898#49075898) preventing exactly what you want to do. You can write new history, but you can't fake it. – Schwern Mar 03 '18 at 03:39