Git: How to maintain the same history of commits in two different branches

Question

I'm using Git Bash to merge and commit.

Let me explain the basic structure first. We have origin/dev, which we pull and start working on. After the changes are done, we push them to origin/dev.

Then, to merge dev into qa using Git Bash, I do the following:

git checkout qa

# get all recent changes from origin/qa (there are parallel origin/dev and origin/uat branches as well)
git pull

# check out dev's version of this directory into my local qa work tree;
# the commands below then commit it and push it to origin/qa
git checkout dev -- directorynameToCheckoutCodeFrom

git commit
git push

This is the process we normally follow whenever we merge between any two environments.

My issue is this: I make 5 commits for 5 issues in DEV, and each has a different commit ID. When I merge from DEV to QA, I commit all five changes as one, so I get a single commit ID with all the changes merged into it. The same happens when merging into UAT.

Is there any way to maintain the same history across the different environments? The real issue comes with QA: we might merge into it 4-5 times in 10 days, while UAT we would like to keep intact and merge into only once a month. In that case, if we commit all the changes from QA to UAT as one commit, the separate history that exists in QA will be lost. Is there any way to tackle this?

I've gone through some posts online but was unable to understand them. What I understood is that the only way is to make frequent commits, as we do in the DEV environment: for each issue, merge in dev, then qa, then uat. Is my understanding correct that this is the only way to preserve the same history?

What do you want to represent with `git checkout --dev`? Is it just a whitespace error and meant to be grouped with the dir you check out from? — Romain Valeri, Mar 27 '19 at 10:25
That is, the code that's in dev will be checked out, and then when I do git status in my local merging space I will see all the files that I want to merge from dev to qa. After that I commit and push. — NeverGiveUp161, Mar 27 '19 at 10:26
Yeah, I get the principle of checking out a directory. But the `--dev` option does not exist, and it *looks* like you went for `git checkout dev -- DirectorynameToCheckoutCodeFrom` and botched the whitespace. Is it the case? — Romain Valeri, Mar 27 '19 at 10:28
I understand that you'd like to know, but there are always going to be people downvoting without a comment in some cases. I'm unsure why they did; the question seems legit enough. In any case, you'll have to get over it, I guess. — Romain Valeri, Apr 09 '19 at 10:18

score 7 · Accepted Answer · answered Mar 27 '19 at 16:42

There is not a history of commits. There are only commits; the commits are the history.

Each commit is uniquely identified by a hash ID. That hash ID is the true name of the commit, as it were. If you have that commit, you have that hash ID. If you have that hash ID, you have that commit. Read out the big ugly hash ID and see if it's in your database of "all the commits that I have in this repository": i.e., see if Git knows it. If so, you have that commit. For instance, b5101f929789889c2e536d915698f58d5c5c6b7a is a valid hash ID: it's a commit in the Git repository for Git. If you have that hash ID in your Git repository, you have that commit.

People don't normally type in, or use, these hash IDs at all. Git uses them, but Git is a computer program, not a human. Humans don't do well with these things—I have to cut and paste the above hash ID or I'll get it wrong—so humans use a different way to get started. Humans use branch names. But many different Git repositories all have master and this master doesn't always (or ever!) mean that big ugly hash ID I typed in above. So a name like master is specific to one particular Git repository, while hash IDs are not.

Now, every commit stores some stuff. What a commit stores includes a snapshot of all the files that go with that commit, so that you can get it back out later. It also includes the name and email address of the person who made that commit, so that you can tell who to praise or blame. It includes a log message: why the person who made the commit says they made that commit. But—here's the first tricky part—almost every commit also includes at least one hash ID, which is the commit that comes before this particular commit.

So, if you have b5101f929789889c2e536d915698f58d5c5c6b7a, then what you have is this:

$ git cat-file -p b5101f929789889c2e536d915698f58d5c5c6b7a | sed 's/@/ /'
tree 3f109f9d1abd310a06dc7409176a4380f16aa5f2
parent a562a119833b7202d5c9b9069d1abb40c1f9b59a
author Junio C Hamano <gitster pobox.com> 1548795295 -0800
committer Junio C Hamano <gitster pobox.com> 1548795295 -0800

Fourth batch after 2.20

Signed-off-by: Junio C Hamano <gitster pobox.com>

(The tree line represents the saved snapshot that goes with this commit. You can ignore this here.) The parent line gives the hash ID of the commit that comes before b5101f929789889c2e536d915698f58d5c5c6b7a.

If you have b5101f929789889c2e536d915698f58d5c5c6b7a you almost certainly also have a562a119833b7202d5c9b9069d1abb40c1f9b59a. The history for the later commit is the earlier commit.
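The raw commit text shown above is exactly what Git hashes, so changing any field, including the parent ID or the time stamp, produces a different hash ID. The Python sketch below is only an illustration under stated assumptions: it follows Git's usual object format (SHA-1 over a "commit <size>" header, a NUL byte, then the raw text), always writes exactly one parent line (so it ignores root commits, merges and signed commits), reuses the tree and parent IDs from the output above, and uses a made-up author and message.

```python
import hashlib


def commit_hash(tree: str, parent: str, ident: str, when: str, message: str) -> str:
    """Compute a Git-style SHA-1 ID for a simplified single-parent commit.

    Git hashes a "commit <size>" header plus a NUL byte, followed by the raw text.
    """
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {ident} {when}\n"
        f"committer {ident} {when}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Tree/parent IDs copied from the example output; the identity and message are hypothetical.
TREE = "3f109f9d1abd310a06dc7409176a4380f16aa5f2"
PARENT = "a562a119833b7202d5c9b9069d1abb40c1f9b59a"
WHO = "A U Thor <author@example.com>"

first = commit_hash(TREE, PARENT, WHO, "1548795295 -0800", "demo commit")
later = commit_hash(TREE, PARENT, WHO, "1548795296 -0800", "demo commit")
print(first != later)  # True: one second later means a different commit
```

Because the time stamp is part of the hashed text, two otherwise identical commits made one second apart get unrelated hash IDs.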

If we replace each of these big ugly hash IDs with a single uppercase letter,¹ we can draw this sort of history a lot more easily:

... <-F <-G <-H

where H is the last commit in a long chain of commits. Since H holds G's hash ID, we don't need to write down G's big ugly hash ID; we can just write down H's hash. We use that to have Git find G's ID, inside H itself. If we want F, we use H to find G to find F's ID, which lets Git retrieve F.
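A minimal Python sketch of that backwards walk, assuming single letters stand in for the big hash IDs and a plain dictionary stands in for Git's object database (F's own parent is not in the drawing, so it is left out):

```python
from typing import Dict, List, Optional

# Letters stand in for hash IDs. Each commit stores its parent's ID;
# F's parent is not shown in the drawing, so it is None here.
PARENT: Dict[str, Optional[str]] = {"F": None, "G": "F", "H": "G"}


def history(tip: str) -> List[str]:
    """Start from the last commit and follow the stored parent IDs backwards."""
    shown: List[str] = []
    current: Optional[str] = tip
    while current is not None:
        shown.append(current)
        current = PARENT[current]
    return shown


print(history("H"))  # ['H', 'G', 'F']
```

Only the starting ID ("H") has to be supplied from outside; everything else is found inside the commits themselves.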

But we still have to write down that last hash ID. This is where branch names come in. Branch names like master act as our way of saving the hash ID of the last commit.

To make a new commit, we have Git save the hash ID of H in our new commit. We have Git save a snapshot and our name and email address and all the rest of that as well—"the rest" includes a time stamp, the precise second when we had Git do all this. Now Git computes the actual hash ID of all of this data, including the time stamp. The commit is now saved in our database of all commits, and Git has given us a new hash ID I:

...--F--G--H   <-- master
            \
             I

We have Git automatically write I's hash ID into our name master:

...--F--G--H--I   <-- master

and we've added new history, which retains all the existing history.
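The same idea as a small Python sketch: a new commit only records the current tip as its parent, and then the branch name is moved to the new commit. The dictionaries and the commit_on helper are hypothetical stand-ins for Git's object database and branch names, not Git's API.

```python
from typing import Dict, List

# Hypothetical repository state: commit ID -> list of parent IDs, plus branch names.
parents: Dict[str, List[str]] = {"F": [], "G": ["F"], "H": ["G"]}
branches: Dict[str, str] = {"master": "H"}


def commit_on(branch: str, new_id: str) -> None:
    """Add a commit whose parent is the branch tip, then move the name to it."""
    parents[new_id] = [branches[branch]]  # I saves H's ID as its parent
    branches[branch] = new_id             # master now remembers I


commit_on("master", "I")
print(branches["master"], parents["I"])  # I ['H']
```

Nothing about the older commits changes; the new history is just one more commit pointing back at the old tip.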

¹Of course, if we only used one uppercase letter like this, we'd run out of the ability to create commits, anywhere in the world, after creating just 26 commits. That's why Git's hash IDs are so big. They hold 160 bits, so the number of possible commits or other objects is 2¹⁶⁰, or 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976. As it turns out, this isn't really enough, and Git will probably move to a larger, 256-bit hash that can hold 79,228,162,514,264,337,593,543,950,336 (2⁹⁶) times as many objects. While the first number is far larger than the number of objects any repository will ever need, there are specific attacks that are troublesome, so a 256-bit hash is a good idea. See How does the newly found SHA-1 collision affect Git?

This tells you how to have the same history

History is the commits. To have the same history in two branches, you need both branch names to point to the same commit:

...--F--G--H--I   <-- master, dev

Now the history in master is: Starting at I, show I, then move back to H and show H, then move back to G... Likewise, the history in dev is: Starting at I, show I, then move back to H...

Of course, that's not quite what you want. What you want is to have history that diverges, then converges again. That's what branches are really about:

...--F--G--H   <-- master
            \
             I   <-- dev

Here the history in dev starts (ends?) at I, then goes back to H, and then G, and so on. The history in master starts (ends?) at H, goes back to G, and so on. As we add more commits, we add more history, and if we do it like this:

             K--L   <-- master
            /
...--F--G--H
            \
             I--J   <-- dev

then the history of the two branches diverges. Now master starts at L and works backwards, while dev starts at J and works backwards. There are two commits on dev that are not on master, and two commits that are on master that are not on dev, and then everything from H on back is on both branches.
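Which commits are "on" a branch can be computed by walking parents from each tip and comparing the resulting sets. This Python sketch assumes the lettered graph above; the only_on helper is hypothetical, but it is meant to give the same set of commits that git log master..dev lists (commits reachable from dev but not from master).

```python
from typing import Dict, List, Set

parents: Dict[str, List[str]] = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],  # dev's line of work
    "K": ["H"], "L": ["K"],  # master's line of work
}
branches = {"master": "L", "dev": "J"}


def reachable(tip: str) -> Set[str]:
    """All commits reachable by walking parent links back from tip."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def only_on(a: str, b: str) -> Set[str]:
    """Commits on branch a that are not on branch b (roughly what git log b..a lists)."""
    return reachable(branches[a]) - reachable(branches[b])


print(sorted(only_on("dev", "master")))  # ['I', 'J']
print(sorted(only_on("master", "dev")))  # ['K', 'L']
```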

This divergence—the commits that are not on some branch—is where the lines of work diverge. The branch names still only remember one commit each, specifically the tip or last commit of each line of development. Git will start at this commit, by the saved hash ID, and use that commit's saved parent hash ID to walk backwards, one commit at a time. Where the lines rejoin, the history rejoins. That's all there is in a repository, except for the next section.

Merges combine history (and snapshots)

What you can do now is make a merge commit. The main way to make a merge commit is using the git merge command. This has two parts:

combining work, where Git figures out what has changed in each line of development; and
making a merge commit, which is a commit with exactly one special feature.

To make a merge, you start by picking one branch tip. You run git checkout master or git checkout dev here. Whichever one you pick, that's the commit you have out now, and Git attaches the special name HEAD to that branch name to remember which one you picked:

             K--L   <-- master (HEAD)
            /
...--F--G--H
            \
             I--J   <-- dev

Now you run git merge and give it an identifier to choose the commit to merge. If you're on master = L, you'll want to use dev = J as the commit to merge:

git merge dev         # or git merge --no-ff dev

Git will now walk the graph as usual to find the best shared commit—the best commit that's on both branches, to use as a starting point for this merge. Here, that's commit H, where the two branches first diverge.
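One way to sketch "best shared commit" in Python, assuming the lettered graph above: collect the commits reachable from both tips, then keep only those that are not an ancestor of another shared commit. This is a simplified illustration, not Git's actual implementation (it returns a set because some graphs have more than one best shared commit); in a real repository, git merge-base master dev prints the merge base's hash ID.

```python
from typing import Dict, List, Set

parents: Dict[str, List[str]] = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],  # dev
    "K": ["H"], "L": ["K"],  # master
}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(a: str, b: str) -> Set[str]:
    """Shared commits that are not an ancestor of some other shared commit."""
    shared = reachable(a) & reachable(b)
    return {
        c for c in shared
        if not any(c != d and c in reachable(d) for d in shared)
    }


print(merge_bases("L", "J"))  # {'H'}
```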

Now Git will compare the snapshot saved with commit H—the merge base—to the one in your current commit L. Whatever is different, you must have changed on master. Git puts those changes into one list:

git diff --find-renames <hash-of-H> <hash-of-L>   # what we changed

Git repeats this but with their commit J:

git diff --find-renames <hash-of-H> <hash-of-J>   # what they changed

Now Git combines the two sets of changes. Whatever we changed, we want to keep changed. Whatever they changed, we want to use those changes too. If they changed README.md and we did not, we'll take their change. If we changed a file and they didn't, we'll take our change. If we both changed the same file, Git will try to combine those changes. If Git succeeds, we have a combined change for that file.

In any case, Git now takes all of the combined changes and applies them to the snapshot in H. If there were no conflicts, Git automatically makes a new commit from the result. If there were conflicts, Git still applies the combined changes to H, but leaves us with the messy result, and we have to fix it up and do the final commit ourselves; but let's assume there were no conflicts.
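A simplified Python sketch of the combining step. It works on whole files only (Git combines changes line by line inside a file and also detects renames), represents each snapshot as a hypothetical dictionary from file name to contents, and simply reports a conflict whenever both sides changed the same file differently.

```python
from typing import Dict, List, Optional, Tuple

# file name -> file contents; a missing key means the file does not exist
Snapshot = Dict[str, str]


def combine(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Whole-file three-way combine of the base-vs-ours and base-vs-theirs changes."""
    result: Snapshot = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(name)
        o: Optional[str] = ours.get(name)
        t: Optional[str] = theirs.get(name)
        if o == t:        # same result on both sides (possibly unchanged)
            merged = o
        elif o == b:      # only they changed it: take their change
            merged = t
        elif t == b:      # only we changed it: keep our change
            merged = o
        else:             # both changed it differently: Git would now try line by line
            conflicts.append(name)
            merged = o    # this sketch just keeps ours and reports the conflict
        if merged is not None:  # None means the file ends up deleted
            result[name] = merged
    return result, conflicts


base = {"README.md": "v1", "app.py": "a"}
ours = {"README.md": "v1", "app.py": "b"}    # we changed app.py
theirs = {"README.md": "v2", "app.py": "a"}  # they changed README.md
print(combine(base, ours, theirs))  # ({'README.md': 'v2', 'app.py': 'b'}, [])
```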

Git now makes a new commit with one special feature. Instead of just remembering our previous commit L, Git has this merge commit remember two previous commits, L and J:

             K--L   <-- master (HEAD)
            /    \
...--F--G--H      M
            \    /
             I--J   <-- dev

Then, as always, Git updates our current branch to remember the new commit's hash ID:

             K--L
            /    \
...--F--G--H      M   <-- master (HEAD)
            \    /
             I--J   <-- dev

Note that if we do the merge by running git checkout dev; git merge master, Git would do the same two diffs and get the same merge commit M (well, as long as we did it at the exact same second so that the time stamps match up). But then Git would write the hash ID of M into dev rather than into master.

In any case, if we now ask about the history of master, Git will start at M. Then it will walk back to both L and J and show both of them. (It has to pick one to show first, and git log has a lot of flags to help you choose which one to show first.) Then it will walk back from whichever one it picked first, so that it now has to show both K and J, or both L and I. Then it will walk back from whichever one it picked to show.

In most cases Git shows all the children before any of the parents, i.e., eventually, it will have shown all four of I, J, K, and L and have only H to show. So from here, Git will show H, then G, and so on—there's now just one chain to walk back, one commit at a time. But be aware that when you traverse back from a merge, you run into the which commit to show next problem.
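A rough Python sketch of such a walk from merge commit M: it shows each commit once and always picks the newest pending commit next. The time stamps are invented, and real git log ordering has more rules (see options such as --date-order and --topo-order), so treat this only as an illustration of why I, J, K and L can all appear before H.

```python
import heapq
from typing import Dict, List, Set, Tuple

parents: Dict[str, List[str]] = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"],
    "M": ["L", "J"],  # the merge commit has two parents
}
# Hypothetical commit times: a larger number means a newer commit.
when = {"F": 1, "G": 2, "H": 3, "I": 4, "K": 5, "J": 6, "L": 7, "M": 8}


def log(tip: str) -> List[str]:
    """Show each reachable commit once, newest pending commit first."""
    shown: List[str] = []
    seen: Set[str] = {tip}
    queue: List[Tuple[int, str]] = [(-when[tip], tip)]
    while queue:
        _, commit = heapq.heappop(queue)
        shown.append(commit)
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-when[parent], parent))
    return shown


print(log("M"))  # ['M', 'L', 'J', 'K', 'I', 'H', 'G', 'F']
```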

`git merge` does not always make a merge commit

Suppose you have this history:

...--F--G--H   <-- master
            \
             I--J   <-- dev

That is, there's no divergence; dev is merely strictly ahead of master. You do git checkout master to select commit H:

...--F--G--H   <-- master (HEAD)
            \
             I--J   <-- dev

and then git merge dev to combine the work you've done since the merge base with the work they did since the merge base.

The merge base is the best shared commit. That is, we start at H and keep going back as needed, and also start at dev and keep going back as needed, until we reach a common starting point. So from J we go back to I and to H, and from H we just sit quietly at H until J goes back here.

The merge base, in other words, is the current commit. If Git ran:

git diff --find-renames <hash-of-H> <hash-of-H>

there would be no changes. The act of combining no changes (from H to H via master) with some changes (from H to J via dev), then applying those changes to H, is just going to be whatever is in J. Git says: well, that was too easy and instead of making a new commit, it just moves the name master forwards, in the opposite of the usual backwards direction. (In fact, Git really did work backwards—from J to I to H—in order to figure this out. It just remembers that it started from J.) So what you get here, by default, is this:

...--F--G--H
            \
             I--J   <-- dev, master (HEAD)

When Git is able to slide a label like master forward like this, it calls that operation a fast-forward. When you do this with git merge itself, Git calls it a fast-forward merge, but it's not really a merge at all. What Git really did was to check out commit J, and make master point to J.

In many cases, this is OK! The history is now: For master, start at J and walk back. For dev, start at J and walk back. If that's all you need and care about, that's fine. But if you want a real merge commit (so that you can tell master and dev apart later, for instance), you can tell Git, by running git merge --no-ff: Even if you can do a fast-forward instead of a merge, do a real merge anyway. Git will go ahead and compare H to H, and then compare H to J, and combine the changes and make a new commit:

...--F--G--H------K   <-- master (HEAD)
            \    /
             I--J   <-- dev

Now you get a real merge commit K, with two parents as required to be a merge commit. The first parent is H as usual, and the second is J, as is usual for a merge commit. The history of master now includes the history of dev, but remains different from the history of dev, because the history of dev doesn't include commit K.
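The fast-forward versus --no-ff decision can be sketched in Python under simplifying assumptions: no file contents or conflicts are modeled, and the merge helper and the name K for the new merge commit are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def reachable(parents: Graph, tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge(parents: Graph, branches: Dict[str, str], current: str, other: str,
          no_ff: bool = False, new_id: str = "K") -> str:
    """Simplified 'git merge other' while on branch current."""
    ours, theirs = branches[current], branches[other]
    if theirs in reachable(parents, ours):
        return "already up to date"      # nothing new to bring in
    if ours in reachable(parents, theirs) and not no_ff:
        branches[current] = theirs       # fast-forward: just slide the name
        return "fast-forward"
    parents[new_id] = [ours, theirs]     # real merge: first parent ours, second theirs
    branches[current] = new_id
    return "merge"


def start() -> Tuple[Graph, Dict[str, str]]:
    # dev is strictly ahead of master, as in the drawing
    return ({"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"]},
            {"master": "H", "dev": "J"})


g, b = start()
print(merge(g, b, "master", "dev"), b["master"])         # fast-forward J
g, b = start()
print(merge(g, b, "master", "dev", no_ff=True), g["K"])  # merge ['H', 'J']
```

With --no-ff, dev's history (starting at J) still does not contain K, which is what keeps the two branches distinguishable.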

Note that if you now switch back to dev and make more commits, the result looks like this:

...--F--G--H------K   <-- master
            \    /
             I--J--L--M--N   <-- dev (HEAD)

You can now git checkout master and git merge dev again. This time you won't need --no-ff because there is a commit that's on master that's not on dev, namely K, and of course there are commits on dev that are not on master, namely L-M-N. The merge base this time is shared commit J (not H: H is also shared, but J is better). So Git will combine changes by doing:

git diff --find-renames <hash-of-J> <hash-of-K>   # what did we change?
git diff --find-renames <hash-of-J> <hash-of-N>   # what did they change?

What did we change from J to K? (That's an exercise for you, the reader.)

Assuming Git is able to combine the changes on its own, this merge operation will succeed, producing:

...--F--G--H------K--------O   <-- master (HEAD)
            \    /        /
             I--J--L--M--N   <-- dev

where new merge commit O combines the J-vs-K changes with the J-vs-N changes. The history of master will start at O and will include N and M and L and K and J and I and H and so on. The history of dev will start at N and include M and L and J (not K!) and I and H and so on. Git always works backwards, from child to parent. Merges let / make Git work backwards along both lines at the same time (but shown to you one at a time, in some order depending on arguments you supply to git log).
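Continuing with the same kind of hypothetical Python model (letters for hash IDs, a dictionary for the commit graph), this sketch checks the claims above: the merge base of K and N is J, and after merge O the history of master contains N while the history of dev does not contain K.

```python
from typing import Dict, List, Set

# The graph after the --no-ff merge K and three more dev commits L, M, N.
parents: Dict[str, List[str]] = {
    "F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H", "J"],                     # merge commit on master
    "L": ["J"], "M": ["L"], "N": ["M"],  # new work on dev
}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(a: str, b: str) -> Set[str]:
    """Shared commits that are not an ancestor of some other shared commit."""
    shared = reachable(a) & reachable(b)
    return {c for c in shared if not any(c != d and c in reachable(d) for d in shared)}


print(merge_bases("K", "N"))  # {'J'}
parents["O"] = ["K", "N"]     # the new merge commit on master
print("K" in reachable("N"), "N" in reachable("O"))  # False True
```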

Guys! This is long and takes time to read, but it's worth spending every second on it. — NeverGiveUp161, Mar 28 '19 at 10:33
So I understand now that we are not doing any merge at all. We are just checking out files from the branch we want to merge, e.g. in qa we check out dev's files, then commit and push. That gets pushed to origin/qa and origin/uat. So all of these are separate branches with separate commits, and what I wanted was to keep the same commit history in origin/qa, origin/dev and origin/uat, which I guess I can do with git merge dev --no-ff — NeverGiveUp161, Mar 28 '19 at 10:38
I want to merge per module: inside a branch we have several folders with different modules, so I want to name the module to merge, but Git gives me an error like `not something we can merge` when I use a command like `git merge dev --no-ff qa/folder/onemorefolder` — NeverGiveUp161, Apr 09 '19 at 14:59
@NeverGiveUp161: Git doesn't store *files*, Git stores *commits*. Commits then store files, to be sure; but the point here is, you don't *have* the option to divide these up by folder. You either use the *commit*, meaning the *whole* commit, or you don't. You merge by *commit*. If you want a partial merge, do a full merge, which gets you the part you want along with parts you don't want. Commit the result, then take the new commit (stuff you want plus stuff you don't) and modify it by replacing the stuff you didn't want. — torek, Apr 09 '19 at 16:43
@NeverGiveUp161 There are version control systems that work file-by-file rather than commit-by-commit. When using such VCSes, you can get what you want here. The problem with these VCSes is that you *have to* work file by file. That's their advantage and their disadvantage. Those VCSes are mostly aging and being moved-away-from: the model just doesn't do what most people want these days. The commit-by-commit model has proven to be superior. — torek, Apr 09 '19 at 16:47
Thanks. The reason for this design with different folders is that different people work on different modules, and different things go to qa/uat on different timelines. So when we do 'git checkout dev -- directorynameToCheckoutCodeFrom' in QA, we get only the files changed in those folders. That makes it easy to manage, and when we commit that in QA it's just a fresh commit with all these files under one commit ID, whereas the same changes might have been checked into dev as 3 commits with different commit IDs. So we are manually merging dev's file changes into QA, not merging commits... a new learning for me. — NeverGiveUp161, Apr 10 '19 at 10:02

score 1 · Answer 2 · answered Mar 27 '19 at 10:36


You can try:

git checkout qa

git merge dev --no-ff

git push

git merge dev --no-ff is mostly used to bring all of the dev branch's commits into qa along with their history.
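To see why the process in the question loses dev's history while a merge keeps it, here is a hypothetical Python model (the commit names B, Q0, D1 to D5, C and M are made up). Copying files with git checkout dev -- directory and then committing creates a commit whose only parent is the qa tip; a merge commit also records the dev tip as a parent.

```python
from typing import Dict, List, Set

# Hypothetical graph: B is the last shared commit, Q0 is the current qa tip,
# D1..D5 are five dev commits for five issues.
parents: Dict[str, List[str]] = {
    "B": [], "Q0": ["B"],
    "D1": ["B"], "D2": ["D1"], "D3": ["D2"], "D4": ["D3"], "D5": ["D4"],
}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Copy dev's files into qa and commit: only the qa tip is recorded as a parent.
parents["C"] = ["Q0"]
# git merge dev --no-ff while on qa: both tips are recorded as parents.
parents["M"] = ["Q0", "D5"]

dev_commits = {"D1", "D2", "D3", "D4", "D5"}
print(dev_commits <= reachable("C"))  # False: dev's five commits are not in this history
print(dev_commits <= reachable("M"))  # True
```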


Jay Bhalodi


Won't that be a problem? In the 3rd command of my question, when I check out I can specify the directory/folder to check out. So there can be different modules, right? I can ignore some modules and merge others using that command. – NeverGiveUp161 Mar 27 '19 at 13:10
With the merge command in your answer, can we select which directory to merge, or use a *nameStartsWith kind of pattern while merging? – NeverGiveUp161 Mar 27 '19 at 13:11
I will try this and update, but the answers above are more elaborate, so I'll decide which should be the accepted answer after going through them. Thanks – NeverGiveUp161 Mar 28 '19 at 09:11
This would work, but for completeness I'm accepting the answer above. Thanks Jay – NeverGiveUp161 Mar 28 '19 at 10:38

score 1 · Answer 3 · answered Mar 27 '19 at 17:06

In the process you describe, you want to 'merge' changes from an individual directory within a repo. This is contrary to how git works, and that is why you're having trouble keeping a good history.

It's important to understand that what you're doing is not really a merge[1]. A merge commit has two (or more) parent commits, and in that way the full history is preserved. To be fair, git has a tendency to be "flexible" to the point of inconsistency in how it uses certain terms; there are operations it calls "merging" that don't result in merge commits. But even with those operations, you merge the entire content, not an individual directory.

If you have distinct modules - or, however you might describe them, different content in different directories - that change independently (which certainly applies if you promote them between branches/environments separately), they should be in separate repos. I suppose if it helps you could gather them up as submodules of a 'parent' repo, to have the ability to clone from a single url or whatever. But beyond that, if this type of separation isn't acceptable for some reason, you may need to consider whether git is the best tool to meet your particular source control requirements.

[1] I could also argue semantics about merging due to the fact that if both dev and qa had changes, the changes from qa would be overwritten and lost, which is not typically what is desired in a merge. But you would then probably argue that changes always flow from dev to qa, so it's not applicable; and anyway, git does sometimes describe the clobbering of one branch's changes by another's as a merge (i.e. the "ours" merge strategy).
