
I am new to git and would like to understand what would happen in the following scenario. I have the following branches:

  1. Master Branch (currently deployed to Prod)
  2. Enhancement #1 - CleanupUntrackedFiles Branch: This was spawned off of Master. I made updates to the gitignore file to not include particular extensions in my repo and I removed those unnecessary files from the repo.
  3. Enhancement #2: This was ALSO created off of Master. Code updates are in support of a customer request.

At this point I would like to deliver BOTH Enhancement #1 and #2 into production. Is the best approach to merge these 2 branches together before merging with Master? If that is the case, how would Git know that I would like the files that I removed from Enhancement #1 to stay removed? Would the merge request of Enhancement 2 into Enhancement 1 bring in those old files again since Enhancement #2 was created off of the Master branch?

Thanks in advance for your help.

2 Answers


I suspect that there are multiple solutions to your issue. Here's my recommendation:

  1. Create a copy of Master

git checkout master && git checkout -b copyOfMaster

  2. Merge your branches into the copyOfMaster (I'd do E#1 followed by E#2 based on what you're saying). Watch the output of the merge to know which files are kept.

Git merge strategies (you may need a recursive + ours combo): https://git-scm.com/docs/git-merge#_merge_strategies

The beauty of this step is that if something goes wrong, since you're working on a copy of Master you can just delete that branch and try something else.

git merge e1 -s recursive -X theirs && git status

git merge e2 && git status

  3. Test the copyOfMaster branch. If everything looks good, and the files that should be present or deleted are, then merge copyOfMaster into master.

git checkout master && git merge copyOfMaster -s recursive -X theirs

sorenKaram

There are two reasons that thinking about what Git does is difficult:

  1. Some of it just inherently is difficult, such as the whole concept of a distributed version control system (multiple separate copies of some repository). But this isn't what you're running into here; for that, we go to:

  2. When people start using Git, they think it's about branches, or files, or something like this. It isn't. Git is all about commits. That includes git merge: it doesn't really merge branches—well, sometimes it does, depending on what you personally mean when you say the word "branch"—but it always works with commits.

(An adjunct of the second point is that the word branch in Git is ambiguous. See What exactly do we mean by "branch"?)

To think properly about what Git will do here, we need to start with a proper definition of commits. Since you have been using Git for just a little bit, you're probably familiar by now with the sequence:

git checkout <something>
<edit some files>
git add <files>
git commit

and:

git log

But what, precisely, does all of this do? The git log command shows commits (with big ugly hash IDs) and the git commit command makes a new commit, but what, precisely, is a commit? What are the parts of a commit? How do we name a specific commit? What does one do for us?

Commits

There is this lump of facts we just need to memorize:

  • Each commit is numbered. The numbers aren't simple counting numbers—they don't go #1, #2, #3, and so on—but they are numbers of a sort, and git log shows them. They're actually cryptographic checksums of the contents of each commit, and since each commit is guaranteed to be unique,1 each checksum is also unique.2

  • This means no part of any commit can ever be changed. If you take some existing commit out and modify it in some way, then stuff the new commit back into Git, you get a new (and unique) number for that new commit. The existing commit is still there. You haven't changed it, you have just added a new commit.

  • Each commit is made up of two parts: a snapshot of all the files that Git knows about, at the time you (or whoever) make the commit, and some metadata such as your name and email address and some date-and-time-stamps.

  • Because each commit is a full snapshot, Git saves space by keeping the files in the commit in a special, read-only, compressed, Git-ified format in which duplicate files are shared. That is, if some new commit you make has a thousand files, but 999 of them are the same as the files in some previous commit(s), that new commit really only has one new file. It re-uses the other 999. This is quite safe since no part of any existing commit can ever be changed.

  • Git includes, in the metadata part, the commit number of the previous commit.
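If you'd like to see these parts for yourself, here is a minimal sketch that builds a throwaway repository and prints a raw commit object. The file name, user name, and email are made up for illustration; `git cat-file` is the plumbing command that dumps an object as stored.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)                # hash ID of the first commit

echo "two" >> file.txt
git add file.txt
git commit -q -m "second"

# Dump the second commit: a tree line (the snapshot), a parent line,
# author/committer lines with time stamps, and the log message.
git cat-file -p HEAD

parent=$(git rev-parse HEAD^)              # the parent recorded in the metadata
obj_type=$(git cat-file -t HEAD)           # object type of HEAD
```

The `parent` line in the output holds the hash ID of the first commit, which is the "previous commit number" from the last bullet.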


1To this end, Git includes the date-and-time-stamps, so that even if you commit the same stuff twice, the two commits are different. Two commits can only be identical if you commit the same stuff at the same time, in which case, did you really commit it twice, or was that just deja vu?

2The pigeonhole principle tells us that this must eventually fail. Git's trick here is to hope that it doesn't fail before the universe ends. SHA-1 doesn't quite cut it any more, and Git is moving to SHA-256 instead.


Commits are chained, and branch names find the ends

Now, whenever we have the commit number (the hash ID) of a commit or other internal Git object,3 Git can look up that object in its big database. So we say this hash ID, or anything holding this hash ID, points to the commit (or other object). Since each commit holds the hash ID of its earlier predecessor commit, that means each commit points back to an earlier commit. Git calls this earlier commit the parent of the later (child) commit. We can draw this:

... <-F <-G <-H

Here H stands in for the hash ID of the last commit. Once we actually have commit H, we can have Git look up the hash ID of earlier commit G, which lets Git find G. This provides the hash ID of earlier commit F, and so on.

What we need, then, is the hash ID of the last commit in the chain. We just need to save that somewhere. We could jot it down on a note, or on the office whiteboard. We could memorize it. Or—hey, we have a computer! Computers are good at remembering stuff. Let's have the computer remember the hash ID of the last commit in the chain.

This is what a branch name does for us: it remembers the hash ID of the last commit. From there, Git works backwards.

Let's draw it like this:

...--F--G--H   <-- master

Because commits literally can't change, we don't really need arrows pointing backwards from H to G, then G to F, and so on. H is always, forever, going to point backwards to G. We just need to remember that it's easy for Git to go from H to G, but not the other way: children point to their parents, but parents don't know their children, because the parents were frozen in time before they had any children. But the arrows coming out of the names, on the other hand—well, let's add another name such as develop or feature or topic:

...--F--G--H   <-- master, topic

Note that both names point to existing commit H. That's normal in Git: all the commits are now on both branches. But we do, now, need a way to tell which name we're using, and to that end, we'll have Git attach the special name HEAD to one of these other names:

...--F--G--H   <-- master (HEAD), topic

This means we're on branch master, as git status would say. If we now use git checkout topic or git switch topic,4 we get:

...--F--G--H   <-- master, topic (HEAD)

Now we're on branch topic. We haven't changed commits—we're still using commit H—but we did change names.
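A small sketch of the same thing on the command line, in a scratch repository (the file name and identity are assumptions for the demo): two names, one commit, and HEAD attached to one name at a time.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "H"

git branch topic                           # a second name for the same commit
master_id=$(git rev-parse master)
topic_id=$(git rev-parse topic)

before=$(git symbolic-ref --short HEAD)    # the name HEAD is attached to now
git checkout -q topic
after=$(git symbolic-ref --short HEAD)     # HEAD moved to the other name
head_id=$(git rev-parse HEAD)              # the commit itself did not change

echo "$before -> $after, still at $head_id"
```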

You can use git checkout or git switch to select a commit by its hash ID. When you do that, Git uses what it calls detached HEAD mode. Here the name HEAD isn't attached to some branch name. We won't go into any detail here, but it's useful to know about later, especially with git rebase, which uses this mode internally.


3There are actually four types of object, which Git calls blobs, commits, trees, and tags. Each gets a hash ID that is unique to that particular object's type-and-content. File content is stored as a blob object, and the fact that a unique blob content gets a unique hash ID is how Git de-duplicates files. It just falls right out of the storage system: if the file content is identical, the blob hash ID is also identical, and therefore the blob is already in the object database. You don't need to know most of this: you just need to know about commit hash IDs.

4The git switch command was new in Git 2.23. It's an attempt to make git checkout safer and simpler: the git checkout command does too many different jobs, so the jobs were split up (sort of roughly halved) and the new commands git restore and git switch each do half. The existing git checkout continues to exist, and will for probably years or decades. The new commands are already growing new abilities which are not being backported into git checkout, so it's probably best to switch over, but you have years to do this.


Viewing a commit

When we ask Git to show us one of these commits, we don't see a whole snapshot. Instead, we see changes. The way Git does this is remarkably simple. If we're looking at commit H, whose parent is G, the git show or git log -p command simply extracts both snapshots and compares them. When two files are the same, it says nothing. When two files are different, it figures out what's different, and shows that.

In short, even though commits hold snapshots, we usually see diffs. Git does this by comparing two snapshots. When Git compares a parent to a child, that shows us what happened in that commit; but we can have Git compare any two commits we like.
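As a rough illustration, the sketch below makes two commits in a scratch repository and asks Git to compare the two snapshots. The file names are invented; only the file that differs between the snapshots is reported.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

printf 'a\n' > kept.txt
printf 'b\n' > changed.txt
git add kept.txt changed.txt
git commit -q -m "G"

printf 'b2\n' > changed.txt
git add changed.txt
git commit -q -m "H"

# Compare parent snapshot to child snapshot; identical files are not mentioned.
changes=$(git diff --name-status HEAD^ HEAD)
echo "$changes"
```

Both commits contain kept.txt, but since it is the same in both snapshots, the comparison says nothing about it.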

Making new commits: Git's index and your work-tree

We already noted that the files inside each commit are in a Git-ized and de-duplicated frozen format. I like to call this Git's freeze-dried format for short. Files in this format aren't actually useful for getting any new work done, so when you use git checkout or git switch to select some commit, e.g., by selecting a branch name to get on that branch, Git "rehydrates" the files to make useful copies.

The useful files are the ones you can see and work with. They exist in an area that Git calls your working tree or work-tree. Files here have their ordinary everyday form, and all of your computer's regular file-manipulation commands work on them. In fact, these files aren't under Git's control at all, and in an important sense, aren't in Git at all. They are your files, to do with as you will. You just tell Git, whenever you want and using git checkout, to replace these files with ones from some existing commit.

All version control systems (VCSes) have to have something like this, because no matter how they store files,5 they have the frozen-in-time committed versions, and the usable versions. But Git goes one step further than other VCSes: Git keeps a third copy of each file in something that Git calls, variously, the index, or the staging area, or—rarely these days—the cache. These are three names for the same thing.

Git's index is perhaps best thought of as the proposed next commit. What it actually stores is each file's name, as a long string with embedded slashes—path/to/file.ext is all just one long file name, for instance; there are no folders or directories in the index—and an internal hash ID for a freeze-dried version of that file. So the "copy" that's in the index is already de-duplicated, and is ready to go into a new commit.

When you use git checkout or git switch to extract some particular commit, Git fills both its index and your work-tree from the commit. The index holds the freeze-dried files, and your work-tree holds ordinary files, which your OS insists on naming with directory or folder names and file names. That file named path/to/file.ext became a directory/folder named path containing a directory/folder named to containing a file named file.ext. Git deals with your OS's peculiarities here, including any conversion from / to \ that might be required, at the time it does the index-to-work-tree conversion. This is also when it rehydrates the file, and when it does any CR-LF conversion if needed.

What this means is that initially, Git's index matches the current commit. These also match the files in your work-tree, which are now yours. As you modify your files in your work-tree, they stop matching the freeze-dried index copy. This is why Git makes you run git add all the time. The git add command tells Git: make your index copy match my work-tree copy. Git will compress and de-duplicate the file at this time, and update its index copy.
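Here is one way to watch the index copy fall out of step with the work-tree copy and then catch up again. This is only a sketch in a scratch repository with a made-up file name; `git ls-files -s` lists index entries with their blob hash IDs, and `git hash-object` computes the hash ID the work-tree file would get.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"

# Right after the commit, index copy and work-tree copy have the same hash.
index_hash=$(git ls-files -s file.txt | awk '{print $2}')
worktree_hash=$(git hash-object file.txt)

# Edit the work-tree file: the index still holds the old copy.
echo "v2" > file.txt
stale_index=$(git ls-files -s file.txt | awk '{print $2}')
new_worktree=$(git hash-object file.txt)

# git add: make the index copy match the work-tree copy.
git add file.txt
fresh_index=$(git ls-files -s file.txt | awk '{print $2}')
```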


5Other systems may store files as deltas, and/or give them numbers—Unix-like inodes or equivalent—to help the VCS identify "same" files. Git does not do any of this, although it does have a level below the Git object level in which Git objects can be "packed" and delta-compressed.


Making new commits: git commit

Now we're ready to see how git commit really works. It:

  • gathers the appropriate metadata for the new commit: your name and email address, a log message to put into the commit, the current date-and-time for the time stamps, and the hash ID of the current commit;
  • writes out whatever is in Git's index, to become the new snapshot;
  • adds the metadata, and writes that out as the new commit, which provides the hash ID for the commit; and
  • (here's the tricky part) writes the new commit's hash ID into the current branch name, i.e., the name to which HEAD is attached.

This last step is what updates the branch name. If we had:

...--F--G--H   <-- master, topic (HEAD)

just a moment ago, well, now we have:

...--F--G--H   <-- master
            \
             I   <-- topic (HEAD)

instead. The current name is still topic, and HEAD is still attached there. New commit I (where I stands in for some big ugly hash ID, as usual) points back to existing commit H. The name topic now points to new commit I.
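The drawing can be checked with a short experiment. This is a sketch in a scratch repository (file name and identity are assumptions): commit while on topic, then see which name moved.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "base" > f.txt
git add f.txt
git commit -q -m "H"
h=$(git rev-parse master)                  # hash ID of commit H

git checkout -q -b topic                   # new name, HEAD attached to it
echo "more" >> f.txt
git add f.txt
git commit -q -m "I"

master_now=$(git rev-parse master)         # master was not the current name
topic_now=$(git rev-parse topic)           # topic was, so it moved
topic_parent=$(git rev-parse topic^)       # new commit I points back to H
```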

Suppose we now use git checkout master to go back to existing commit H. I'm going to move topic up above the master line and pretend we added one more commit, too, and rename it topic1:

             I--J   <-- topic1
            /
...--F--G--H   <-- master (HEAD)

What's in Git's index now, and in our work-tree, matches existing commit H. Let's make a new branch name topic2 now, and switch to it:

             I--J   <-- topic1
            /
...--F--G--H   <-- master, topic2 (HEAD)

What's in Git's index and our work-tree has not changed at all. No existing commit has changed (none can), and we're still working with commit H, but now any new commits we make will change where the name topic2 points. So if we make two commits now, we will get this:

             I--J   <-- topic1
            /
...--F--G--H   <-- master
            \
             K--L   <-- topic2 (HEAD)

Merging

Now that we have this fairly complicated setup, we can look at how git merge does its job. Let's say, for simplicity, that we'll choose to merge topic2 into topic1, and completely ignore master for a while. So we'll start by doing a git checkout topic1 or git switch topic1, and by not drawing master at all, to get:

             I--J   <-- topic1 (HEAD)
            /
...--F--G--H
            \
             K--L   <-- topic2

Now we'll run:

git merge topic2

Importantly, this kind of operation requires a true merge. I'll show one that doesn't in a moment. A true merge has not two but three inputs, all of which are commits:

  • Git finds one of them the easy way, using the name HEAD: that's our commit, or commit J. This is the --ours commit for various operations later, although internally this is commit #2 (this internal number can leak out in a few places but --ours lets us not have to remember it).

  • Git finds one based on the command we gave it. Since we said git merge topic2, Git uses branch name topic2 to find commit L. This is the --theirs commit for various operations later, although internally this is commit #3.

  • Git finds the third commit on its own. Git calls this the merge base commit, and internally, it's #1: if we find ourselves wanting to look at files from this commit we need to use this internal number.

The merge base is found using the Lowest Common Ancestor algorithm on a directed graph, but we can think of this as the best shared (common) ancestor on both branches. Here, that means we start at commit J and work backwards to I and then H. We also start at commit L and work backwards, to K and then H. Commits H and earlier are on both branches, but H is pretty clearly a better (or at least newer) ancestor than G, or anything earlier.
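You can ask Git which commit it would pick with `git merge-base`. The sketch below rebuilds the H / I-J / K-L shape in a scratch repository, using empty commits purely to keep it short (an assumption for the demo, not something you'd do in real work).

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b topic1                  # I--J on topic1
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

git checkout -q -b topic2 master           # K--L on topic2, starting from H
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"

base=$(git merge-base topic1 topic2)       # best shared ancestor of J and L
echo "merge base: $base"
```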

What Git does now is compare the snapshot in the merge base to each of the two branch-tip snapshots. That is, Git runs the equivalent of:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

Merge's job is now to combine the changes, then apply these combined changes to the snapshot in H—the merge base:

  • If we changed some file, what exactly did we do to that file? If they changed the same file, what did they do?
  • If we changed a file and they didn't change a file, Git needs to make the same changes we made. But that means changing what's in H to match what's in J. That's easy: Git can just take our file from J.
  • If they changed a file and we didn't, Git can just take their copy.
  • If we deleted a file and they didn't do anything to it, Git can just take the deletion; the same holds if they deleted a file that we didn't do anything to.
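The non-conflicting cases in this list can be tried out directly. In the sketch below (scratch repository, invented file names), our side deletes build.log, their side edits app.txt, and the merge combines both without complaint.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "keep me" > app.txt
echo "junk" > build.log
git add app.txt build.log
git commit -q -m "H: merge base"

git checkout -q -b topic1                  # our side: delete a file
git rm -q build.log
git commit -q -m "J: delete build.log"

git checkout -q -b topic2 master           # their side: edit a different file
echo "their change" >> app.txt
git commit -q -am "L: edit app.txt"

git checkout -q topic1
git merge -q --no-edit topic2              # true merge, no conflicts

tracked=$(git ls-files | xargs)            # files in the merged snapshot
last_line=$(tail -n 1 app.txt)             # their edit came along
```

The deleted file stays deleted: nothing on the other side touched it, so Git simply takes the deletion.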

If both we and they made some change(s) to some file, and those two changes collide—affect the same lines, for instance—then Git may have to declare a merge conflict. Here, things get a little more complicated:

  • Suppose that both we and they fixed the same spelling error on the same line of some file. Then our change and their change match, and Git can just take one copy of that change. So that's not a merge conflict after all.

  • Or, maybe we changed the word red to the word green, and they changed the same word to the word yellow. Here, Git doesn't know which change to take, and declares a merge conflict.

  • Perhaps we changed a file and they deleted the file. These conflict: Git doesn't know whether to keep our file, or delete the file entirely, so Git declares a merge conflict.

When Git does declare a merge conflict, Git goes ahead and does the remaining merge work wherever it can, but then has git merge itself stop in the middle. Otherwise—if Git thinks everything went well—it will by default go on to the next step. You can add --no-commit to your command line to tell Git to stop anyway.
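The "we deleted, they changed" case looks roughly like this in practice. It's a sketch in a scratch repository with hypothetical file names; the resolution shown (keeping the deletion with `git rm`, then committing) is just one of the two possible choices.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "code" > app.txt
echo "v1" > notes.txt
git add app.txt notes.txt
git commit -q -m "H: merge base"

git checkout -q -b topic1                  # our side deletes notes.txt
git rm -q notes.txt
git commit -q -m "J: delete notes.txt"

git checkout -q -b topic2 master           # their side modifies it
echo "v2" > notes.txt
git commit -q -am "L: modify notes.txt"

git checkout -q topic1
merge_status=0
git merge --no-edit topic2 >/dev/null 2>&1 || merge_status=$?   # stops with a conflict

# Resolve by keeping the deletion, then finish the merge.
git rm -q notes.txt >/dev/null
git commit -q --no-edit
parents=$(git cat-file -p HEAD | grep -c '^parent ')
```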

This process, of finding and combining changes using three input commits, is what I like to call merge as a verb. It is the action of identifying (pairing up) input files and making diffs to see what changed, then combining and applying the combined diffs, using all three input files from the three commits.

If all goes well and you didn't tell Git not to, Git will go on to make a new commit. This new commit will be like any other commit, with one exception: instead of one parent, it will have two parents. We'll come back to this in a moment; for now, let's suppose that there's a conflict and the merge stopped, or you used --no-commit.

Conflicts reveal an important secret ... well, not really a secret, but sometimes not explained very well: the merge-as-a-verb process actually takes place in Git's index, because Git builds new commits from its index. Git does use your work-tree: when the merge stops with a conflict, Git will write, to your work-tree, its best effort at merging the various files. Those that have low-level conflicts6 will contain conflict markers. Git's index, meanwhile, has been expanded: it now holds not one but three copies of each input file. This is where those numbers mentioned earlier come in:

  • Index slot #1 holds the merge base copy of the file.
  • Index slot #2 holds the --ours copy of the file. You can use git checkout --ours to get this one out to your work-tree.
  • Index slot #3 holds the --theirs copy of the file. You can use git checkout --theirs to get this one out to your work-tree.

In high-level conflicts, one of these slots may be empty. I won't go into detail here as this answer is already quite long.
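To peek at the three slots during a stopped merge, something like the following sketch works. It reuses the red/green/yellow example in a scratch repository (the file name is made up); `git ls-files -u` lists unmerged index entries with their slot numbers, and `git show :N:path` prints the copy in slot N.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "color: red" > style.txt
git add style.txt
git commit -q -m "H"

git checkout -q -b topic1
echo "color: green" > style.txt
git commit -q -am "J: red -> green"

git checkout -q -b topic2 master
echo "color: yellow" > style.txt
git commit -q -am "L: red -> yellow"

git checkout -q topic1
git merge --no-edit topic2 >/dev/null 2>&1 || true   # conflict: the merge stops

stages=$(git ls-files -u | awk '{print $3}' | xargs) # slot numbers in the index
base_copy=$(git show :1:style.txt)                   # merge base copy
ours_copy=$(git show :2:style.txt)                   # --ours copy
theirs_copy=$(git show :3:style.txt)                 # --theirs copy
markers=$(grep -c '^<<<<<<<' style.txt)              # conflict markers in the work-tree

# Resolve in the work-tree, then git add collapses the slots back to one entry.
echo "color: green" > style.txt
git add style.txt
unmerged=$(git ls-files -u | wc -l | xargs)
```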

Note too that you can use git checkout -m to restore the conflicts to your work-tree copy. Be careful with any of these git checkout operations, as they will instantly overwrite any work you did to fix the merge conflict!

To resolve a merge conflict, you will in general edit the work-tree copy, or use a merge tool (git mergetool will run your chosen merge tool: Git itself does not come with any so this is strictly for third-party add-ons). Once you have the conflicts resolved correctly, you will usually run git add to tell Git: make your index copy match my work-tree copy. (The git mergetool command will run git add for you, although sometimes it asks first, depending on what third-party tool you use and what Git knows about it.) This git add wipes out the three numbered slots, and puts an entry in slot #0—whose number you don't normally see—so that there's just a single copy of the merged file in Git's index.


6A low-level conflict is one that takes place within some particular lines of a file. That's the "red became green on one side, but yellow on the other side" example above. A high-level conflict takes place across an entire file, such as when the --ours side makes a change to a file, but the --theirs side removes the file entirely. Both kinds of conflicts result in a paused merge, but a high-level conflict leaves no markers in your work-tree.


Merge as a noun or adjective: a merge or a merge commit

If there are no conflicts, or after you have resolved the conflicts and you run git merge --continue or git commit, Git is now ready to make a merge commit. This merge commit has a snapshot, just like any other commit. It has metadata, just like any other commit. The only thing that's special about it is that in this metadata, this new commit lists two parent commits.7 We can draw the new merge like this:

             I--J
            /    \
...--F--G--H      M   <-- topic1 (HEAD)
            \    /
             K--L   <-- topic2

Note that as usual, the branch name now points to new commit M. Commit M points back to existing commit J, just like any commit. What's special is that commit M also points back, through a second parent, to commit L: the one we chose by running git merge topic2.
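The two parents are easy to inspect: `HEAD^1` and `HEAD^2` name the first and second parent of the current commit. The sketch below builds the same shape in a scratch repository, again with empty commits only to keep the demo short.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "H"

git checkout -q -b topic1
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)

git checkout -q -b topic2 master
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"
l=$(git rev-parse HEAD)

git checkout -q topic1
git merge -q --no-edit topic2              # makes merge commit M on topic1

first_parent=$(git rev-parse HEAD^1)       # where topic1 was: J
second_parent=$(git rev-parse HEAD^2)      # the commit we merged: L
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
topic2_now=$(git rev-parse topic2)         # topic2 itself did not move
```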

Now that commit L can be found by working backwards from commit M, Git will allow us to delete the name topic2. The result looks like this:

             I--J
            /    \
...--F--G--H      M   <-- topic1 (HEAD)
            \    /
             K--L

If there's a branch name master that points to commit H, that branch name still points to commit H: the only things that change are what we tell Git to change, here.


7Technically, this is two or more. The "or more" part is to accommodate Git's so-called octopus merges. These don't do anything you can't do with ordinary two-parent merges, and we won't cover them here.


git merge commands that don't actually merge

In the above drawings, we had a sort of fork, where our two topic names pointed to commits that required working backwards to find a common ancestor. But suppose we start with master like this:

...--H   <-- master (HEAD)

and add a branch name feature and make a commit or two:

...--H   <-- master
      \
       I--J   <-- feature (HEAD)

and then run:

git checkout master
git merge feature       # or git merge --ff-only feature

This git merge command will go through the same initial process as our earlier git merge topic2 did, to find the merge base commit. This time, though, when we start at J and work backwards, we get to commit H, and commit H is our current commit. So when we start at master, we don't actually have to work backwards at all. Commit H, the merge base, is the current commit. In this case, git merge says to itself:

Hm, you know, if I compare commit H to itself, I won't find any changes at all. The result of combining nothing with something is always just the something. So I don't actually need to merge anything at all.

If we don't force Git to make a real merge, it just won't. Instead of merging, it will just check out commit J, but drag the name master forward in the process, so that we have:

...--H
      \
       I--J   <-- feature, master (HEAD)

(and now we can straighten out the kink in the drawing).

Git calls this a fast-forward operation. A fast-forward in general means move a branch label forward, against the direction of the internal commit arrows, so that the new position is a child of the current position. When git merge performs a fast-forward instead of a merge, Git calls it a fast-forward merge, even though no actual merging happened.8
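Here is a sketch of a fast-forward in a scratch repository (file name and identity are assumptions). After the "merge", master simply names the same commit as feature, and no merge commit exists anywhere in the history.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"

git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -am "I"
echo "more feature work" >> file.txt
git commit -q -am "J"
j=$(git rev-parse HEAD)

git checkout -q master
git merge -q --ff-only feature             # refuses to do anything but fast-forward

master_id=$(git rev-parse master)          # master was dragged forward to J
merges=$(git rev-list --merges --count master)   # number of merge commits in history
```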

You can prevent this with git merge --no-ff:

git merge --no-ff feature

will result in:

...--H------K   <-- master (HEAD)
      \    /
       I--J   <-- feature

where K is a new merge commit. The first parent of K will be H, and the second parent of K will be J. The snapshot in commit K will match the snapshot in commit J.
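The same setup with `--no-ff` can be checked like this. It's a sketch in a scratch repository with an invented file name; `HEAD^{tree}` names the snapshot stored in a commit, so comparing tree hash IDs compares snapshots.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -am "I"
echo "more feature work" >> file.txt
git commit -q -am "J"
j=$(git rev-parse HEAD)

git checkout -q master
git merge -q --no-ff --no-edit feature     # force a real merge commit K

p1=$(git rev-parse HEAD^1)                 # first parent: H
p2=$(git rev-parse HEAD^2)                 # second parent: J
k_tree=$(git rev-parse 'HEAD^{tree}')      # snapshot of K
j_tree=$(git rev-parse 'feature^{tree}')   # snapshot of J
```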


8The other operations that perform fast-forwards all the time are git fetch and git push: fetch will fast-forward your own remote-tracking names, and push will often only work if the operation is a fast-forward. When fast-forwarding is not possible for these two, they will use the "force flag", if it is enabled, to force the branch name motion.


Your own cases

I have the following branches:

  1. Master Branch (currently deployed to Prod)
  2. Enhancement #1 - CleanupUntrackedFiles Branch: This was spawned off of Master. I made updates to the gitignore file to not include particular extensions in my repo and I removed those unnecessary files from the repo.
  3. Enhancement #2: This was ALSO created off of Master. Code updates are in support of a customer request.

The phrase "created off master" suggests that you did:

git checkout master
git checkout -b CleanupUntrackedFiles
git checkout master
git checkout -b Enhancement2

or:

git branch CleanupUntrackedFiles master
git branch Enhancement2 master

As we've seen above, these will result in:

...--H   <-- master, CleanupUntrackedFiles, Enhancement2

It's the process of making new commits that will cause these two names to diverge, so that they no longer point to existing commit H.

We do not, however, know whether you did exactly the above, or whether you created these names at different times. For instance, perhaps you made the name CleanupUntrackedFiles when master pointed to some earlier commit, and the name Enhancement2 when master pointed to some later commit:

          I--J   <-- CleanupUntrackedFiles
         /
...--F--G--H   <-- master
            \
             K   <-- Enhancement2

Since git merge works based on commits, and can do fast-forward operations in some cases, these details matter.

Let's assume for now, though, that you have the former setup, i.e.:

       I--J   <-- CleanupUntrackedFiles
      /
...--H
      \
       K-----L   <-- Enhancement2

The difference between the snapshot in H and that in J is that in H, the .gitignore file doesn't list some files; in J it has some extra lines; and in J, files that you listed in .gitignore are not in the snapshot, so that comparing H vs J will show those files as deleted.

Meanwhile, the difference between snapshots H and L is that some source files are modified. We don't know, because you did not say, whether any of the files that show up as "deleted" in H-vs-J show up as modified in H-vs-L.

Let's say you now run:

git checkout master
git merge --no-ff CleanupUntrackedFiles

where the --no-ff option forces a true merge. You will get:

       I--J   <-- CleanupUntrackedFiles
      /    \
...--H------M   <-- master (HEAD)
      \
       K-----L   <-- Enhancement2

The snapshot in M will match that in J; the first parent of M will be H and the second will be J. If you now run:

git merge Enhancement2

(no --no-ff is required; if you use the option, it won't hurt, but it will not change anything) you will—assuming no conflicts—get:

       I--J   <-- CleanupUntrackedFiles
      /    \
...--H------M--N   <-- master (HEAD)
      \       /
       K-----L   <-- Enhancement2

where N is the result of comparing the merge base of M and L to the snapshots in M and L.

The hard part here is determining which commit is the best shared commit of M and L. We must work backwards from each, to find some commit that's on both master and Enhancement2. The steps when working back from Enhancement2 are simpler: L, then K, then H, and so on. From master, we look at M, then J and H (through M) and I (through J) and H (through I) in some order, but we do get to H. So H is once again the merge base.

Compare what's in snapshot H vs that in snapshot M. What has changed? The answer is: the same thing that changed in H vs J, because M's snapshot matches J's. So the .gitignore is changed, and some files are deleted.

Now, compare what's in snapshot H vs that in snapshot L. What has changed? Some code has changed, as you said. None of those changes are to .gitignore, so there is no conflict there. The changed files might include some of the deleted files: if so, you'll have a modify/delete conflict to resolve, which you should do by keeping the delete. If they don't include the deleted files, there are no conflicts to resolve: Git will take the deletion, as well as the .gitignore change, from M, and add the other changes from L, to make N.
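To convince yourself before touching the real repository, you could rehearse the whole thing in a scratch one. The sketch below assumes the simple case where Enhancement2 does not touch any of the removed files; the file names (app.py, debug.log, cache.tmp) and the ignore patterns are invented stand-ins for yours.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo 'print("v1")' > app.py
echo "log line" > debug.log
echo "scratch" > cache.tmp
git add app.py debug.log cache.tmp
git commit -q -m "H: what is deployed"

git checkout -q -b CleanupUntrackedFiles   # Enhancement #1
printf '*.log\n*.tmp\n' > .gitignore
git rm -q debug.log cache.tmp
git add .gitignore
git commit -q -m "J: ignore and remove generated files"

git checkout -q -b Enhancement2 master     # Enhancement #2, also from H
echo 'print("v2")' > app.py
git commit -q -am "L: customer request"

git checkout -q master
git merge -q --no-ff --no-edit CleanupUntrackedFiles   # merge commit M
git merge -q --no-edit Enhancement2                    # merge commit N

tracked=$(git ls-files | xargs)            # files in the final snapshot
app=$(cat app.py)                          # the code change from Enhancement2
```

The removed files do not come back: relative to the merge base H, Enhancement2 never changed them, so the only change Git sees for those files is the deletion.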

If you have some other setup, and/or use other options, work through the examples. That will tell you what git merge will do.

torek