
I have the following branch structure

-localproject
  -master
  -taskA
  -taskB
-remotes/origin
  -master
  -taskA
  -taskB

Here is what I want to achieve:

  1. The whole thing is for an assignment, so taskA and taskB are parts of the assignment that I need to push to the corresponding remote branch to get tested automatically. So initially I pull from master, which is the provided framework, and create branch taskA.
  2. I do my work in taskA and as soon as it is done, I push to the corresponding remote branch. It gets evaluated by the remote test system (a GitLab CI/CD pipeline).
  3. Perfect, taskA is working! Now I want to start taskB. This new task is based upon taskA so I create the new branch from it and start working.
  4. Now, working in taskB, I realize that I could change some things in my code to improve taskA. Let's say some files changed in taskB should now be committed to taskA.

The way I would do this now: For each file, where I have changes I want to commit in taskA:

  • Manually copy the code while in taskB
  • checkout taskA
  • paste the code and commit the changes.

Is there a better way to do this? Can I somehow check files in taskB and copy/commit them over to taskA directly?

I hope I explained this as clearly as possible (at least I tried), but ask away if anything is not described well. While I generally understand git, I mostly use it for my private projects, where there isn't much branching or anything out of the ordinary...

Roland Deschain

1 Answer


There are many different ways to manage this kind of workflow in Git, because Git is a set of tools rather than a particular solution.

When working with Git, keep all of these things in mind:

  • Git is really all about commits. Commits (like all Git internal objects) are completely immutable once created.

  • Commits store snapshots of your files: not changes, just snapshots. In some ways this doesn't really matter, but it makes it a lot easier to understand some weird Git corners. Since the commits are immutable, so are the stored files. These stored files are also compressed (sometimes in a very fancy way) and de-duplicated, and not usable by anything other than Git itself. They have to be extracted from a commit for you to use them (see below).

  • Each commit has a parent commit (the very first commit in a repository has none), or for merge commits, two (or potentially more) parents. The parents are stored by their hash IDs.

  • The hash ID of each commit is how Git actually retrieves the commit, so hash IDs are crucial here. But hash IDs look random and are completely unsuitable for human use, therefore ...

  • Git gives us branch names, which simply store the hash ID of the last commit that we'd like Git to think of as being "on the branch".
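Here's a small shell sketch to make these points concrete. It assumes a POSIX shell and Git on your PATH, builds a throwaway repository in a temporary directory, and uses a made-up file name and identity. It makes two commits, prints the raw second commit, and then checks three things: the parent hash ID stored inside the second commit is the first commit's hash ID, the name master holds the second commit's hash ID, and the first snapshot is still intact.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > file.txt
git add file.txt
git commit -q -m "first"                   # the root commit: it has no parent
echo "v2" > file.txt
git commit -q -a -m "second"               # its parent is "first"

# The raw commit object: a tree line (the snapshot), a parent line, author, committer, message.
git cat-file -p HEAD

first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)
parent=$(git cat-file -p "$second" | sed -n 's/^parent //p')
tip=$(git rev-parse master)                # the branch name just holds a hash ID
old=$(git show "$first:file.txt")          # the older snapshot is untouched
echo "parent of second: $parent"
echo "master points to: $tip"
echo "file.txt in first: $old"
```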

A Git repository, then, is essentially just two databases. One contains all the commits and supporting internal objects—all indexed by hash ID—and the other contains the name-to-hash-ID mapping. The files that you work with, as you do your actual work, aren't in Git at all. That is, they're not the ones that are in the repository database. They are just in an auxiliary area.
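A similar throwaway sketch (same assumptions: POSIX shell, Git on PATH, made-up names) can look at both databases. It then shows that editing a work-tree file changes neither of them.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "initial"
git branch taskA                           # a second name for the very same commit

# The names database: each branch name maps to exactly one commit hash ID.
git for-each-ref --format='%(refname) -> %(objectname)' refs/heads/

# The objects database: look an object up by its hash ID.
before=$(git rev-parse master)
git cat-file -t "$before"                  # prints: commit

# Editing the work-tree copy does not touch either database.
echo "edited" > file.txt
after=$(git rev-parse master)
committed=$(git show master:file.txt)      # the frozen copy inside the commit
echo "master before=$before after=$after; committed file.txt: $committed"
```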

This work area, where you work on files, is your working tree or work-tree. If you use git clone to make your repository, Git creates the work-tree for you. If you build your own work-tree before you create a .git with git init, you have already set up the work-tree, and the git init step just creates a new empty repository—two empty databases, one for commits and the other for names.

When you check out some particular commit, whether that's by an explicit "check out historical commit so I can view it" or implied by "check out some branch by name so that I can do new work", Git will extract the saved snapshot into your work-tree. But apart from this and other Git commands that explicitly tell Git "do something with my work-tree file(s)", these files are now yours to do with as you will.

There's one more wrinkle here. When you go to make a new commit, Git does not use the files in your work-tree. Instead, Git uses hidden copies—technically these aren't exactly copies, but it works to think of them like that—of the frozen-format files that get stored in commits. These "copies" live in what Git calls, variously, the index, or the staging area, or—rarely these days—the cache.

The index has multiple roles, but you can think of it as containing the proposed next snapshot. It starts out matching the current snapshot, as extracted into your work-tree. The git diff --staged and git status commands don't show you the index copies of the files when they're the same as the snapshot's copy of the files. When you use git add, you're telling Git: "Copy my work-tree copy of the file back into the index, replacing the old copy, or putting an all-new file into the index." Now that the index copy doesn't match the current-commit copy, git status and git diff --cached (--cached is a synonym for --staged) will show you that file. Change it back, to match the committed copy, and they'll stop showing the file again.
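Here's a sketch of the index in action, again in a throwaway repository with a made-up file name. It compares the work-tree with the index (git diff) and the index with HEAD (git diff --cached) before git add, after git add, and after the file is changed back and re-added.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"

echo "v2" > file.txt                       # change only the work-tree copy
unstaged=$(git diff --name-only)           # work-tree vs index
staged_before=$(git diff --cached --name-only)   # index vs HEAD: still empty

git add file.txt                           # copy the work-tree version into the index
staged_after=$(git diff --staged --name-only)    # --staged is a synonym for --cached

echo "v1" > file.txt                       # change the file back to the committed content...
git add file.txt                           # ...and stage that, so the index matches HEAD again
staged_reverted=$(git diff --cached --name-only)
status=$(git status --porcelain)

echo "unstaged: $unstaged / staged before add: '$staged_before' / after add: $staged_after"
echo "after reverting: staged='$staged_reverted' status='$status'"
```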

Since your work-tree is yours, you can create files that never get into Git at all. These are your untracked files. To help ensure you don't accidentally put the untracked files into the repository, you can list files or name-patterns in .gitignore. Note that once a file is tracked, listing it in .gitignore has no effect.
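A quick sketch of that last point, with hypothetical file names. notes.txt is committed first and only then listed in .gitignore, while scratch.log is a brand-new file that is also listed there. Git still reports changes to the tracked file and hides only the untracked one.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "track notes.txt"

printf 'notes.txt\nscratch.log\n' > .gitignore   # ignore a tracked file and an untracked one
echo "more" >> notes.txt                   # modify the tracked file
echo "temp" > scratch.log                  # create a new, never-added file

# notes.txt is still reported as modified; scratch.log is hidden; .gitignore itself is untracked.
status=$(git status --porcelain)
echo "$status"
```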

A file in your work-tree is tracked if and only if that file exists right now in Git's index. While this definition is very short and simple, it has a long shadow: Since Git fills in its index from a commit—via git checkout or git switch—this means that a file that exists in commit X but not in commit Y can switch from being tracked to untracked, or vice versa, just by changing which commit you have checked out. You can also modify, create, or remove specific files within the index yourself, with git add and git rm. Whenever you do this, you're changing the proposed next commit. None of this has any effect until you actually run git commit.
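This sketch shows a file going from tracked to untracked just by switching commits. Commit Y lacks extra.txt and commit X adds it. The branch name older, marking commit Y, and the file names are hypothetical.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > common.txt
git add common.txt
git commit -q -m "Y: no extra.txt"
git branch older                           # hypothetical name for commit Y

echo "extra" > extra.txt
git add extra.txt
git commit -q -m "X: adds extra.txt"

on_x=$(git ls-files extra.txt)             # in the index, therefore tracked
git checkout -q older                      # fills the index (and work-tree) from commit Y
on_y=$(git ls-files extra.txt)             # gone from the index...
if [ -e extra.txt ]; then present=yes; else present=no; fi   # ...and from the work-tree

echo "my own copy" > extra.txt             # re-create it by hand: now it is untracked
untracked=$(git ls-files --others --exclude-standard)
echo "on X: '$on_x'  on Y: '$on_y'  present after checkout: $present  untracked: $untracked"
```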

With the above in mind, we're ready to tackle your particular case

Let's jump right to step 3:

Perfect, taskA is working! Now I want to start taskB. This new task is based upon taskA so I create the new branch from it and start working.

Since commits refer back to earlier commits, let's draw this. Suppose the hash ID of the last commit in your taskA branch (which now exists) is H, where H stands in for the real Git hash ID. Then the name taskA is a way for Git to remember hash ID H for you. Commit H itself has a parent, with another big ugly hash ID, but we'll call that parent G. G has a parent, too, which we'll call F, and so on:

... <-F <-G <-H   <--taskA

The name taskA selects commit H (for now). The name taskB does not even exist yet.

Now you create taskB. I'm going to switch from drawing the internal arrows, which point backwards from each commit, to plain lines, because the arrow-drawing characters available for posting here on Stack Overflow are poor. Creating taskB just adds another name, taskB, that also selects commit H:

...--F--G--H   <-- taskA, taskB (HEAD)

We now need to know which name we're using, as well as which commit, so we'll attach the special name HEAD to one of these two branch names.
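Here's a sketch of getting to that picture, in a throwaway repository with a made-up file a.txt. It uses git checkout -b rather than git switch -c (Git 2.23+) so it also works on older Git. Afterwards both names resolve to the same hash ID and HEAD is attached to taskB.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "framework" > a.txt
git add a.txt
git commit -q -m "provided framework"
git checkout -q -b taskA                   # taskA starts at master's commit
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"

git checkout -q -b taskB                   # another name for commit H; HEAD now attached to taskB
h_taskA=$(git rev-parse taskA)
h_taskB=$(git rev-parse taskB)
attached=$(git symbolic-ref --short HEAD)
git log --oneline --decorate -1            # the decoration lists both names on this one commit
echo "taskA=$h_taskA taskB=$h_taskB HEAD -> $attached"
```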

Now, working in taskB, I realize that I could change some things in my code to improve taskA. Let's say some files changed in taskB should now be committed to taskA.

This is where you suddenly get a lot of options.

My favorite one for descriptive purposes is git worktree, specifically git worktree add. But git worktree was new in Git 2.5, and had a nasty bug finally fixed in 2.15, so unless your Git is reasonably modern, you might want to avoid it. It's also going to create a little bit of extra work for you, if you go this way, but it's a very general solution.

What git worktree add does is let you add a second work-tree to your existing repository. Each added work-tree gets:

  • its own HEAD, so that it can (and in fact must) have a different branch checked out;
  • its own index, i.e., proposed next commit; and
  • of course, its own work-tree full of files.

So you can use git worktree add to make two independent work areas, each of which is "on" a different branch. You can then just take this moment to:

  • open a new window on (or push directories to) the work-tree in which you're working on taskA;
  • modify the files there, however you like—up to and including copying them from the work-tree where you are working on taskB—and git add and git commit.
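A sketch of this git worktree add approach, assuming Git 2.15 or later as discussed above. The directory names project and taskA-wt and the file names are made up. It also checks that the main work-tree refuses to check out taskA while the added work-tree has it, and that taskB and its uncommitted work are untouched.

```shell
#!/bin/sh
set -eu
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir "$base/project"
cd "$base/project"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "framework" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB                   # main work-tree: HEAD -> taskB
h=$(git rev-parse HEAD)
echo "taskB in progress" > b.txt           # uncommitted taskB work, left alone below

# Second work-tree with its own HEAD and index, on branch taskA (taskA-wt is a made-up path).
git worktree add "$base/taskA-wt" taskA

# The main work-tree may not check out taskA while the other work-tree has it.
if git checkout -q taskA 2>/dev/null; then refused=no; else refused=yes; fi

# Improve taskA in the added work-tree, then add and commit there.
echo "improvement found while on taskB" >> "$base/taskA-wt/a.txt"
git -C "$base/taskA-wt" add a.txt
git -C "$base/taskA-wt" commit -q -m "I: improve taskA"

git worktree list
i=$(git rev-parse taskA)
echo "H=$h  taskA (I)=$i  taskB=$(git rev-parse taskB)"
```

When you're done with the added work-tree, git worktree remove (Git 2.17+) or simply deleting the directory followed by git worktree prune cleans it up.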

Let's say you do make a new commit in this added work-tree. We can draw that. We start with this:

...--F--G--H   <-- taskA (HEAD), taskB

and modify some files, git add, and run git commit. This makes a new commit—which gets a new big ugly hash ID; we'll call it I—and makes the name taskA point to this new commit:

             I   <-- taskA (HEAD)
            /
...--F--G--H   <-- taskB

If we switch back to the taskB window / work-tree, where taskB is the HEAD, we have:

             I   <-- taskA
            /
...--F--G--H   <-- taskB (HEAD)

The files in the work-tree here—the original one, not the added one—match those of commit H, except for any changes you've made so far. The files in the index for this work-tree match those for commit H. Any new commit you make now will update the name taskB like this:

             I   <-- taskA
            /
...--F--G--H
            \
             J   <-- taskB (HEAD)

Again, the new snapshot comes from the index. The commits have not changed: we've merely added some new ones. The parent of new commit I is existing commit H. The parent of new commit J is existing commit H. Commits up through and including H are on both branches, but the branches now diverge.
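To check that picture in a real repository, a sketch like this one rebuilds it with made-up files. It then asks Git for the shared commit with git merge-base and for the number of commits only on each side with git rev-list --left-right --count. The expected answers are H and one commit on each side.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "framework" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
h=$(git rev-parse HEAD)

git checkout -q -b taskB
echo "taskB work" > b.txt
git add b.txt
git commit -q -m "J: start taskB"          # taskB: H <- J

git checkout -q taskA
echo "improvement" >> a.txt
git commit -q -a -m "I: improve taskA"     # taskA: H <- I

shared=$(git merge-base taskA taskB)       # the last commit that is on both branches
counts=$(git rev-list --left-right --count taskA...taskB)   # only-on-taskA <TAB> only-on-taskB
git log --oneline --graph --all
echo "merge base: $shared (H is $h); left/right counts: $counts"
```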

What if you don't want to use git worktree add

Remember that Git makes each new commit from the index, not from your work-tree. Suppose we have:

...--F--G--H   <-- taskA, taskB (HEAD)

but the index matches commit H. Git will let you switch back to taskA without disturbing the index and work-tree content at all. (This is not always true, but it is true given our suppositions and setup here. For the gory details, see "Checkout another branch when there are uncommitted changes on the current branch".) So let's say we do that:

git checkout taskA              # or git switch taskA

...--F--G--H   <-- taskA (HEAD), taskB

Now we just git add the one or two files you would like to be different, which copies them into the index, ready for the next commit, and then run git commit. Since Git uses the index files, not the work-tree files, any updates you made but did not git add do not go into the new snapshot.
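Here's a sketch of this index-based route in a throwaway repository. The files are made up: a.txt carries the improvement meant for taskA, and b.txt carries a change that should stay out of it. Only a.txt goes into the new commit on taskA, taskB stays at H, and the b.txt change is still sitting uncommitted in the work-tree.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'helper v1\n' > a.txt
printf 'notes\n' > b.txt
git add a.txt b.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB
h=$(git rev-parse HEAD)

echo "better helper" >> a.txt              # a change that belongs in taskA
echo "taskB-only change" >> b.txt          # a change that should stay out of taskA

# taskA and taskB both name commit H, so this switch leaves index and work-tree alone.
git checkout -q taskA
git add a.txt                              # stage only the taskA improvement
git commit -q -m "I: improve helper"

status=$(git status --porcelain)           # b.txt is still modified, but not committed
echo "$status"
```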

We get:

             I   <-- taskA (HEAD)
            /
...--F--G--H   <-- taskB

exactly as before. The name taskB does not move.

When we now git checkout taskB, Git sees that the files we just updated are different in commits H and I. So Git will copy H's copy of those files out of the commit, into Git's index (so that they match H) and your work-tree, and the changes you just made for taskA are gone from the work-tree (they're still safe in commit I). But we can bring them back into the work-tree:

git checkout taskA -- file1 file2

or (since Git 2.23):

git restore -s taskA -S -W file1 file2

which tells Git: reach into the commit identified by the name taskA (commit I), pull out these two files, and copy them into the index and my work-tree. So now you're back to having the updated files, along with all the other undisturbed files. The updated files are already staged, listed under "Changes to be committed" in git status output, because they're in the proposed next commit in the index.
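A sketch of that round trip, assuming Git 2.23 or later for git restore (the older git checkout taskA -- a.txt form is noted in a comment). The file a.txt is made up. After switching back to taskB the improvement disappears from the work-tree, and restoring from taskA brings it back, already staged, while taskB still points to H.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "helper v1" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB
h=$(git rev-parse HEAD)

# Commit an improvement on taskA the index-based way.
echo "better helper" >> a.txt
git checkout -q taskA
git add a.txt
git commit -q -m "I: improve helper"

git checkout -q taskB                      # Git puts H's a.txt back into index and work-tree
if grep -q "better helper" a.txt; then lost=no; else lost=yes; fi

# Copy commit I's a.txt into both the index and the work-tree.
git restore --source=taskA --staged --worktree a.txt   # short form: -s taskA -S -W
# Before Git 2.23: git checkout taskA -- a.txt
staged=$(git diff --cached --name-only)
git status --short
```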

You can now finish up the stuff you were doing, git add, and git commit as needed, giving:

             I   <-- taskA
            /
...--F--G--H
            \
             J   <-- taskB (HEAD)

exactly as before.

You may now want to rebase taskB

However you got to this point, you now have taskB extending from commit H, rather than from commit I as you might wish.

No commit can ever be changed, but any commit can be copied. What if we copy commit J to a new-and-improved commit—let's call it J'—where the snapshot in J' matches the snapshot in J, plus any changes from H-to-I if needed? (They're already in J so they are not needed, but Git would put them in if they were.)

We can get this by using git cherry-pick. We first create a new temporary branch temp, pointing to commit I:

             I   <-- taskA, temp (HEAD)
            /
...--F--G--H
            \
             J   <-- taskB

Now we tell Git: copy commit J to where we are now:

git cherry-pick taskB

which produces:

               J'  <-- temp (HEAD)
              /
             I   <-- taskA
            /
...--F--G--H
            \
             J   <-- taskB

Note that, yet again, we have not changed any existing commit at all. We have just added a new commit J' whose parent is I.

Now that we have copied all the taskB commits (all one of them) to new-and-improved commits, we just need to tell Git: Take that name taskB and move it in a way such that we'll forget all about the old commit J. Specifically, force taskB to point to the current commit. We do this with:

git branch -f taskB HEAD

which results in:

               J'  <-- taskB, temp (HEAD)
              /
             I   <-- taskA
            /
...--F--G--H
            \
             J   ???

Note that there is now no name by which to find existing commit J. So when you have Git list out the commits it can find by branch names, commit J does not show up at all. A new and different hash ID—that of J'—does. Now we just switch back to branch taskB and delete the temporary name and we have:

               J'  <-- taskB (HEAD)
              /
             I   <-- taskA
            /
...--F--G--H

as if we had been clever enough to make commit I first all along.
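Here are the four manual steps as a shell sketch, in a throwaway repository with made-up files. temp is the temporary name from the text. The checks at the end confirm that taskB now sits on top of I, that the old J still exists but no longer has a branch name, and that temp is gone.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "helper v1" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB
echo "taskB work" > b.txt
git add b.txt
git commit -q -m "J: start taskB"
git checkout -q taskA
echo "better helper" >> a.txt
git commit -q -a -m "I: improve helper"

old_j=$(git rev-parse taskB)
i=$(git rev-parse taskA)

git checkout -q -b temp taskA              # 1. temporary name at I, HEAD attached to it
git cherry-pick taskB >/dev/null           # 2. copy J to a new commit J' whose parent is I
git branch -f taskB HEAD                   # 3. force taskB to point to J'
git checkout -q taskB
git branch -q -d temp                      # 4. drop the temporary name
new_j=$(git rev-parse taskB)
git log --oneline --graph taskB
```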

We don't need to use four separate Git commands—create temp branch name, cherry-pick commits to make new-and-improved-copies, forcibly move old branch name, delete temp branch name—because git rebase does that for us. That's what git rebase is really about.
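With the same made-up setup, the whole sequence collapses into one git rebase taskA run while on taskB. Git copies the commits on taskB that are not on taskA (here just J) onto taskA's tip I and then moves the name taskB. This is a sketch; with real, overlapping edits the copy step can stop for conflicts that you would resolve and continue.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false

echo "helper v1" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB
echo "taskB work" > b.txt
git add b.txt
git commit -q -m "J: start taskB"
git checkout -q taskA
echo "better helper" >> a.txt
git commit -q -a -m "I: improve helper"

old_j=$(git rev-parse taskB)
git checkout -q taskB
git rebase -q taskA                        # copy J onto I as J', then move taskB to J'
new_j=$(git rev-parse taskB)
git log --oneline --graph taskB
```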

The one drawback to rebasing: if someone else has the commit(s)

You did not mention doing a git push -u origin taskB above, but if you had done that, you would have sent a request to another Git, the one over at origin, to take any new commits you have that they don't, that they need, and then to create, in their Git repository, their branch name taskB, pointing to whichever commit your name taskB points to.

When you use git rebase you tell your Git: Copy some commits to a new place, then throw out the old commits in favor of the new-and-improved copies. Your Git obeys. If you now have your Git ask their Git to update their taskB name:

git push origin taskB

they will sometimes say no! In particular, they will check whether this action would drop some commit(s) from their taskB. If so, they will reject the push with a "non-fast-forward" error. But of course dropping the old commits is just what you want in this case: you made some commits, then you made new-and-improved commits, and they should lose the old ones. To get them to do that, you will need a more forceful git push.

Whenever a branch gets rebased regularly, all users of any shared Git repository should be aware of this. That's because ... well, suppose Alice pushes a commit. Then Bob gets Alice's commit from the shared repository, and starts building his own additional commits. Then Alice changes her mind and rebases, throwing out the old commits in favor of new-and-improved ones. But Bob still has, and has based his commits on, the old ones! Alice and Bob are in effect fighting over which commits are the good ones.

This is not all that hard to deal with technically, usually. For instance, here, Bob just needs to rebase his commits on Alice's new ones, dropping Alice's old ones. If everyone agrees in advance that this sort of thing happens, and knows how to deal with it, that's no problem.

If the origin repository is private (so there are not separate users Alice and Bob), or your branch on origin is private (same condition), or everyone agrees that rebasing happens (Alice and Bob are both ready to check these things), there is no problem here. Just be aware of the pitfalls. Consider using git push --force-with-lease as a safety check, too.
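Here's a sketch of that push sequence. A local bare repository stands in for origin (the real one would be the GitLab remote; the path and file names here are made up). After the rebase, a plain push of taskB is rejected. git push --force-with-lease then succeeds, because origin's taskB is still at J, where our remote-tracking name origin/taskB says it was. Had someone else pushed to taskB in the meantime, the lease check would refuse instead of overwriting their work.

```shell
#!/bin/sh
set -eu
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
git init -q --bare "$base/origin.git"      # local stand-in for the real remote
mkdir "$base/work"
cd "$base/work"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master on any Git version
git config user.name "Example User"        # throwaway identity for this demo repository
git config user.email "user@example.com"
git config commit.gpgsign false
git remote add origin "$base/origin.git"

echo "helper v1" > a.txt
git add a.txt
git commit -q -m "framework"
git checkout -q -b taskA
echo "taskA solution" >> a.txt
git commit -q -a -m "H: finish taskA"
git checkout -q -b taskB
echo "taskB work" > b.txt
git add b.txt
git commit -q -m "J: start taskB"
git push -q -u origin taskA taskB          # origin now has taskB -> J

git checkout -q taskA
echo "better helper" >> a.txt
git commit -q -a -m "I: improve helper"
git push -q origin taskA

git checkout -q taskB
git rebase -q taskA                        # local taskB -> J'; J is dropped locally

# A plain push would drop J from origin's taskB, so it is rejected (non-fast-forward).
if git push -q origin taskB 2>/dev/null; then plain=accepted; else plain=rejected; fi

# Overwrite only if origin's taskB is still where origin/taskB says it was (J).
git push -q --force-with-lease origin taskB
remote_tip=$(git --git-dir="$base/origin.git" rev-parse taskB)
echo "plain push: $plain; origin taskB now: $remote_tip"
```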

torek
Wow, thanks a lot for this extensive explanation, it helps quite a lot to understand git in general and answer my initial question! – Roland Deschain Jun 03 '20 at 08:49