How do I get my branch back to Original State in Eclipse IDE?

Question

So I have a feature that I am working on, so I created a branch let's call it Branch A. I have a pull request out for branch A and I am trying to merge it to main. I wanted to work on something else so I created a Branch B that's based on Branch A. I need to make some changes in Branch A based on comments I got, but somehow the changes I made in Branch B were reflected in Branch A. So, how can I get branch A back to its state while also preserving the work I did in Branch B? Or am I doomed to having to save my work elsewhere and just revert everything back? I haven't pushed any of my changes in Branch B to github.

What do you mean by `changes on branch b were reflected in branch a`? Did you merge branch B to A? Also what do you mean by `how can I get branch A back to its state`? What state? — kadewu, Sep 22 '22 at 00:39
I am not sure how, but when I switched back to branch A, all the changes I made in Branch B show up. At first I thought it was some eclipse glitch, so I tried to exit eclipse, update the project etc. By state I mean, how do I get branch A back to how it originally was before I created branch B. — StormChaser, Sep 22 '22 at 00:44
Are you sure the commits from branch B are on branch A? Or by changes do you mean those from the working area (changes that are not committed or stashed will remain when switching branches unless there are conflicts)? — kadewu, Sep 22 '22 at 00:55
I forgot to answer back, but no I did not merge Branch B to A. When I am on Branch A, and I do a git status, it shows all the files I made changes to in Branch B. I didn't commit any files from Branch B to github and I didn't stash them either. — StormChaser, Sep 22 '22 at 01:04
Actually, all I needed to do was stash my changes, thanks for the help. — StormChaser, Sep 22 '22 at 01:15

score 0 · Answer 1 · answered Sep 22 '22 at 13:20

You've been working with an incorrect mental model of how Git works. (This isn't surprising: a lot of people don't "get" the Git model immediately. When I first used Git, in 2006 or whatever year it was, I had the same issue.)

The trick is to realize that branches, in Git, are basically irrelevant. They are not completely useless: they have one very specific function. But other than this one specific function, they don't mean anything or even do anything. Instead, Git is all about commits—not branches, not files, but commits. Until you actually make a new commit, usually running git commit, you haven't actually done anything in Git!¹

When you said in a comment:

Actually, all I needed to do was stash my changes ...

this tells me that you used git branch or git switch -c or git checkout -b to create a new branch name, but you never ran git commit.

What git stash does is make two commits (or sometimes three). The commits that git stash makes are on no branch. Branches are not required, in Git. Only commits really matter.

It's very important to realize how this works. It's crucial, in fact, because if you don't know this, it's very easy to lose work you've done.

¹This is a slight overstatement for effect; it's possible to do things "in Git" without actually committing. But that's for later, after you've learned to commit early and often.

How commits work in Git

Commits are the reason Git exists. They're the basic building block. If you are using Git at all, commits are probably why you are using Git. (The only other reason is "because the boss told me to" or similar—basically the stuff made fun of in xkcd 1597.) As such, you need to know what a commit is and does for you.

Each Git commit:

Is numbered: it has a unique number that looks random (but isn't), and is extremely large and ugly and quite unsuitable for humans.
Is read-only. A commit, once made, can never be changed. This is required for the magic numbering scheme to work.
Contains two parts: some metadata, or information about the commit itself, such as the name and email address of the person who made it, and—indirectly—a full snapshot of every file.

This snapshot for each commit is stored in a special, magic, compressed and content-de-duplicated fashion, so that the Git repository—which consists of commits and their supporting objects—doesn't explode in size as you add more commits. Most commits mostly re-use most or all of the files from some previous commit(s), and when they do that, the content of those files is de-duplicated so that it's shared across all the commits that have it. (This is enabled by the read-only feature that's required to make the magical numbering system work. It's really all astonishingly self-referential, where one part of Git depends on another part of Git that depends on the first part, like an Ouroboros.)
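As a rough illustration of why identical content is only stored once, here is a small Python sketch. It assumes the default SHA-1 object format, where a file's ID is the SHA-1 of a short `blob <size>` header followed by the content. The `objects` dictionary and the `store` helper are hypothetical stand-ins for the real objects database, not Git's own code.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the ID Git's SHA-1 format gives a file ("blob") with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical toy objects database: hash ID -> content.
objects = {}

def store(content: bytes) -> str:
    """Store content under its hash ID; identical content is kept only once."""
    oid = git_blob_id(content)
    objects.setdefault(oid, content)
    return oid

first = store(b"hello\n")
second = store(b"hello\n")         # same content again, as in a later commit
third = store(b"hello, world\n")   # different content gets a different ID

print(first == second, len(objects))
```

Because the ID is computed from the content, changing a stored file would change its ID, which is one way to see why stored objects have to be read-only.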

The metadata for any given commit contains, as part of the commit, the raw hash IDs—the unique numbers—of that commit's parent commits. Most commits, which Git calls ordinary commits, contain exactly one parent hash ID. This forms a simple backwards chain, where each commit links to its (single) parent, which links backwards to its parent, and so on.

What all this means is that Git only needs to know one hash ID—the one for the latest commit—to be able to find all the commits.

To understand this, we need to back up a little bit and talk about the repository. The bulk of most Git repositories consists of a big key-value database that Git calls the objects database. Git finds things in this big database by their hash IDs. Since the hash ID for a commit is unique, if we know the commit's hash ID, Git can quickly extract the commit itself from this big objects database. But Git needs the hash ID to do this.²

Suppose we've memorized the hash ID of the latest commit. It has some big ugly hexadecimal expression, such as dda7228a83e2e9ff584bf6adbf55910565b41e14; we'd have to carry this around in our heads (or write it down on paper or a whiteboard or something) if we really had to memorize it. We feed this hash ID to Git and Git quickly finds the commit, in that big database. Let's call this commit H, for Hash, and draw it like this:

<-H

That backwards-pointing arrow sticking out of H represents the parent hash ID stored in H's metadata. This holds another hash ID (in this case 279ebd47614f182152fec046c0697037a4efbecd), which is the commit's parent, so Git can use that hash ID to find the earlier commit, the one that comes just before H. Let's call that commit G and draw it in:

        <-G <-H

Now, assuming G is also an ordinary commit,³ it too will have a single parent hash ID, which I've represented with that arrow sticking out of G. This points to yet another parent F:

... <-F <-G <-H

By following these arrows, one hop at a time, Git can find every commit. All we had to do was feed it the hash ID of the last commit H.
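The backwards walk can be sketched in a few lines of Python. This is a toy model only: the single letters stand in for real hash IDs, the `commits` dictionary stands in for the objects database, and F is treated as the first commit to keep the example short.

```python
# Toy stand-in for the objects database: "hash ID" -> commit record.
commits = {
    "F": {"parent": None, "message": "earliest commit in this sketch"},
    "G": {"parent": "F", "message": "second commit"},
    "H": {"parent": "G", "message": "latest commit"},
}

def history(latest):
    """Yield commit IDs, starting at the latest and following parent links."""
    current = latest
    while current is not None:
        yield current
        current = commits[current]["parent"]

print(list(history("H")))
```

The only input is the ID of the last commit; everything else is found by following the stored parent IDs.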

The problem with this is obvious: we have to memorize some random, ugly, impossible-for-humans hash ID. So what shall we do, to fix this problem?

²Note that there are maintenance commands that (slowly and painfully) trawl through the entire database to look for various issues. Such a command could find all the "latest" commits. However, this takes multiple minutes in any reasonably-large repository: far too slow to use for everyday work.

³I have been using hash IDs from the Git repository for Git, and if you look at 279ebd47614f182152fec046c0697037a4efbecd you'll find that it isn't an ordinary commit after all. But we're not going to cover that here.

Branch names

Here's a great idea: we have a computer. Let's have the computer memorize the latest hash ID. We'll use something that humans can work with, like a branch name. We'll just add a second database—another key-value store, in fact—right next to the big all-objects database. In this names database, we'll store names: branch names, tag names, and all sort of other names. Under each name we'll store just one hash ID.

(That single hash ID might seem kind of limiting, and it would be, but it's enough for Git. Just as a branch name need only remember the latest hash ID, a tag name need only remember one hash ID. Git uses annotated tag objects when desired here, to handle this. We won't cover those here either though.)

When you make a new branch name in Git, you're basically setting things up so that you can have more than one "latest" commit. That is, we start with one branch name, like master or main—which one you use doesn't matter to Git—and we have a series of a few commits, starting with one very special commit that Git calls a (or the) root commit, that has no parent:

A--B--C   <-- main

Here I've drawn a small repository with just three commits. Commit A is our special root commit, with no parent. Commit B is the second commit, and it points back to A; and commit C is the third and (so far) last commit, pointing back to B.

If we make a new commit now—never mind how just yet, just imagine that we make a new commit—Git will come up with a new, never-before-used hash ID,⁴ which I'll just call D. Git will make commit D by saving a full snapshot of every file—where these files come from is crucial but also surprising and we'll come back to that—and writing out appropriate metadata. The new commit's metadata will point back to existing commit C, because C is the latest commit at the time we make D. But then D, once made, is the latest commit, so Git simply stuffs D's hash ID into the name main, in the names database, and voila:

A--B--C--D   <-- main

We say that the branch name, main in this case, points to the last commit in the branch. That's actually a definition: whatever hash ID is stored in the name main, that is the last commit on the branch.

If we decide commit D is awful and we want to get rid of it, then, we just have Git store C's hash ID back into main, like this:

        D   ???
       /
A--B--C   <-- main

What happens to commit D? Nothing: it's still there, in the big database, just sitting around where it literally can't be found because the name main doesn't point to it any more.⁵ If you've memorized the hash ID—or written it down or something—you can feed it to Git and still see commit D, at least until the maintenance deletion (see footnote 5 again), but otherwise you won't ever see it.
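Here is a minimal Python sketch of the two steps just described: making commit D moves the name main forward, and storing C's ID back into main leaves D in the database but no longer findable from the name. The dictionaries and helper functions are hypothetical stand-ins, with letters in place of hash IDs.

```python
commits = {"A": None, "B": "A", "C": "B"}   # commit -> parent (A is the root)
names = {"main": "C"}                        # branch name -> latest commit

def make_commit(branch, new_id):
    """Record a commit whose parent is the branch's latest commit, then move the name."""
    commits[new_id] = names[branch]
    names[branch] = new_id

def reachable(branch):
    """List the commits found by starting at the name and walking backwards."""
    found, current = [], names[branch]
    while current is not None:
        found.append(current)
        current = commits[current]
    return found

make_commit("main", "D")
after_commit = reachable("main")   # D is now the last commit on main

names["main"] = "C"                # store C's ID back into the name
after_reset = reachable("main")    # D is still stored, but not found from main

print(after_commit, after_reset, "D" in commits)
```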

Instead of erasing D, though, let's do something different. Let's start with:

A--B--C   <-- main

and make a new branch name such as develop. This, too, will point to commit C. All three commits are now on both branches.

A--B--C   <-- develop, main

To remember which branch name we're using to find commit C, we have Git "attach" the special name HEAD to one of these two branch names. That's the current branch, which is the name that git status lists when it says on branch main or on branch develop:

A--B--C   <-- develop, main (HEAD)

If we now git switch develop, we switch from commit C to commit C—which doesn't do anything at all, as it's not switching commits—but we are now using C via the name develop:

A--B--C   <-- develop (HEAD), main

When we make our new commit D now, Git writes the new hash ID into the current branch name. Since that's develop, not main, develop now points to D. The other name, main, still points to C:

A--B--C   <-- main
       \
        D   <-- develop (HEAD)
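The role of HEAD can be modelled the same way. In this hypothetical sketch, HEAD holds a branch name rather than a commit, and a new commit only moves whichever name HEAD is attached to. The `repo` dictionary and both helpers are illustrative assumptions, not Git internals.

```python
repo = {
    "commits": {"A": None, "B": "A", "C": "B"},   # commit -> parent
    "names": {"main": "C", "develop": "C"},        # branch name -> latest commit
    "HEAD": "main",                                # attached to a name
}

def switch(branch):
    """Attach HEAD to another branch name."""
    repo["HEAD"] = branch

def make_commit(new_id):
    """Create a commit on top of the current branch and move only that name."""
    current = repo["HEAD"]
    repo["commits"][new_id] = repo["names"][current]
    repo["names"][current] = new_id

switch("develop")
make_commit("D")
print(repo["names"])
```

After these two calls, develop names D while main still names C, matching the drawing above.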

In this way, we can make multiple branch names, each of which points to any existing commit. For instance we can go back to commit B and make a new name for that commit:

A--B   <-- old
    \
     C   <-- main
      \
       D   <-- develop (HEAD)

We can add and remove any branch name at any time, with the constraint that we're not allowed to delete the branch name we're "on", whatever that name is. So if I wanted to delete develop right now I'd have to run git switch main or git switch old.

⁴This hash ID has to be never-before-used in any repository anywhere in the universe, and has to be never-used-again either, and Git has to do this without contacting any other Git software or Git repository. How does this work? It's magic ... or, well, not really magic at all and someday it will break, but not for a long time, we hope.

⁵This is where the maintenance commands will come in later. They'll trawl through the entire database, discover D, discover that D can't be found, and erase it. Maybe, eventually. We don't know exactly when.

Your working tree and Git's index

I mentioned earlier that it's surprising what files Git uses to make a new commit. The reason for this is simple enough:

you can't see these files; and
other version control systems don't even have these files.

In other words, Git is peculiar here.

Where Git is normal is this: the files stored in any one given commit are all read-only. Not only that, they're in a format that the rest of your computer can't use. Nothing but Git can read these files, and not even Git itself can overwrite these files. But to get work done, on your computer, you need ordinary everyday files, that all programs can read and write. Almost all version control systems have this problem, and they almost all deal with it in the same way: the act of checking out a commit copies the files out of the saved snapshot. So Git does this same thing.

When you pick a commit, with git switch branch-name for instance, Git extracts that commit's files (unless of course you're not changing commits, in which case Git does nothing at all).⁶ The usable copies of these files go into a working area, which Git calls your working tree or work-tree. These are ordinary everyday files! You can see them. You can open them up in an editor or IDE. You can do anything you want with and to these files. These files are not in Git. They came out of Git, but they're just ordinary files now.

This is why kadewu asked:

Are you sure the commits from branch B are on branch A? Or by changes you mean those from working area ...

When you switched to a new branch A and made some commits, those were new commits. But then you switched to a new branch B and didn't commit. You modified working tree files, but were still on the same commit. Then you switched back to branch A ... which changed the name to which HEAD is attached but did not change commits, and did not change any files.
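To make that concrete, here is a toy Python model of the situation, under simplifying assumptions: both branch names point at one commit (called X here), the file name is made up, and `switch` only rewrites working tree files when the commit actually changes. Real Git is considerably more careful when the commits differ; that case is not modelled.

```python
snapshots = {"X": {"Feature.java": "committed version"}}   # commit -> files
names = {"A": "X", "B": "X"}        # both names point at the same commit
state = {"head": "A", "work_tree": dict(snapshots["X"])}

def switch(branch):
    """Attach HEAD to `branch`; touch files only if the commit changes."""
    old_commit = names[state["head"]]
    new_commit = names[branch]
    if old_commit != new_commit:
        # Simplification: real Git checks for conflicts before replacing files.
        state["work_tree"] = dict(snapshots[new_commit])
    state["head"] = branch

switch("B")
state["work_tree"]["Feature.java"] = "edited, never committed"
switch("A")

print(state["head"], state["work_tree"])
```

The edit made "on B" is still in the working tree after switching back to A, because no commit was made and neither switch changed commits.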

[when] I do a git status ...

Now we get to the sneaky thing Git does, when you check out some commit.

While Git is filling in your working tree with usable copies of each file, Git is also filling in a third copy of each file. This third copy sits, in effect, between the committed copy, in Git's special commit format, and the usable copy in your working tree. This intermediate copy of each file is in the de-duplicated format, but—unlike the files stored inside a commit—it's not quite read-only.⁷ Using git add, you can replace this copy.

This extra, intermediate copy of each file is in what Git calls, variously, its index, or the staging area, or—rarely these days—the cache. All three names are for the same thing. The fact that there are these three names mostly reflects that the original names were terrible. You can mostly ignore the name cache, it just pops up in flags like git rm --cached. I like the name index because it's meaningless, but the name staging area is useful because it reflects how you use the index.

When you run git commit, Git is going to take all the files that are in Git's index right then, and use those for the new commit. You can't see these files! They're in Git's index, which is invisible.⁸ If you've modified some working tree file, you must run git add on it.

What git add does is simple enough: it

reads the working tree copy;
compresses it into the special Git-only format;
checks to see if the contents are already there as a duplicate:
- if a duplicate, git add throws away the new compressed version and uses the old one;
- if not a duplicate, git add saves away the new compressed version and uses that;
in any case, git add updates the index entry so that the updated file is what will be committed.

Either way, before you ran git add, the file was already there, in Git's index, ready to be committed. After you run git add, the file is again there, in Git's index, ready to be committed—just, with different compressed and de-duplicated content.

So, whatever is in Git's index is always ready for commit. This is what makes git commit so (relatively) fast.
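The steps git add performs can be sketched as follows. This is a simplified model, assuming the SHA-1 blob ID described earlier and skipping the compression step; `objects`, `index`, and `add` are hypothetical stand-ins, and the file names are made up.

```python
import hashlib

objects = {}   # blob ID -> content (toy objects database)
index = {}     # full path -> blob ID (toy index / staging area)

def blob_id(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def add(path: str, work_tree: dict) -> None:
    """Toy `git add`: read, hash, de-duplicate, then update the index entry."""
    content = work_tree[path]      # read the working tree copy
    oid = blob_id(content)         # real Git also compresses the content
    if oid not in objects:         # only new content needs to be saved
        objects[oid] = content
    index[path] = oid              # this path is now ready to be committed

work_tree = {"README.md": b"one\n", "docs/copy.md": b"one\n"}
add("README.md", work_tree)
add("docs/copy.md", work_tree)

work_tree["README.md"] = b"two\n"
add("README.md", work_tree)

print(len(objects), index["README.md"] != index["docs/copy.md"])
```

Two paths with identical content share one stored object, and re-adding a modified file simply points its index entry at different content.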

If you git add a new-to-Git file, Git still compresses the content as usual, but when it gets to writing the Git-ified object into Git's index, it goes into a new index entry, for the new file name. The index holds file names as full path names—path/to/file.ext, for instance—and internal Git blob object identifiers for the content. Note that Git uses forward slashes here even on Windows systems, where the OS stores this as file.ext in folder to in folder path as path\to\file.ext. Git has only files in the index, never any folders.⁹

Similarly, if you use git rm to remove a file, Git removes the file from both the working tree and the index. Without an index copy, the next git commit will store a full snapshot that omits the file. Relative to the previous commit, the new commit will thus "delete" the file.

What all this means is simple to remember: the index represents the next commit you plan to make. That's it: that's what the index is about! It's the next commit. It starts out being filled in from this commit. As you make changes in the working tree, nothing happens to Git's index yet. You must run git add (or git rm) to have Git update its index, based on the updates you've made in your working tree.

As a short-cut, you can use git commit -a, but there's a flaw in this—well, more than one flaw, but some of them don't bite you until you have pre-commit hooks written by people who don't understand how Git complicates-up the index sometimes, including when you use git commit -a. The main flaw is that git commit -a is roughly equivalent to running git add -u, not git add --all. The -u option to git add only updates files that are already in Git's index. Any new files you made don't get added.
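The difference between the two git add modes can be shown with a toy model. The two functions below are hypothetical approximations of `git add -u` and `git add --all` operating on plain dictionaries; they leave out ignore rules and everything else the real commands handle.

```python
def add_update(index: dict, work_tree: dict) -> dict:
    """Approximate `git add -u`: refresh only paths already in the index."""
    result = {}
    for path in index:
        if path in work_tree:
            result[path] = work_tree[path]   # modified or unchanged file
        # a path missing from the working tree is dropped (staged deletion)
    return result

def add_all(index: dict, work_tree: dict) -> dict:
    """Approximate `git add --all`: the index ends up mirroring the working tree."""
    return dict(work_tree)

index = {"a.txt": "old", "gone.txt": "old"}
work_tree = {"a.txt": "new", "b.txt": "brand new file"}

print(add_update(index, work_tree))
print(add_all(index, work_tree))
```

With the update-only behaviour, the new file b.txt never reaches the index, so a commit made right afterwards would not contain it.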

⁶Git's "don't change any files if not changing commits" falls out of a more general optimization it does, which is "don't change any files you don't have to change". We won't cover that here either, but note that switching from commit C to commit C, as we did earlier, doesn't switch out the underlying commit and therefore changes no files. So the optimization touches absolutely nothing at all in this case. That's why, for instance, you can create a new branch after you start changing files. Creating a new branch name uses the current commit, so it doesn't change the commit, and therefore does not need to change any files, and doesn't.

⁷Technically, the content in Git's index / staging-area is read-only, in the form of a Git internal blob object. What you get to do is overwrite it with another blob object.

⁸The git ls-files command can show what's in the index rather directly. But this command turns out to be of relatively little use: git status is the command to use after all.

⁹This is what leads to the problem of storing an empty folder, which Git can't really do well at all. If the index could hold a directory without the "keeps turning into a gitlink" bug-ette, Git could store empty directories via the empty tree. But it (the index) can't (store a directory), so it (Git) can't (store an empty folder).

Understanding `git status`, and a bit about `.gitignore`

I mentioned earlier that you can't see what's in Git's index / staging-area. Since Git makes a new commit from the files that are in Git's index, this is a problem! If you look at your working tree, what you see is not in Git and is not what is going to get committed. The stuff that will be committed is whatever's in Git's index, and you can't see that.

What you can do though is run git status. This command actually runs two comparisons. First, though, git status tells you the current branch name, saying (e.g.) on branch develop. That's very useful: that's the branch name that Git will use when it stores the new commit hash ID. You may then get some more information about the branch name, e.g., ahead and/or behind its upstream. We won't cover this here (for space reasons).

Next, Git does a comparison—a git diff --name-status, in effect—between the current commit, aka HEAD, and the index. Usually almost all the files here are unchanged. For those files, git status says nothing at all. So for most files you get no output at all, which is really easy to read. You get output only for those files where something is different!

That means that this section lists changes staged for commit, and that's what this section is titled, Changes staged for commit. Any file names printed here are being printed because this file is different, in the index, than it is in the HEAD commit. Maybe it's entirely new! Maybe it's been deleted! Maybe it's just changed. It's definitely different though.

Having listed out these "staged for commit" changes—or said nothing at all, if Git's index still matches the HEAD commit—the git status command now moves on to its second comparison. It basically runs another git diff, also with --name-status to avoid showing the changed lines, to find out which files, if any, are different in Git's index and in your working tree.

If some working tree file is different from the index copy of that same file, git status will list that file here. These go in the Changes not staged for commit section of the git status output. If you didn't touch 999 out of 1000 files, only one file will be listed here: the one you did touch. And as soon as you use git add on that one changed file, the index copy will match the working tree copy and it will stop being "not staged". But now the index copy probably won't match the HEAD copy any more, and it will instead start being "staged".

So:

the first diff tells you about files that are staged for commit;
the second diff tells you about files that aren't but could be staged.

Both of these sets of files are discovered by comparing the contents of each of the various copies. First Git compares HEAD-file-contents to index-file-contents, to get the "staged for commit" list. Then Git compares index-file-contents to working-tree-file contents, to get the "not staged for commit" list.
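The two comparisons can be sketched with one small helper applied twice. This is an illustrative model using plain dictionaries of path to content; the `changed_paths` name and the sample files are assumptions, and files that exist only in the working tree are deliberately left out of the example.

```python
def changed_paths(old: dict, new: dict) -> dict:
    """Compare two path -> content maps and report only the paths that differ."""
    report = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            report[path] = "added"
        elif path not in new:
            report[path] = "deleted"
        elif old[path] != new[path]:
            report[path] = "modified"
    return report

head = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
index = {"a.txt": "2", "b.txt": "1", "c.txt": "1"}                 # a.txt was git add-ed
work_tree = {"a.txt": "2", "b.txt": "1 plus edit", "c.txt": "1"}   # b.txt edited only

staged = changed_paths(head, index)             # "Changes staged for commit"
not_staged = changed_paths(index, work_tree)    # "Changes not staged for commit"

print(staged, not_staged)
```

The untouched file c.txt appears in neither report, which is why the output stays short even in a large repository.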

And it's that simple ... well, almost. Of course Git has to throw in an extra wrinkle here.

If you add, to the index, an all-new file, Git will say that there's a new file added and staged for commit. That makes sense. But what if you add an all-new file to your working tree? You might expect Git to say there's a new file added, but not staged for commit.

But no! Instead, Git tells you that there's an untracked file. What is that all about? Well, sometimes this new file should be git add-ed. Then it becomes a tracked file, and it will go into the next commit.

Sometimes, though—especially in some programming languages—you get a whole bunch of files that should never be committed at all. For C and C++ code, for instance, you get .o (object code) files. For Python, you get .pyc or similar files, sometimes in a subdirectory (Python 3). None of these should ever be committed.¹⁰

If Git complained about all of these files, that would be extremely annoying. So you can get Git to shut up about certain untracked files by listing those file names, or patterns, in a .gitignore file. Listing an untracked file in .gitignore makes git status shut up about it. That's the main purpose, really.

Now, listing such untracked files also has some secondary effects. In particular, you can now use en-masse git add . operations to add all files, including new files, without adding these untracked-but-ignored, quietly-not-complained-about should-never-be-committed files.

What you need to know most of all here, though, is this: If a file is tracked, it cannot be ignored. Listing a tracked file in a .gitignore has no effect. Fortunately, tracked has a simple definition: a file is tracked if and only if it is in Git's index right now.
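A short sketch of that rule, under a loud simplification: Python's `fnmatch` is used here as a rough substitute for Git's real ignore-pattern matching, which has more rules. The function names and file names are hypothetical.

```python
from fnmatch import fnmatch

def is_ignored(path, index, patterns):
    """A path is ignored only if it is untracked AND matches an ignore pattern."""
    if path in index:          # tracked: ignore patterns have no effect
        return False
    return any(fnmatch(path, pattern) for pattern in patterns)

def untracked_to_report(index, work_tree, patterns):
    """Working tree paths that would be listed as untracked."""
    return sorted(
        path for path in work_tree
        if path not in index and not is_ignored(path, index, patterns)
    )

index = {"main.c", "old.o"}                      # old.o was committed by mistake
work_tree = ["main.c", "old.o", "main.o", "notes.txt"]
patterns = ["*.o"]

print(is_ignored("old.o", index, patterns))      # tracked, so not ignored
print(is_ignored("main.o", index, patterns))     # untracked and matches
print(untracked_to_report(index, work_tree, patterns))
```

Even though old.o matches the pattern, it stays tracked because it is in the index; only removing it from the index would let the pattern apply.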

We know that we can remove files from Git's index, using git rm (removes both working tree and index copy) or git rm --cached (removes index copy only). Once we remove such a file, it's untracked (and maybe gone entirely, if we forgot to use --cached).

But we can't change any existing commit. If a file that should never have gotten into Git did get into some existing commit, it's stuck there forever. As long as we have that commit, if we check out that commit, Git will copy the file into Git's index (and our working tree) and it will be tracked right then. We'll need to remove it again, every time, to untrack it. The only way to fix this is to stop using that commit entirely.

It's therefore important to make sure that files that should be untracked stay that way: never get committed, in any commit, and hence never sneak into Git's index through the basic check-out-a-commit action. If you make a bad commit, that has some files in it that shouldn't, try to avoid passing that commit around. Get rid of it before it contaminates other Git repositories. We won't cover how to do that here, but eventually you will probably need to learn this, because this happens (a lot!).

¹⁰There are occasions when "build artifacts" need to be archived. It's generally unwise to put them into Git though, as Git's algorithms tend to fall apart when dealing with large binary files, especially compressed ones.
