I've been using git and was able to create a branch and push to origin. I have only a very basic understanding and am still learning.

Today I was working on a branch, let's say called B. In parallel, I was sometimes doing some debugging in the branch A folder, but without switching between branches: I was just working on the files and saving them to disk.

So I wanted to switch back to branch A to push the changes to git, so I did

git checkout A

error: The following untracked working tree files would be overwritten by checkout:
 cc.py dd.py ....

(plus some other files). I did not really understand why I got this error, because my branch was B and the files listed below the error belong to the branch A folder. Anyway, I did

git checkout -f A

Switched to branch 'A'
Your branch is up to date with 'origin/A'.

How could this happen? I have updated files in branch A locally, but it's saying the branch is up to date??

Then I did

git status

There was no file to commit; everything was up to date. So then I thought that if I fetched the remote version of this branch, Git would recognize the differences between the local and remote versions of branch A.

Then I did

git remote update

Fetching origin
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 14 (delta 11), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (14/14), 1.76 KiB | 39.00 KiB/s, done.

Then I did

git fetch origin A
 * branch A -> FETCH_HEAD

Basically, whatever I tried, I could not get the changed files to show up in red in my local repository on branch A. So I tried to fetch from the remote to get the differences between the local and remote versions of branch A. That also failed.

I'm really stuck on why this has happened and really looking for help to resolve this! Thanks

What is the difference between 'git pull' and 'git fetch'?

Alexander
  • (Note: this isn't an answer, just generic advice:) Don't name branches with `-` as the first character of their names, it's a bad idea. That's not the actual problem and you can always refer to them as `refs/heads/-A` for instance to avoid starting with `-`, but really, don't do it, it makes life miserable because the branch name looks like an option to the Git command. – torek Oct 26 '21 at 22:52
  • @torek as you mentioned, the `-A` name is just a made-up name. I edited the OP to avoid confusion. It's not the actual problem asked about in the OP. – Alexander Oct 26 '21 at 23:20

1 Answer

TL;DR

Switching branches can require changing out the contents of Git's index and your working tree. This can lose work you're doing. You've run into such a case. In general, you must force Git to lose work (though the old git checkout command has some minor issues that make it too easy to destroy unsaved work, fixed in the new git switch).

There is a lot to know here.

Long

You're mixing together a number of concepts that, when you use Git, you need to keep separate in your head. In particular it looks like you've been given a bad introduction to Git. A good one will start with this:

  • Git is about commits.

  • Commits contain files, but Git is not about files. Git is about commits.

  • Branches—or more precisely, branch names—help you and Git find commits, but Git isn't about branches either.

So Git is basically just a big database full of commits (and other supporting objects, and there are some smaller databases alongside this). The commits are the raison d'être for Git.

As we all know, what someone tells you three times is true, so the next thing to learn is what a commit is. It's a bit abstract: it's hard to point to something in the room and say there, that's a commit! because there is no real-world analogue. But in Git:

  • Each commit is numbered, with a unique number that looks like random garbage. It's actually a cryptographic checksum (reminiscent of cryptocurrency, and there is actually a relationship here), expressed in hexadecimal, but we can just think of it as an apparently-random string of junk characters that no human is ever going to remember. It is, however, unique to that one particular commit: once a number has been used by any one commit, nobody anywhere can ever use it for any other commit.[1]

    This is how two different Gits—two pieces of software that implement Git, working with two different repositories—can tell whether they both have some commit. They just look at each other's commit numbers. If the numbers are the same, the commits are the same. If not, the commits are different. So in a sense, the number is the commit, except that the number is just a hash of the commit and if you don't have the number, you need to get the whole commit (from someone who does have it).

  • Meanwhile, each commit stores two things:

    • Every commit has a full snapshot of every file. More precisely, each commit has a full snapshot of all the files it has. That sounds redundant, but commit a123456 might have ten files, and commit b789abc might have 20 files, so obviously some commit might have more files than another. The point of this is to note that as long as you have the commit, you have a full snapshot of all the files, just like an archive.

      The files inside a commit are stored in a special Git-only form. They're compressed and—even more important—de-duplicated. This keeps the repository from getting enormously fat: most commits mostly re-use the files from some previous commit, but when they do that, the files are all de-duplicated, so that the new commit takes hardly any space. Only truly-different files need to go in; same-as-before files just get re-used.

    • Besides the snapshot, each commit has some metadata. Metadata is just information about the commit itself. This includes things like the name of the person who made the commit. It includes some date-and-time stamps: when they made the commit. It includes a log message where they say why they made the commit.

      Crucial for Git itself, Git adds into this metadata a list of commit numbers—"hash IDs" or "object IDs" (OIDs)—of previous commits.

Most commits store exactly one hash ID, for the (singular) previous or parent commit. This forms commits into chains. These chains work backwards, and there's a strong reason for that.
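To make the hash ID and de-duplication ideas concrete, here is a minimal Python sketch, not Git's actual code. It assumes the default SHA-1 object format, in which the ID of a file's contents (a "blob") is the SHA-1 of a short header followed by the bytes. The `store` dictionary and the `save` helper are made up for illustration, and real Git also compresses what it stores.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives to a file's contents (a "blob")."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Toy object database (hypothetical): ID -> content.
store = {}

def save(content: bytes) -> str:
    """Store content under its ID; identical content is stored only once."""
    oid = git_blob_id(content)
    store.setdefault(oid, content)
    return oid

first = save(b"hello world\n")
second = save(b"hello world\n")    # same bytes, so same ID: de-duplicated
third = save(b"hello world!\n")    # one character differs, so a different ID

print(first == second)   # True
print(first == third)    # False
print(len(store))        # 2
```

Because the ID is computed from the content, two repositories that hold the same content necessarily agree on the ID, which is what lets them compare numbers instead of comparing files.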


[1] This total uniqueness idea is true in practice, but not in theory, but that's OK as long as it's true in practice. To make it work in practice, the numbers need to be as huge as they are (or soon, huger: the Git folks are working on making them even bigger now).


All parts of every commit are read-only

To make the commit numbers—the cryptographic hash IDs—work, Git needs to ensure that no part of any commit can ever change. In fact, you can take a commit out of the Git all-commits database and do stuff with it to change the contents or metadata and put that back, but when you do, you just get a new and different commit with a new unique hash ID. The old commit remains in the database under the old ID.

So a commit is this two-part thing (snapshot and metadata) that is read-only and more or less permanent. All you really ever do with Git is add more commits. You literally can't take any out,[2] but it's very easy to add new ones, because that's what Git is built to do.
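The read-only rule can be modeled roughly like this. It is a toy sketch under stated assumptions: the `Commit` class, its fields, and the `database` dictionary are hypothetical, and the ID is a SHA-1 over the object's `repr`, which is not Git's real commit encoding.

```python
import hashlib
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)          # frozen: fields cannot be reassigned
class Commit:
    snapshot_id: str             # stand-in for the saved files
    parent: str                  # ID of the previous commit ("" for none)
    message: str

    def oid(self) -> str:
        # The ID is derived from every field, so any change gives a new ID.
        return hashlib.sha1(repr(self).encode("utf-8")).hexdigest()

database = {}                    # toy all-commits database: ID -> commit

def store(commit: Commit) -> str:
    database[commit.oid()] = commit
    return commit.oid()

old_id = store(Commit("snap-1", "", "initial"))

in_place_edit_refused = False
try:
    database[old_id].message = "reworded"     # editing in place is refused
except FrozenInstanceError:
    in_place_edit_refused = True

# "Changing" a commit really means storing a modified copy under a new ID.
new_id = store(replace(database[old_id], message="reworded"))

print(in_place_edit_refused)        # True
print(old_id != new_id)             # True
print(database[old_id].message)     # initial
print(len(database))                # 2
```

The old commit is still there under its old ID; the "edit" only added a second commit.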


[2] You can, however, stop using a commit, and if a commit is not only unused but also un-findable, Git will eventually realize that this commit is trash, and will discard it. So that's how you get rid of commits, if needed: you just make sure they can't be found, and Git eventually (it takes a while!) throws them away. We won't cover this in detail here though.


Let's talk a bit more about parents and the backwards-chain thing

Although this isn't relevant to what you're doing right now, it's really important, so let's look at how commit chains work. We already said that most commits record the raw hash ID of one earlier commit. We also said that the hash IDs are big and ugly and impossible for humans to remember (which is true: what does e9e5ba39a78c8f5057262d49e261b42a8660d5b9 mean anyway?). So let's suppose we have a small repository with a few commits, but instead of their real hash IDs, let's use single uppercase letters to stand in for these commits.

We'll start with a repository that has just three commits, which we'll call A, B, and C. C will be the latest commit. Let's draw it in:

      <-C

C contains the raw hash ID of earlier commit B. We like to draw these as arrows coming out of the commit, and say that C points to B. Let's draw B in too now:

  <-B <-C

Of course B has one of these arrows, pointing to earlier commit A:

A <-B <-C

That's our full chain of commits. A, being the very first commit, doesn't point to anything earlier because it can't, so the chain stops here.

To add a new commit, we tell Git to do something with commit C—we'll describe this more completely in a moment—and then use C to make the new commit, which will then point back to C:

A <-B <-C <-D

Now we have four commits in our chain, with new commit D pointing back to C.

Besides these backwards arrows, each commit has a full snapshot. When we made D, we presumably changed some files—again, we'll get to this more in a moment—so some of the files in D are different from those in C. We presumably left some files alone. We can now ask Git to show us what changed in D.

To do that, Git extracts both C and D to a temporary area (in memory) and checks the contained files. When they match, it says nothing at all. The de-duplication that Git does makes this test easy and Git can actually skip over the extraction entirely for these files. Only for the files that are different does Git actually have to extract them. Then it compares them, playing a sort of game of Spot the Difference, and tells us what is different in those changed files. That's a git diff, and it's also what we see from git log -p or git show.

When we run git show on one commit, Git:

  • prints the metadata, or some selected parts of it, with some formatting; and
  • runs this sort of diff to see what's different between the parent of this commit, and this commit.

When we run git log, Git:

  • starts at the last commit D;
  • shows us that commit, perhaps with a git show style diff too if we use -p; then
  • moves back one hop to the previous commit, C, and repeats.

This process stops only when we get tired of looking at the git log output, or Git runs out of commits by reaching the very first one (A).
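The backwards walk that git log does can be sketched in a few lines of Python. This is only a model: the `commits` dictionary, the file contents, and the helper names are invented for illustration, with the letters A through D standing in for real hash IDs as above.

```python
# Hypothetical four-commit history: each commit records only its parent.
commits = {
    "A": {"parent": None, "files": {"README.md": "v1"}},
    "B": {"parent": "A", "files": {"README.md": "v1", "main.py": "v1"}},
    "C": {"parent": "B", "files": {"README.md": "v2", "main.py": "v1"}},
    "D": {"parent": "C", "files": {"README.md": "v2", "main.py": "v2"}},
}

def changed_files(commit_id):
    """Compare a commit's snapshot with its parent's, like the diff in git show."""
    commit = commits[commit_id]
    parent_files = commits[commit["parent"]]["files"] if commit["parent"] else {}
    names = set(commit["files"]) | set(parent_files)
    return sorted(n for n in names if commit["files"].get(n) != parent_files.get(n))

def log(start):
    """Walk backwards from start, one parent hop at a time, until the root."""
    history = []
    current = start
    while current is not None:
        history.append((current, changed_files(current)))
        current = commits[current]["parent"]
    return history

for commit_id, changed in log("D"):
    print(commit_id, changed)
# D ['main.py']
# C ['README.md']
# B ['main.py']
# A ['README.md']
```

Note that nothing in a commit points forwards, so the walk can only start at the end and work towards the beginning.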

Finding commits

Let's draw in a few more commits. I'm going to get lazy about the internal arrows between commits: they're part of each commit, and hence can't change, so we know they always point backwards. I'll end my chain with hash H here:

...--F--G--H

Once we have a lot of commits—more than the eight or so implied by this—it's going to be hard to figure out which random-looking hash ID H actually has. We need a fast way to find the hash, H.

Git's answer to this is to use a branch name. A branch name is just any old name that meets the name restrictions. The name contains one hash ID, such as that for commit H.

Given a name that contains the hash ID of commit H, we say that this name points to H, and draw it in:

...--G--H   <-- main

We can, if we wish, have more than one name that points to commit H:

...--G--H   <-- develop, main

We now need a way to know which name we are using. To do that, Git attaches one very special name, HEAD, written in all uppercase like this, to just one branch name. The name that has HEAD attached to it is the current branch, and the commit to which that branch name points is the current commit. So with:

...--G--H   <-- develop, main (HEAD)

we are on branch main, as git status will say, and we're using the commit whose hash ID is H. If we run:

git switch develop

as a Git command, that tells Git that we should stop using the name main and start using the name develop instead:

...--G--H   <-- develop (HEAD), main

When we do this, we move from commit H to ... commit H. We don't actually go anywhere. This is a special case, and Git makes sure not to do anything but change where HEAD is attached.

Now that we're "on" branch develop, let's make a new commit. We won't talk much about how we do this just yet, but we'll come back to that, since that's at the heart of your current problems.

Anyway, we'll draw in our new commit I, which will point back to existing commit H. Git knows that the parent for I should be H because, when we start, the name develop selects commit H, so that H is the current commit at the time we start the whole "make new commit" process. The end result is this:

          I   <-- develop (HEAD)
         /
...--G--H   <-- main

That is, the name develop now selects commit I, not commit H. The other branch name(s) in the repository have not moved: they still select whatever commits they did before. But now develop means commit I.

If we make yet another commit, we get:

          I--J   <-- develop (HEAD)
         /
...--G--H   <-- main

That is, the name develop now selects commit J.

If we now run git switch main or git checkout main—both do the same thing—Git will remove all the files that go with J (they're safely stored forever in J though) and extract all the files that go with H:

          I--J   <-- develop
         /
...--G--H   <-- main (HEAD)

We're now on branch main and we have the files from H again. We can now make another new branch name, if we like, such as feature, and get on that branch:

          I--J   <-- develop
         /
...--G--H   <-- feature (HEAD), main

Note how commits up through and including H are on all three branches, while commits I-J are only on develop. As we make new commits:

          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K--L   <-- feature (HEAD)

the current branch name moves forwards, to accommodate the new commits, and the new commits are only on the current branch. We can change that by moving branch names around: the names move, even though the commits themselves are carved in stone.
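The drawings above can be replayed as a small Python model, assuming nothing more than two dictionaries and one variable (all hypothetical names): `parents` for the backwards arrows, `branches` for the names, and `head` for the name HEAD is attached to.

```python
parents = {"G": "F", "H": "G"}             # commit -> parent (older history omitted)
branches = {"main": "H", "develop": "H"}   # branch name -> commit it points to
head = "main"                              # HEAD is attached to one branch name

def switch(name):
    """Attach HEAD to another branch name."""
    global head
    head = name

def commit(new_id):
    """Add a commit whose parent is the current commit, then move the current name."""
    parents[new_id] = branches[head]
    branches[head] = new_id

def on_branch(name):
    """List the commits reachable by walking backwards from a branch name."""
    found, current = [], branches[name]
    while current is not None:
        found.append(current)
        current = parents.get(current)
    return found

switch("develop")
commit("I")
commit("J")
switch("main")
branches["feature"] = branches[head]       # a new name pointing at the current commit
switch("feature")
commit("K")
commit("L")

print(branches)               # {'main': 'H', 'develop': 'J', 'feature': 'L'}
print(on_branch("develop"))   # ['J', 'I', 'H', 'G', 'F']
print(on_branch("feature"))   # ['L', 'K', 'H', 'G', 'F']
```

Only the entry for the current branch ever changes when a commit is made; the commits themselves are never rewritten.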

Commits are read-only, so how do we edit files?

We now come to the central parts of your problem. We don't—in fact, we can't—work directly with commits, because they are in this weird Git-only format. We have to get Git to extract the commits. We've already seen that git checkout or git switch can do this, but it's time for the full picture.

In order to get new work done, Git provides for you what Git calls a working tree or work-tree. This is a directory (or folder, if you prefer that term) that contains ordinary files, in your computer's ordinary file formats. These files are not in Git. Some of them come out of Git, to be sure: the git checkout or git switch process fills in your working tree. But it does that by this process:

  • First, if you have some existing commit checked out, Git needs to remove all the files that came out of that commit.
  • Then, since you are moving to some other commit, Git now needs to create (fresh) the files that are stored in that commit.

So Git removes the old files and puts in the new ones, according to the difference between the two commits.

But your working tree is an ordinary directory / folder. This means you can create files here, or change the contents of files here, without Git having any control or influence over this process. Some files you create will be all-new: they aren't in Git, they did not come out of Git, Git has never seen them. Other files might actually be in some old commit from long ago, but didn't come out of this commit. Some files did come out of this commit.

When you use git status, Git needs to compare what's in your working tree with something. Now the process gets a little bit complicated, because Git doesn't actually make new commits from the files in your working tree.[3] Instead, Git keeps yet another copy of all the files.

Remember that the committed files—the ones in the current or HEAD commit—are read-only, and in a Git-ified, de-duplicated format that only Git itself can read. So Git extracted those files into ordinary files, leaving you with two copies of each file:

  • the Git-only read-only one in the commit, and
  • the one in your working tree.

But in fact, Git sneakily stuck a copy in between these two copies, so that you have three copies of each file:

  • there's the Git-ified one in HEAD, which can't be changed;
  • there's a Git-ified ready to commit copy in the intermediate spot; and
  • there's a usable copy in your working tree.

So if you have some files like README.md and main.py, you actually have three copies of each. That middle one is in a place that Git calls, variously, the index, or the staging area, or the cache. There are three names for this thing, perhaps because index is such a poor name, and cache is not good either. The term staging area is perhaps the best term, but I'll use index here because it's shorter and meaningless, and sometimes meaningless is good.

Our three copies of the file, then, are:

  HEAD        index       work-tree
---------    ---------    ---------
README.md    README.md    README.md
main.py      main.py      main.py

The files that are in Git's index are the ones that Git will commit. Hence, what I like to say is that Git's index is your proposed next commit.

When Git first extracts a commit, Git fills in both its index and your working tree. The files in Git's index are pre-compressed and pre-de-duplicated. Since they came out of a commit, they're all automatically duplicates, and therefore take no space.[4] The ones in your working tree do take space, but you need those because you have to have them de-Git-ified to use them.

As you modify files in your working tree, nothing else happens: Git's index is unchanged. The commit itself is of course unchanged: it literally can't be changed. But nothing has happened to the files in the index either.

Once you've made some changes and want those changes to be committed, you have to tell Git: Hey, Git, kick the old version of the file out of the index. Read my working tree version of main.py because I changed it! Compress it down into your internal compressed format now! You do this with git add main.py. Git reads and compresses the file, and checks to see if the result is a duplicate.

If the result is a duplicate, Git kicks out the current main.py and uses the new duplicate. If the result isn't a duplicate, Git saves the compressed file so that it's ready to be committed, then does the same thing: kicks out the current main.py and puts in the now-de-duplicated (but first time occurring) copy of the file. So either way, the index is now updated and ready to go.

Hence, the index is always ready to commit. If you modify some existing file, you must git add: this compresses, de-duplicates, and readies-for-commit by updating the index. If you create an all-new file, you must git add: this compresses, de-duplicates, and readies-for-commit. By updating Git's index, you get the files ready for commit.

This is also how you remove a file. It remains in the current commit, but if you use git rm, Git will remove both the index copy and the working tree copy:

git rm main.py

produces:

  HEAD        index       work-tree
---------    ---------    ---------
README.md    README.md    README.md
main.py

The next commit you make won't have a main.py.
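Here is a rough Python model of the three copies and of what git status compares. It is a sketch, not how Git stores anything: the three dictionaries and the `status`, `add`, and `rm` helpers are hypothetical stand-ins, and cc.py plays the part of a file Git has never seen.

```python
head_commit = {"README.md": "readme v1", "main.py": "print('v1')"}  # read-only
index = dict(head_commit)       # filled in from the commit at checkout time
work_tree = dict(head_commit)   # ordinary, editable files

def status():
    """Compare HEAD with the index, and the index with the working tree."""
    staged = sorted(n for n in set(index) | set(head_commit)
                    if index.get(n) != head_commit.get(n))
    not_staged = sorted(n for n in index if work_tree.get(n) != index[n])
    untracked = sorted(n for n in work_tree if n not in index)
    return {"staged": staged, "not staged": not_staged, "untracked": untracked}

def add(name):
    """Copy the working-tree version into the index, like git add."""
    index[name] = work_tree[name]

def rm(name):
    """Remove both the index copy and the working-tree copy, like git rm."""
    index.pop(name, None)
    work_tree.pop(name, None)

work_tree["main.py"] = "print('v2')"     # edit a file that came out of the commit
work_tree["cc.py"] = "print('debug')"    # create a file Git has never seen
after_edit = status()

add("main.py")
after_add = status()

rm("main.py")
after_rm = status()

print(after_edit)  # main.py is "not staged", cc.py is "untracked"
print(after_add)   # main.py is now "staged", cc.py is still "untracked"
print(after_rm)    # main.py is staged as a removal
```

An untracked file is simply one that exists in the working tree but has no entry in the index, which is the situation cc.py and dd.py were in.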


[3] This is actually pretty weird: most non-Git version control systems do use your working tree to hold the proposed next commit.

[4] The index entries themselves take a bit of space, typically around or a bit under 100 bytes per file, to hold the file name, internal Git hash ID, and other useful stuff that makes Git fast.


Now we see how git commit works

When you run git commit, Git:

  • collects any needed metadata, such as user.name and user.email from git config, and a log message to go into the new commit;
  • uses the current commit's hash ID as the parent for the new commit;
  • freezes whatever's in its index into the new commit's snapshot; and
  • writes out the snapshot and metadata, which produces the new commit's hash ID.

We don't know what the hash ID will be until you run git commit, since part of what goes into the metadata is the current date and time at that point, and we don't know when you'll make that commit. So we never know what any future commit hash ID will be. But we do know, because they're all set in stone, what all the past commit hash IDs are.

So now Git can write out commit I:

          I
         /
...--G--H   <-- develop (HEAD), main

and once Git has written it out and gotten the hash ID, Git can stuff that hash ID into the branch name develop, since that's where HEAD is attached:

          I   <-- develop (HEAD)
         /
...--G--H   <-- main

and that's how our branch grows.

The index, or staging area, determines what goes into the next commit. Your working tree lets you edit files so that you can git add them into Git's index. The checkout or switch command erases from the index the current commit's files, and goes to the chosen commit, filling in Git's index and your working tree, and choosing which branch-name-and-commit is to be the new current commit. Those files come out of that commit and fill in Git's index and your working tree, and you're ready to work again.

Until you actually run git commit, though, your files aren't in Git. Once you run git add, they're in Git's index, but that's just a temporary storage area, to be overwritten by the next git checkout or git switch. It's the git commit step that really saves them. That adds the new commit to the current branch, too.
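As a hedged illustration of the commit step, the sketch below builds a commit from the index only and derives its ID from the snapshot plus the metadata. The JSON encoding, the field names, the author string, and the timestamps are all assumptions made for the example; Git's real commit format is different.

```python
import hashlib
import json

def make_commit(index, parent, author, message, timestamp):
    """Freeze the index into a snapshot, add metadata, and derive the commit ID."""
    commit = {
        "snapshot": dict(index),     # copied from the index, not the working tree
        "parent": parent,
        "author": author,
        "message": message,
        "time": timestamp,
    }
    encoded = json.dumps(commit, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest(), commit

index = {"main.py": "print('v1')"}
# Working-tree edits and new files that were never passed to git add:
work_tree = {"main.py": "print('v2')", "cc.py": "print('debug')"}

id_now, commit_now = make_commit(index, "H", "A U Thor", "add main", 1635292800)
id_later, _ = make_commit(index, "H", "A U Thor", "add main", 1635292801)

print(sorted(commit_now["snapshot"]))   # ['main.py'] -- cc.py was never added
print(id_now != id_later)               # True: one second later, a different ID
```

The working tree is not even an input to `make_commit`, which mirrors the point that files you edit but never add and commit are not saved anywhere in Git.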

Introducing other Git repositories

Now, besides all of the above, you're also using git fetch. You use this when there are at least two Git repositories. We mentioned earlier that we will connect two Gits—two implementations of Git software, using two repositories—to each other and have them transfer commits. One Git can tell if the other Git has some commit just by showing the hash ID: the other Git either has that commit, in its big database of all commits, or doesn't. If the Git that lacks the commit says I don't have that one, gimme, then the sending Git has to package up that commit—plus any required supporting objects—and send them over, and now the receiving Git has that commit too.

We always use unidirectional transfers here: we run git fetch to get commits from some other Git, or git push to send commits to some other Git. These two operations—fetch and push—are as close as Git gets to opposites, although there's a fundamental mismatch of sorts here (which I won't get into because this is already quite long). We'll just talk about fetch.

When we connect our Git to some other Git—let's use GitHub's Git software and repositories as our example here, though anything that speaks the right Git software protocol works—with git fetch, we:

  1. Ask the other Git to list out all its branch (and tag) names and the commit hash IDs that go with those branch names (tags make things more complicated, so we'll ignore them here).

  2. For each commit hash ID that we don't have, but are interested in (we can limit which branch names we bother with here, but the default is that all are interesting), we ask them: send that commit, please! They're now obligated to offer the parent commit(s) of those commits. We check to see if we have those commits, and if not, ask for those too. This goes on until they get to commits that we do have, or completely run out of commits.

  3. This way, we'll get from them every commit they have that we don't. They then package those up, along with any required supporting internal objects, and send them all over. Now we have all their commits!

  4. But remember how we find commits, in our repository, using branch names? We have a problem now.

Suppose that we have, in our repository, these commits:

...--G--H--I   <-- main (HEAD)

That is, we just have one branch name, main. We got commits up through H from them earlier, but then we made commit I ourselves.

Meanwhile, as we were making commit I, they made commit J and put that on their main, so they have:

...--G--H
         \
          J   <-- main (HEAD)

I drew this with J down a line because when we combine our commits and theirs, we end up with:

...--G--H--I   <-- main (HEAD)
         \
          J

What name will we attach to commit J so as to be able to find it? (Remember that its true name is some big ugly random-looking hash ID.) They're using their branch named main to find it, but if we move our branch main to point to J, we'll lose our own I!

So we don't update any of our branch names. Instead, our Git will create or update a remote-tracking name for each of their branch names:

...--G--H--I   <-- main (HEAD)
         \
          J   <-- origin/main

Our remote-tracking names are shown with git branch -r, or git branch -a (which shows both our own branch names and our remote-tracking names). A remote-tracking name is just our Git's way of remembering their branch name, and our Git makes it up by sticking origin/ in front of their branch name.[5]

Now that we have both their commits and our commits, plus remote-tracking names that help us find their commits if they don't overlap ours exactly, now we can do something with their commits. The "something" that we do depends on what we want to accomplish, and here things actually start to get complicated—so I will stop here.
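The effect of git fetch on names can be modeled like this. It is a simplified sketch: the repository dictionaries and the `fetch` function are hypothetical, and the real protocol negotiates which commits to send rather than looking at all of them.

```python
def fetch(local, remote, remote_name="origin"):
    """Copy over commits we lack and update remote-tracking names only."""
    for commit_id, parent in remote["commits"].items():
        local["commits"].setdefault(commit_id, parent)    # existing commits are kept
    for branch, commit_id in remote["branches"].items():
        local["remote_tracking"][remote_name + "/" + branch] = commit_id

ours = {
    "commits": {"G": "F", "H": "G", "I": "H"},     # commit -> parent
    "branches": {"main": "I"},
    "remote_tracking": {"origin/main": "H"},       # what we saw last time
}
theirs = {
    "commits": {"G": "F", "H": "G", "J": "H"},
    "branches": {"main": "J"},
}

fetch(ours, theirs)

print(ours["branches"])          # {'main': 'I'}
print(ours["remote_tracking"])   # {'origin/main': 'J'}
print(sorted(ours["commits"]))   # ['G', 'H', 'I', 'J']
```

Nothing in this step touches our own branch names, the index, or the working tree, so fetching can neither reveal nor bring back edits that only ever existed as working-tree files.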


[5] Technically, our remote-tracking names are in a separate namespace, so that even if we do something crazy like create a (local) branch named origin/hello, Git will keep these straight. Don't do it though: you'll probably confuse yourself, even with Git's trick of coloring different names.


So what happened to your changes?

Let's look at this part again:

$ git checkout A
error: The following untracked working tree files would be overwritten by checkout:
 cc.py dd.py ....

These were files you created, that did not come out of some earlier commit. They were in your working tree, but not in Git. ("Untracked" means "not even in Git's index".)

The checkout command gave you this error to let you save the files, either in Git—by adding and committing them—or elsewhere. But you didn't mention doing that:

$ git checkout -f A

The -f, or --force, flag here means go ahead, overwrite these files. So the files you created are gone: the branch name A selected a commit that had these files, so they came out of the commit, went into Git's index, and were expanded into your working tree.

The previous working tree files were never in Git, so Git can't retrieve them. If you have some other way of retrieving them—e.g., if your editor saves backups—use that. If not, you may be out of luck.
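The sequence of events can be replayed with a toy model of the safety check. This is a sketch under loud assumptions: the `checkout` function, the `WouldOverwrite` exception, and the file contents are invented, and the real command performs more checks than this (for instance on modified tracked files).

```python
class WouldOverwrite(Exception):
    """Raised when a checkout would clobber untracked working-tree files."""

def checkout(target_files, index, work_tree, force=False):
    untracked = [name for name in work_tree if name not in index]
    in_danger = sorted(name for name in untracked if name in target_files)
    if in_danger and not force:
        raise WouldOverwrite(", ".join(in_danger))
    for name in list(index):            # remove files that came from the old commit
        work_tree.pop(name, None)
    index.clear()
    index.update(target_files)          # fill the index from the target commit
    work_tree.update(target_files)      # and write the same files to the working tree

commit_on_A = {"main.py": "main", "cc.py": "committed cc", "dd.py": "committed dd"}
index = {"main.py": "main"}             # on B: cc.py and dd.py are not tracked
work_tree = {"main.py": "main", "cc.py": "my edits", "dd.py": "my edits"}

try:
    checkout(commit_on_A, index, work_tree)
    refused = False
except WouldOverwrite as error:
    refused = True
    print("refused:", error)            # refused: cc.py, dd.py

checkout(commit_on_A, index, work_tree, force=True)

print(work_tree["cc.py"])     # committed cc -- the edited copy is gone
print(work_tree == index)     # True: nothing differs, so nothing to commit
```

After the forced checkout the working tree matches the index and the commit exactly, which is why git status reported nothing to commit and the branch still showed as up to date with origin/A.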

torek
  • And the OSCAR goes to!! Where is the part that solves my problem? I read the entire answer but cannot find the solution in it. Basically I did not commit anything. I just switched branches from `B` to `A` and that's all there is to it. I cannot even commit the changes that I made in branch `A` because git all of a sudden says `your branch is up to date`. How could it happen? – Alexander Oct 27 '21 at 00:49
  • You used `git checkout -f` to *discard* all your changes. They're gone. They were in the working tree, but you told Git to overwrite them. – torek Oct 27 '21 at 01:18
  • ok that's something :) please keep it coming. So `git switch` is the new command that I should use next time, and it picks up where I left off with my commits in branch `A`? – Alexander Oct 27 '21 at 01:22
  • The `git switch` command is a lot like `git checkout` here: if it says that it would destroy unsaved work, you'll probably want to save the work somewhere first. If you are willing to discard the unsaved work, the `--force` action here is the same. The key difference between `git switch` and `git checkout` is that `git checkout` has many modes of operation, while `git switch` has few. The other `git checkout` modes were copied into a separate command, `git restore`. – torek Oct 27 '21 at 01:26