
Suppose my repository has two branches, master and development. I created the development branch from master, added some new files to it, and committed those changes locally. Then I switched to the master branch. There I don't see the new files I added on the development branch, but when I switch back to development, I can see them again. How does Git manage to store files per branch?

Shubham
  • There's plenty of literature on how git works. The [tag info page](https://stackoverflow.com/tags/git/info) contains many useful links. – kowsky Jan 02 '20 at 09:47
  • Very briefly (but you should indeed read the basic literature): files are not stored in branches. Branches are mere pointers to a commit in the commit graph. Files referenced in a commit are hashed as part of the tree they belong to. The branch references the commit, the commit references the tree, and the tree references the file. – Romain Valeri Jan 02 '20 at 10:01

2 Answers


When you run git checkout xxx to switch branches, files from that branch are extracted from the "object database", which is the archive kept under .git/objects/ that contains compressed originals of every file in every commit. (That's also where commits themselves are stored.)
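To make the object database a little more concrete, here is a minimal Python sketch of how a file's content becomes an object ID and a path under .git/objects/. It assumes a repository that uses SHA-1 object names and loose (unpacked) objects; the function names are made up for illustration and are not part of Git.

```python
import hashlib
import zlib


def blob_object_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file's content (a "blob")."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Return where a loose object with this ID would live inside the repository."""
    # The first two hex digits name a directory; the remaining 38 name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def compress_object(content: bytes) -> bytes:
    """Compress header + content with zlib, the way loose objects are stored."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


empty_id = blob_object_id(b"")
print(empty_id)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_id))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, two files with identical content share one stored object, no matter which commits or branches refer to them.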

mohammed wazeem

Git doesn't store branches. Git doesn't store files, either. More precisely, Git doesn't use branches to store files. Obviously Git has branches (and then, in a sense, files)—but that's not how things get stored.

What Git stores are commits. Commits themselves store files—or rather, snapshots of files, plus metadata about the snapshots—so in that sense, we can say that Git stores files. But the key thing here is that you don't get to pick and choose one file at a time, in terms of what's in Git: what's in Git goes by whole commits.

Modern commit-based version control systems (VCS) all pretty much agree on this much: a commit stores a snapshot, and you extract this commit and get this snapshot, or that other commit and get that other snapshot. What makes Git weird, though—as compared to most other commit-based VCS—is that commits are independent of branches (with one very important caveat).

That is, a commit can exist regardless of how or where it was made or imported into the repository. Each commit has its own unique hash ID—a big ugly string of letters and digits—that is allocated, forever, to that particular commit and no other commit. You either have that commit, or you don't, because you either have that hash ID, or you don't.

The thing about these hash IDs is that they're impossible for humans to remember. But we don't need to remember them: that's what we have a computer for! So we have Git remember some hash IDs for us, and this is where a branch name like master or develop comes in.

A branch name holds one (1) commit hash ID. This ID identifies the commit that is the tip of the branch, which means it's the last commit on that branch.

I mentioned the metadata above: each commit stores information about itself. One item in the metadata, or more than one in some cases, gives the raw hash ID of the commit's parent. The parent of a commit is the commit that comes before it. So if we know the hash ID of the last commit in some branch, we can read that commit out and find, inside it, the hash ID of the previous commit.

Suppose we use uppercase letters to stand in for hash IDs. Imagine we have a tiny repository with just three commits in it:

A <-B <-C

where C is the last commit. Commit C stores, inside itself, the hash ID of earlier commit B. Commit B stores inside itself the hash ID of commit A. We say that commit C points to B, and B points to A.

Commit A was the very first commit, so it can't point back to any earlier commit—so it just doesn't. That lets Git stop, whenever Git needs to work backwards—and Git almost always works backwards—from the last commit to the first.
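The backwards walk can be sketched in a few lines of Python. This is a toy model, not Git's implementation: the Commit class and the history function are hypothetical names, the letters stand in for real hash IDs, and each commit is assumed to have at most one parent (merge commits have more).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: an ID plus the ID of its parent (None for the first commit)."""
    commit_id: str
    parent: Optional[str]


# The three-commit repository from the text: A <- B <- C.
commits: Dict[str, Commit] = {
    "A": Commit("A", None),
    "B": Commit("B", "A"),
    "C": Commit("C", "B"),
}


def history(start: str) -> List[str]:
    """Walk backwards from `start`, stopping at the commit that has no parent."""
    walked: List[str] = []
    current: Optional[str] = start
    while current is not None:
        walked.append(current)
        current = commits[current].parent
    return walked


print(history("C"))  # ['C', 'B', 'A']
```

Note that nothing in a commit points forwards: starting from A there is no way to discover B or C, which is why Git needs a name that records the last commit.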

To find the hash ID of commit C, we put it in a branch name like master, so that master points to C:

A--B--C   <-- master

If we create a second branch name, we'll start it also pointing to commit C:

A--B--C   <-- master, develop

Now we need a way to know which branch name we're using. (We're using commit C either way.) So we attach the special name HEAD to one of these two branch names:

A--B--C   <-- master (HEAD), develop

or:

A--B--C   <-- master, develop (HEAD)

Running git checkout master attaches HEAD to master, and gets us commit C. Running git checkout develop attaches HEAD to develop, and also gets us commit C.
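On disk, this attachment is normally just a one-line text file: when HEAD is attached to a branch, .git/HEAD contains a line such as "ref: refs/heads/master". A minimal sketch of interpreting that text follows; attached_branch is a hypothetical helper name, and the sketch only handles the two common cases (attached to a branch, or holding a raw hash ID).

```python
from typing import Optional


def attached_branch(head_file_text: str) -> Optional[str]:
    """Return the branch name HEAD is attached to, or None if it holds a raw hash."""
    text = head_file_text.strip()
    prefix = "ref: refs/heads/"
    if text.startswith(prefix):
        # HEAD stores the *name* of a branch, not the commit itself.
        return text[len(prefix):]
    return None


print(attached_branch("ref: refs/heads/master\n"))   # master
print(attached_branch("ref: refs/heads/develop\n"))  # develop
```

Switching branches rewrites this one line, which is why both names can keep pointing to commit C while only one of them is the current branch.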

If we now make some set of changes and commit them, we get a new commit. Let's call this new commit D. Commit D will point back to C, like this:

A--B--C
       \
        D

and now, because we just made D and it's now the last commit on our current branch, Git changes the branch name (the one HEAD is attached to) so that it points to D:

A--B--C   <-- master
       \
        D   <-- develop (HEAD)
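The two steps of making a commit (point the new commit back at the old tip, then move the current branch name) can be modeled like this. It is only a sketch: the dictionaries and the commit function are invented for illustration, and letters stand in for hash IDs.

```python
# Toy model: every name below is made up for illustration, not Git's internals.
parents = {"A": None, "B": "A", "C": "B"}   # commit ID -> parent commit ID
branches = {"master": "C", "develop": "C"}  # branch name -> tip commit ID
head = "develop"                            # HEAD holds a branch name


def commit(new_id: str) -> None:
    """Record a new commit on the current branch and advance that branch."""
    parents[new_id] = branches[head]  # the new commit points back to the old tip
    branches[head] = new_id           # only the branch HEAD is attached to moves


commit("D")
print(branches)  # {'master': 'C', 'develop': 'D'}
```

The name master is untouched by the commit, so it still finds C.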

If we now git checkout master, Git takes commit D's content out of where we're working, and puts commit C's content in place instead:

A--B--C   <-- master (HEAD)
       \
        D   <-- develop
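This is the step that makes files seem to belong to a branch. A rough sketch, assuming each commit is a complete snapshot mapping file names to contents: the file names here are hypothetical, and real Git only replaces tracked files and leaves untracked ones alone, which this toy model ignores.

```python
from typing import Dict

# Each commit maps to a complete snapshot: file name -> file content.
# The file names are hypothetical; D adds one file on top of C's snapshot.
snapshots: Dict[str, Dict[str, str]] = {
    "C": {"README.md": "hello\n"},
    "D": {"README.md": "hello\n", "feature.txt": "new work\n"},
}
branches = {"master": "C", "develop": "D"}


def checkout(branch: str) -> Dict[str, str]:
    """Return the working-tree files after switching to `branch`."""
    tip = branches[branch]
    # Copy the snapshot so that editing the working tree never alters the commit.
    return dict(snapshots[tip])


print(sorted(checkout("master")))   # ['README.md']
print(sorted(checkout("develop")))  # ['README.md', 'feature.txt']
```

The new file is never deleted from the repository when you switch to master. It simply is not part of commit C's snapshot, so it is not placed in the working tree.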

Now we're back on master and hence commit C. If we make another new commit, we get this picture:

        E   <-- master (HEAD)
       /
A--B--C
       \
        D   <-- develop

with commit E pointing back to commit C.

Which branch(es) is each commit on? In Git, the answer is that A-B-C are now on both branches, while E is only on master, and D is only on develop.
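"Being on a branch" here just means "reachable by walking backwards from that branch's tip". A small sketch of that rule for the five-commit picture above, with hypothetical function names and single-parent commits only (in a real repository, git branch --contains answers the same question):

```python
from typing import Dict, List, Optional, Set

# commit ID -> parent commit ID, matching the A..E picture in the text.
parents: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "D": "C", "E": "C",
}
branches = {"master": "E", "develop": "D"}


def reachable_from(tip: str) -> Set[str]:
    """Collect the tip commit and every commit found by following parent links."""
    found: Set[str] = set()
    current: Optional[str] = tip
    while current is not None:
        found.add(current)
        current = parents[current]
    return found


def branches_containing(commit_id: str) -> List[str]:
    """List the branch names whose history includes `commit_id`."""
    return sorted(
        name for name, tip in branches.items()
        if commit_id in reachable_from(tip)
    )


print(branches_containing("B"))  # ['develop', 'master']
print(branches_containing("E"))  # ['master']
print(branches_containing("D"))  # ['develop']
```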

So now you know what a commit really does. The precise manner in which it does it—how it stores a full snapshot—is not particularly important. The important thing is that it does this ... well, that, and knowing that:

  • all commits are read-only: nothing and no one can change any commit; but
  • we find commits by using a branch name, and anyone can change any of their own branch names as much as they want.
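One way to see why a commit cannot be changed: its hash ID is computed from its content, including the parent's hash ID. The sketch below hashes a simplified commit body in the same style Git uses for objects. It is an illustration under assumptions: real commits also carry author and committer lines, which are omitted here, and commit_id plus the placeholder TREE and PARENT values are invented for the example.

```python
import hashlib

# Placeholder 40-character IDs; real ones would come from the repository.
TREE = "1" * 40
PARENT = "2" * 40


def commit_id(tree: str, parent: str, message: str) -> str:
    """Hash a simplified commit body, prefixed with a "commit <size>\\0" header."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


original = commit_id(TREE, PARENT, "add new files")
edited = commit_id(TREE, PARENT, "add new files (reworded)")
print(original == edited)  # False
```

"Changing" a commit therefore really means making a new commit with a different hash ID, and then moving a name to point to it. The old commit is unaffected.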

So if you don't know the raw hash ID of a commit, and use someone's branch name to find their latest commit, you may find a different commit every time you do this, depending on how often / fast they change their name-to-hash-ID maps.

And, because we use commits to find commits—after finding the last one from a name, like a branch name—that means we can find every commit, as long as it's reachable. The idea of reachability is tricky, but very important. A Git commit's continued existence depends on its reachability. So, while commits are independent of branch names, they must be reachable to remain in your repository. There are non-branch-name ways of finding commits:

  • tag names can find commits;
  • git stash makes two—or sometimes three—commits, but uses a special name, refs/stash, to find them, rather than a branch name;
  • and there are other ways you will use every day to find commits, such as using remote-tracking names like origin/master.
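Reachability across all of these names can be sketched as a graph walk that starts from every name at once. This is a toy model under stated assumptions: the refs dictionary, the tag name v1.0, and commit X are hypothetical, and real Git considers more starting points than this sketch does (reflog entries, for instance) before it discards anything.

```python
from typing import Dict, Iterable, List, Set

# commit ID -> list of parent IDs (a merge or a stash commit has more than one).
parents: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"],
    "X": ["B"],  # hypothetical commit that no name points to any more
}
refs = {
    "refs/heads/master": "E",
    "refs/heads/develop": "D",
    "refs/tags/v1.0": "B",
}


def reachable(starts: Iterable[str]) -> Set[str]:
    """Return every commit found by following parent links from `starts`."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


unreachable = sorted(set(parents) - reachable(refs.values()))
print(unreachable)  # ['X']
```

Commit X has a parent, but having a parent does not help: what matters is whether some name, directly or through a chain of later commits, leads to X.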

As mohammed wazeem said, these commits are stored as objects. Internally, there are four types of objects, but you will mostly deal directly with commits. Objects can be either loose or packed, but these details are usually entirely invisible. But remember these key points:

  • Git stores commits. (The commits then store the files.) You always get whole commits. You have the commit, or you don't; git push sends whole commits, not files.

  • Commits have hash IDs. They're big and ugly and you pretty much have to cut-and-paste them to use them, but they are the one guaranteed name for a commit: every Git will use the same hash ID, if you're working with that particular commit.

  • Branch names let you find commits—each branch name finds the last one—and add new commits, which works by having each new commit point back to the previous branch-tip commit, then updating the branch name.

But anyone can move any branch name arbitrarily if they want. Most of the time, people move their branch names such that new commits merely add on to the existing set. Checking out a branch name means "get me that commit and, at the same time, attach the special name HEAD to that branch name", so that you can make new commits and have the branch name move correctly and automatically.

torek