Apparently C:\files\programming\workspaces\project1 is not a Git repository. It's true that Y:\git\myrepo.git is a Git repository (assuming the earlier git init worked). But C:\files\programming\workspaces\project1 is not. You'll need to create a second Git repository there.¹ You could git clone the empty repository over in Y:\git\myrepo.git, for instance (although cloning a totally empty repository has some weird side effects and is usually not the right way to start).
The way Git works in general is:
You clone an entire existing repository: every commit, which saves every version of every file, is now copied into the .git directory here.
Git is really all about these commits. Each commit has a full and complete snapshot of every file, frozen into a special, read-only, Git-only format, along with some additional information such as who made the commit, when, and why. These commits act as archives: every time someone ran git commit, Git archived everything.² You now have a copy of everything, in this special archival format that only Git can use.
Each commit has its own unique hash ID. No two commits can ever share an ID. For this reason, and the fact that every Git in the world has to give every commit a unique hash ID, these IDs have to be very big and ugly and random-looking. All Gits everywhere share hash IDs for all their commits, so that they can share their commits by ID, later. You can connect any two Gits to each other and they can obtain each other's commits just by using these hash IDs.
Because these hash IDs are so big and ugly, humans can't get them right. Fortunately we don't have to; we'll see this in a bit.
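The IDs only look random: each one is a checksum of the object's own content. Here is a minimal sketch, assuming the default SHA-1 object format, of how the ID of a stored file (a "blob") is derived. Commits are hashed the same way, with a commit header in place of the blob header. The function name git_blob_id is made up for this illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would give a file with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

first = git_blob_id(b"hello world\n")
again = git_blob_id(b"hello world\n")
other = git_blob_id(b"hello world!\n")

print(first)            # 40 hex digits
print(first == again)   # True: same content, same ID, in every Git everywhere
print(first == other)   # False: any change in content gives a different ID
```

You can compare the result against git hash-object run on a file with the same content. This is why two Gits that have never met still agree on the ID of any object they both have.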
Then (this is actually built in, as the last step of the git clone you just did) you have Git select some commit. That selected commit becomes the current commit.
You usually select the commit by selecting a branch name, in which case that name becomes the current branch name as well, i.e., these two actions are paired up: selecting branch master as the current branch chooses the last commit in that branch as the current commit.
(There's an extra complication here when doing that initial git clone, again, but we'll skip it for now.)
Git now extracts all the files from the chosen commit, into a work area. In this work area, you have ordinary files that you can use with all the ordinary programs on your computer. These are your files, to work with as you wish: they're not Git's copies at all. Git just extracted everything from a commit, in order to make these copies available.
Once you have all the working copies of files stored in this work area, which we call a working tree or work-tree, you can work with them. That's why it's your work-tree: because you can actually get some work done.
Having worked on a bunch of files, you might want to save a new saved-for-all-time archival snapshot. You might think you could just run git commit and Git would save all your files. Other version control systems work this way, but Git does not. Git has, in a secret file,³ saved away all the files that came out of the current commit. Those files are in the special Git-only frozen format, but unlike the copies of the files that are in commits, they're not actually frozen. You can replace them, or remove some file(s) entirely, or put new ones in.
Git calls this special extra area the index, or the staging area, or sometimes—rarely these days—the cache. These three names for this one thing reflect its central and multiple roles in Git, or perhaps there are three names because the original name, "index", is just so terrible. :-) But either way, you need to know about the index.
Essentially, and leaving out some of its other roles, the index represents the next commit you will make. That is, it contains, in a special Git-specific format, some information about your current work-tree, but more importantly, a copy of each file that will go into the next commit.⁴
Having updated your work-tree files, you need to copy those files back into Git's index, which you do with git add:
git add file1 folder/file2
for instance.⁵ This copies these two files from your work-tree into the index, turning the copies into the special Git-only format, ready to go into the next commit. In other words, these files are now staged for commit, hence the other name of the index, "staging area". They're not actually committed yet but they are ready to go. (They were ready to go before, too, but before, they matched the current commit's copy!)
At this point, running git commit makes a new commit from whatever files are in the index right now. This new commit gets your name and email address as both author and committer, and "now" (the current date and time reported by your OS) as the time-stamps. You should supply a log message giving the reason you made the commit: a summary of why you are doing whatever you are doing.⁶
The git commit command packages up all of this information (who, when, why, and so on) along with the raw hash ID of the current commit, and makes a new commit out of this plus the snapshot it makes using the files that are already in the right format in Git's index. Now that the new commit is made, it becomes the current commit. Now things are back to the way they were when you ran git checkout: the index and the current commit both contain the same set of files, in the frozen archive format, ready to go into a new commit.
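The edit / add / commit cycle can be sketched with a toy model. This is only an illustration: real Git stores compressed objects, not Python dictionaries, and the names work_tree, index, add, and commit below are made up for the sketch. The point it shows is that a commit is built from the index, not from the work-tree.

```python
# Toy model only: plain dictionaries stand in for Git's storage.
work_tree = {"file1": "v1", "folder/file2": "v1"}   # your ordinary, editable files
index = dict(work_tree)                              # the proposed next commit
commits = []                                         # snapshots, never edited afterwards

def add(*paths):
    """Copy the named work-tree files into the index (the role of git add)."""
    for path in paths:
        index[path] = work_tree[path]

def commit(message):
    """Snapshot whatever is in the index right now (the role of git commit)."""
    snapshot = {"message": message, "files": dict(index)}
    commits.append(snapshot)
    return snapshot

commit("initial")

# Edit both files in the work-tree, but stage only one of them.
work_tree["file1"] = "v2"
work_tree["folder/file2"] = "v2"
add("file1")

snap = commit("update file1")
print(snap["files"])   # {'file1': 'v2', 'folder/file2': 'v1'}
```

The second file's edit stays in the work-tree only; it goes into a later commit once it, too, has been added.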
Note that no existing commit changes during all of this. In fact, no existing Git commit can ever change. All commits are frozen for all time, read-only. They continue to exist as long as you and Git can find them—usually forever, but you can arrange to "lose" one, if you've made one that you don't like.
The way Git finds commits is important, and a little tricky. Once you get the hang of it, though, it's actually really simple.
¹ This may actually be the only repository you need: it's not clear why you wanted an empty and bare one in Y:\git\myrepo.git in the first place.
² More precisely, Git archived everything it was told to archive, as we'll see in a moment.
³ It isn't really secret at all, but you can't see it very well: it's hidden in a specially-formatted file in .git named index (and maybe other places too, but they all start from the index file; the index file contains records, and some of them might list more files).
⁴ Technically, what's in the index, in these cache entries, is the file's path-name, mode, and an internal Git blob hash ID. There's also a staging slot number which is really only used for merging. The hash ID means that rather than holding an actual copy of each file, the index just holds the record of the Git-formatted blob object. But unless you start using git ls-files --stage and git update-index directly, you don't really need to know about this: you can just think of the index as holding a copy of each file.
⁵ You can use either forward slash like this, or backslash; both work. I don't use Windows, and always use forward slash, and the few times I have been forced to use Windows briefly, I always name my files there with forward slashes. (This mostly works, except for a few commands that insist on treating forward slashes as switch options. When dealing with Git and its ecosystem, backslash tends to confuse some other programs: \b, for instance, may represent a backspace, and \t a tab, so an attempt to name a file .\buttons\toggle can misfire and you end up with a file named .^Huttons^Ioggle or something.)
⁶ Git can easily show what you did, later, but Git has no idea that this was, e.g., to fix bug#12345 or issue#97 or whatever it might be, much less how the bug or issue could be described. This log message is your opportunity to explain things like what the bug is, where to find it in the bug reporting system, what you discovered during investigation of the bug, and anything else that might be helpful to you, or someone else, looking at this commit later.
Branch names let Git find commits for you
A branch name like master, in Git, really just holds one hash ID.
That's all it needs to do. We mentioned before that whenever you have Git make a new commit, the new commit saves the raw hash ID of the current commit.
Suppose you have an existing Git repository with just one commit in it. This one commit has some big ugly hash ID, but we'll just call it A for short:
A
There's only the one commit in the repository. That one commit has however many files, but it's just one commit. It's easy to find: it's the commit. Let's add a second commit now, by having this commit checked out via the name master (we'll put the name in, in just a moment).
We modify some work-tree files, git add them, and run git commit and give it a reason for the commit to put in the log message. Git builds a new commit out of all of the files in the index, plus the usual metadata, including the hash ID of commit A. Let's call the new commit B, and draw it now:
A <-B
B contains the old commit's hash ID. We say that B points to A.
Git writes the new commit's hash ID into the name master, so let's draw the name master pointing to B now:
A--B <-- master
I've already gotten lazy here (on purpose) and drawn the connection between the two commits as a plain line: it's B that points to A, not vice versa. But the arrow coming out of B cannot change, because no part of any commit can change. It's the arrow coming out of master that changes. We call commit A the parent of B, and B a child of A.
The current branch is now master and the current commit is B. Let's make a new commit in the usual way:
A--B--C <-- master
New commit C points back to B, which points back to A. So B may be a child of A, but it's also the parent of C.
(Where does A point? The answer is: nowhere. Commit A is a little bit special. Being the very first commit, it can't point back to any earlier commit. So it just doesn't. Creating the first commit in a repository is a bit of a special act; it's what creates the branch name, too! The name master is not allowed to exist until some commit exists, so creating commit A creates everything.)
(I keep saying a child, not the child. That's because we can go back and add more children later. Commits, once made, are frozen for all time, so the children know exactly who their parents are, but parents can acquire new children, someday, in the future. When a new commit is made, it never has any children yet. So parents never know who their children are. That's why Git works backwards!)
Note how all we need is for Git to hold the raw (and random-looking) hash ID of the last commit in the name master. We can remember the name master, and Git remembers the hash ID for us. Adding a new commit consists of:
- making sure the current branch name is master (git checkout master if needed)
- so that the current commit is C
- so that Git's index is full of the right copies of files, and our work-tree has the files we want
- so that we can change work-tree files in place using all of the normal computer tools
- so that we can git add the updated files to make Git copy them back into the index
- so that we can git commit to make a new commit D
which will change our picture to read:
A--B--C--D <-- master
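The bookkeeping behind these drawings can be sketched as a toy model. It is an illustration under simplifying assumptions, not Git's real storage: the names objects, branches, and make_commit are made up, and real commit IDs are computed over more data (author, committer, time-stamps, and so on). What it shows is that each new commit records its parent's ID, and that the branch name is the only thing that gets rewritten.

```python
import hashlib

objects = {}    # hash ID -> commit; entries are never modified once stored
branches = {}   # branch name -> hash ID of the last commit on that branch

def make_commit(branch, snapshot, message):
    """Store a new commit whose parent is the branch's current commit."""
    parent = branches.get(branch)   # None when this is the very first commit
    body = repr((parent, sorted(snapshot.items()), message)).encode()
    commit_id = hashlib.sha1(body).hexdigest()
    objects[commit_id] = {"parent": parent, "files": dict(snapshot), "message": message}
    branches[branch] = commit_id    # move the name to the new commit
    return commit_id

a = make_commit("master", {"file1": "v1"}, "A")
b = make_commit("master", {"file1": "v2"}, "B")

print(objects[a]["parent"])        # None: A is the root commit
print(objects[b]["parent"] == a)   # True: B points back to A
print(branches["master"] == b)     # True: the name now finds B
```

Nothing stored under a's ID was touched when b was made; only the entry in branches changed.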
All of these new commits go into our repository. The repository itself is mainly just two big databases:
- the commits, and other internal Git objects, addressed by hash IDs;
- and a smaller name-to-hash-ID table, that says things like branch name master means commit a123456... or whatever.
The entire repository is in the .git directory / folder, underneath the top level of our work-tree. The branch name(s) find the last commits, and those commits find earlier commits. Git simply walks backwards, from last commit back to first one, one commit at a time. Git knows that it has run out of commits to walk backwards through when it reaches a root commit like commit A, which has no parent.
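That backwards walk can be sketched in a few lines. This is a simplified illustration: the one-letter IDs stand in for real hash IDs, the names commits, branches, and history are made up, and each commit here has at most one parent (real merge commits have more).

```python
# Hypothetical one-letter IDs standing in for real hash IDs.
commits = {
    "A": {"parent": None},   # root commit: no parent
    "B": {"parent": "A"},
    "C": {"parent": "B"},
    "D": {"parent": "C"},
}
branches = {"master": "D"}   # the name holds only the last commit's ID

def history(branch):
    """Yield commit IDs from the branch's last commit back to the root."""
    commit_id = branches[branch]
    while commit_id is not None:
        yield commit_id
        commit_id = commits[commit_id]["parent"]

print(list(history("master")))   # ['D', 'C', 'B', 'A']
```

The walk needs only the one ID stored under the branch name; every earlier commit is found through parent links.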
There is a lot more to it than this, starting with the fact that you can add more branch names:
A--B--C--D <-- master, dev
for instance, and move branch names around, and so on, and we haven't even touched on the idea of connecting this Git repository, in C:\files\programming\workspaces\project1, to another Git repository in Y:\git\myrepo.git or on another machine or whatever yet. That's where things get complicated. That's what git remote is for: a remote is a name you use in your Git to remember the URL for some other Git repository.
If you don't need to use remotes yet, don't do that; this is plenty to start with.