As Gabriel said, you create a new branch from an existing commit, so it's going to start with the contents of the existing commit. You're trying git checkout --orphan to avoid starting from an existing commit, and this is a valid and sometimes sensible thing to do, but it's probably not what you want. It's failing because even then, you're starting with an existing index and work-tree.
(We can address this part of your question: "DEMOPROJECT folder is also being made a submodule for some reason which I don't want to happen." later, although a separate question is more appropriate. And, if you already know the next few sections, you can jump down to the one about the index.)
What matters to Git are commits
If you're going to use Git, you need to get used to Git's little quirks. One of these is that branches—or more precisely, branch names—aren't all that important to Git. What matters are commits. It's crucial to learn exactly what a commit does and represents first, and then introduce branch names.
Each commit holds a snapshot of some set of files. That part's pretty straightforward. The snapshot is what shows up in your git ls-tree output, though technically git ls-tree has summarized entire sub-trees' files here; you'd need git ls-tree -r to see all the files in the snapshot.
A commit also holds some metadata: the name and email address of the person who created the commit, for instance, and a date-and-time stamp. (In fact, there are two of these.) It holds a log message, which you supply when you make the commit: you should put in something that tells future-you why you made this commit. And—very importantly—each commit holds some number of parent commit hash IDs (usually just one).
Each commit, once made, has a unique hash ID. Every Git in the universe[1] also agrees that that commit gets that hash ID, and no other commit ever gets that hash ID. Once the hash ID is assigned, nothing in the commit can ever change. (The reason for this is that this hash ID is actually a cryptographic checksum of the commit object's contents. That's how every Git in the universe is able to agree that this commit gets this hash ID.)
You've seen these hash IDs: they are big ugly strings like 745f6812895b31c02b29bdfe4ae8e5498f776c26. They're not useful to humans, but they are useful to Git. Git uses the hash ID to find the actual commit, from which it finds the files, and so on.
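To make the checksum idea concrete, here is a minimal Python sketch of how a hash ID can be derived from an object's contents. It assumes the default SHA-1 object format, where Git hashes a short header (the object type, the content length in bytes, and a NUL byte) followed by the content itself. The function name git_object_id is made up for this illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 hash ID for an object of the given kind and content."""
    # Header is "<kind> <length in bytes>" followed by a single NUL byte.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The same content always produces the same ID, in every repository.
print(git_object_id("blob", b""))
# Changing even one byte of the content produces an unrelated-looking ID.
print(git_object_id("blob", b"hello\n"))
print(git_object_id("blob", b"hellp\n"))
```

A commit object is hashed the same way, with the kind "commit" and the commit's text (snapshot reference, parent hash IDs, author, committer, log message) as the body. That is why nothing inside a commit can change without the hash ID changing too.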
The stored parent hash ID in a commit lets Git find the commit that comes before this particular commit. That is, given a series of commits with random-looking hash IDs, we might draw them with single uppercase letters standing in for the real hash IDs, and we could draw them like this:
... <-F <-G <-H
Here H is the hash ID of the last commit. It stores the hash ID of an earlier commit G, so Git can load up commit G. If Git does load it up, G stores the hash ID of an earlier commit F, and so on. We say that H points to G, G points to F, and so on.
[1] Actually, this is limited to every Git that exchanges this commit with another Git. If two Git repositories never meet, they can use the same hash ID for different commits. The chance that they will is tiny: one in 2^160 at the moment. But if two Git repositories do meet and exchange commits (as they will if you git push your commits to, say, GitHub), they won't re-use hash IDs: each one will be unique to each commit.
A branch name points to one commit
This is where branch names come in. A branch name like master simply holds one hash ID. We make sure that it holds the latest hash ID, so that master points to H:
... <-F <-G <-H <--master
Unlike the commits themselves, the names can be changed. To make a new commit, we have Git write out the commit's contents (the snapshot of all files, the log message, your name from git config --get user.name, and so on), setting the new commit's parent hash to H. The result is some new unique hash ID that we'll just call I:
...--F--G--H   <-- master
            \
             I
Now that I exists, Git simply writes its hash ID into the name master:
...--F--G--H
            \
             I   <-- master
and now I is the last commit in the branch. (And now we can just draw this as a straight line, with I pointing back to H. Note that we didn't change H at all: we just added I, pointing back to existing H.)
This is how branches grow: we have Git add a new commit. Git writes out a snapshot, plus the metadata. The parent of the new commit is the current commit—the one we have checked out. The new commit gets a new, unique hash ID. As the last step of the commit operation, Git writes the new hash ID into some branch name—the branch we have checked out.
Note that when we make the very first commit, we have no existing commit. So the parent of the first commit just doesn't exist. This new commit has no parents; Git calls this a root commit. That gets us the first commit in the repository, so that the name master can exist too:
A <-- master
After that, the next commit B will have A as its parent, and Git will write B into the name master.
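The way a branch grows can be sketched as a tiny Python model. This is only an illustration of the idea, not how Git is implemented: the Commit class, the branches dictionary, and the commit and history functions are all invented here, and single letters stand in for real hash IDs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: a commit never changes once made
class Commit:
    name: str                    # stand-in for the real hash ID
    parent: Optional["Commit"]   # None for a root commit


branches: Dict[str, Commit] = {}  # branch name -> the one commit it points to


def commit(branch: str, name: str) -> Commit:
    """Make a commit whose parent is the branch tip, then move the branch name."""
    new = Commit(name, branches.get(branch))  # no tip yet means a root commit
    branches[branch] = new                    # last step: update the name
    return new


def history(branch: str) -> List[str]:
    """Walk backwards from the branch tip, following parent links."""
    names = []
    current = branches.get(branch)
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


for letter in "ABCDEFGH":
    commit("master", letter)
h = branches["master"]
commit("master", "I")
print(history("master"))
```

Commit H itself is untouched by the last call; only the name master moved, and it now leads to I, whose parent link leads back to H.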
This is also how git checkout --orphan works. It sets things up so that the next commit you make will store whatever is in the index, as usual, but will have no parent: it will be a root commit. So if you have:
A--B--C--D--E--F--G--H <-- master
and do:
git checkout --orphan newbranch
git commit
you get:
A--B--C--D--E--F--G--H <-- master
I <-- newbranch (HEAD)
(Whenever we have more than one branch, it's useful to attach the special name HEAD to exactly one branch, so that we can see which one is the current branch.)
The thing is, this new commit I stores the same snapshot as existing commit H, unless you alter the snapshot-to-be-stored.
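The point above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Git's implementation: the repo dictionary, the commit and checkout_orphan functions, and the file names are all hypothetical. The only thing the orphan step changes is which (not yet existing) branch name HEAD is attached to; the index is left exactly as it was.

```python
from typing import Dict, Optional, Tuple

# Each commit is modeled as (parent name or None, snapshot of files).
commits: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {}
branches: Dict[str, str] = {}          # branch name -> commit name
repo = {"head": "master"}              # which branch name HEAD is attached to
index: Dict[str, str] = {"main.py": "print('hi')\n", "main.pyc": "<bytecode>"}


def commit(name: str) -> None:
    """Snapshot the index; the parent is the current branch's tip, if any."""
    commits[name] = (branches.get(repo["head"]), dict(index))
    branches[repo["head"]] = name


def checkout_orphan(new_branch: str) -> None:
    """Attach HEAD to a branch name that does not exist yet; keep the index."""
    repo["head"] = new_branch


commit("H")
checkout_orphan("newbranch")
commit("I")

parent_of_i, snapshot_of_i = commits["I"]
print(parent_of_i)                        # I has no parent: a root commit
print(snapshot_of_i == commits["H"][1])   # but it stores the same files as H
```

To get a different snapshot in I, the index would have to be changed between the checkout_orphan call and the commit call.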
The index is the source of each new snapshot
Git of course has commits, which are frozen forever. The files inside a commit are stored in a special, compressed, read-only, Git-only format. I like to refer to these files as "freeze-dried". That's great for archival, but completely useless for getting any new work done.
Git therefore has the ability to extract the files from a commit into a work area. This work area, called your working tree or work-tree or some variant on these names, has ordinary files and folders/directories, in your computer's ordinary format. This allows you to work with them. It's where you do your work.
You can have files in the work-tree that you don't want to put into your next commit. But where is your next commit? Some version control systems say that your work-tree is your next commit. This is pretty straightforward and easy to use, so it's not what Git does. :-)
Instead, in Git, your next commit is stored in something that Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These three names all cover one thing. What's in the index is all of the files that will go into the next commit. These files are in that freeze-dried form, ready to be committed, but unlike committed files, you can replace them with new and different files, or even remove them entirely.
You can't see the index directly. Well, you can, but it's kind of awkward. If you only have a few files, run git ls-files --stage to see what's in the index. If you have a lot of files, this will produce a lot of output! Here's a snippet from the Git repository for Git itself:
100644 6e69877f25791632d98bf7b109a2eaebd04c96af 0 ws.c
100644 9f6c65a5809754717f8c51f809eae78f435bcd12 0 wt-status.c
100644 77dad5b92048851c622a35d8b34d802fbd0ecac6 0 wt-status.h
100644 8509f9ea223a1282a367874c3e3a3ef0c351a30f 0 xdiff-interface.c
100644 ede4246bbd3397086f90217539a2d07a35a4b986 0 xdiff-interface.h
100644 032e3a9f41a2f79eaab78ae36666b8b6218b3899 0 xdiff/xdiff.h
100644 1f1f4a3c7808435f73b0ffd1c35d5b0572516b6c 0 xdiff/xdiffi.c
Note that there are no directories/folders in this output. That's because Git doesn't store directories. It just stores files. Git simply makes a directory / folder if needed to hold a file. For instance, xdiff/xdiff.h is the file to Git, but to your computer, the file is named xdiff.h inside a folder named xdiff. So Git will make xdiff first, if it has to, to write Git's file xdiff/xdiff.h to your computer's xdiff.h-within-xdiff.
A tracked file is a file that is in the index
Suppose you have, in your work-tree, a file named main.pyc. If this file is also in the index, then main.pyc is tracked. If this file is not in the index, the file is untracked.
When you git checkout some commit, Git copies all the files from the commit into the index, and then into your work-tree. When you git checkout some other commit, Git switches all the index and work-tree files as appropriate.
You can, at any time, run git add on a work-tree file. That copies the work-tree file to the index. If it was there before, it's been replaced. If it wasn't there before, it's been added.
You can also, at any time, run git rm on any file name. That removes the index copy and the work-tree copy.
And, of course, you can run git commit. Whenever you do, Git packages up whatever is in the index right now and uses that to make the new commit. The new commit's parent is the current commit (the one you checked out when you started). The new commit becomes the last commit in the current branch (the one you checked out).
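The four operations just described can be modeled with two dictionaries, one for the index and one for the work-tree. This is a deliberately simplified sketch: the function names mirror the Git commands but are invented for this example, and file contents are plain strings rather than Git's compressed objects.

```python
from typing import Dict

commit_h: Dict[str, str] = {"main.py": "v1", "README": "hello"}  # frozen snapshot
index: Dict[str, str] = {}      # the proposed next commit
work_tree: Dict[str, str] = {}  # ordinary files you can edit


def checkout(snapshot: Dict[str, str]) -> None:
    """Copy the commit's files into the index, then into the work-tree."""
    for path in list(index):
        if path not in snapshot:
            work_tree.pop(path, None)  # tracked file absent from the target
    index.clear()
    index.update(snapshot)
    work_tree.update(snapshot)         # untracked files are left alone


def add(path: str) -> None:
    """Copy the work-tree file into the index, replacing any earlier copy."""
    index[path] = work_tree[path]


def rm(path: str) -> None:
    """Remove both the index copy and the work-tree copy."""
    index.pop(path)
    work_tree.pop(path, None)


def is_tracked(path: str) -> bool:
    return path in index


def make_commit() -> Dict[str, str]:
    """The new snapshot is whatever is in the index right now."""
    return dict(index)


checkout(commit_h)
work_tree["main.pyc"] = "<bytecode>"  # created in the work-tree, never added
work_tree["main.py"] = "v2"           # edited in the work-tree
add("main.py")
rm("README")
new_commit = make_commit()
print(new_commit, is_tracked("main.pyc"))
```

The new snapshot contains only the updated main.py. The untracked main.pyc stays in the work-tree and stays out of the commit, because it was never copied into the index.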
The .gitignore file can only ignore untracked files
If a file is tracked (which, remember, means it's in the index right now), listing that file in .gitignore has no effect.
Hence, if you want not to commit a file, but it's already in the index, you have to remove it from the index:
git rm --cached file
The --cached option tells Git that even though it's removing the file from the index, it should leave the work-tree copy alone. Right after the removal, the file is untracked, so the next commit won't contain it.
There's a catch, though: if this commit connects back to some earlier commit that does have the file, and you ever check out that earlier commit, Git will write the file into both the index and your work-tree. Then, when you switch back to the new commit that doesn't have the file, Git will remove the file from the index and from your work-tree.
Note that if you're using --orphan to make a new branch whose initial commit has no previous commit, that can help. However, the old branch still exists, and still has commits that have the file.
Conclusion (before we leap into submodules)
You might want --orphan, but if you really do want it, perhaps you just want to start with a whole new Git repository.
In any case, if you want files to be untracked in future commits, you must remove them from the index if they're in the index now. You can use git rm, which removes them from the index and from your work-tree; or you can use git rm --cached, which removes them from the index but not from your work-tree.
Once the files are untracked, you can list them in .gitignore to make sure they don't get added to the index in the future, even though they're in your work-tree. This also stops git status from whining about the untracked files. Just beware that checking out any older commit that does have the file will put it into the index for the duration of that checkout. Switching back to a newer commit that doesn't have the file will then remove the file from the index and from your work-tree. Be sure that this is OK, or else be very careful when looking at old commits.
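Here is a small Python sketch of the whole sequence: an ignore rule that has no effect on a tracked file, removal from the index only, and the surprise when visiting an old commit. It is a simplified model, not Git's code. The function names are invented, and fnmatch patterns only approximate real .gitignore matching rules.

```python
from fnmatch import fnmatch
from typing import Dict, List

ignore_patterns: List[str] = ["*.pyc"]  # stands in for the .gitignore file
index: Dict[str, str] = {}
work_tree: Dict[str, str] = {}


def is_ignored(path: str) -> bool:
    return any(fnmatch(path, pattern) for pattern in ignore_patterns)


def add_all() -> None:
    """Like git add on everything: ignore rules only stop untracked files."""
    for path, data in work_tree.items():
        if path in index or not is_ignored(path):
            index[path] = data


def rm_cached(path: str) -> None:
    """Remove the index copy only; the work-tree copy stays."""
    index.pop(path)


def checkout(snapshot: Dict[str, str]) -> None:
    """Switch index and work-tree to the snapshot's files."""
    for path in list(index):
        if path not in snapshot:
            work_tree.pop(path, None)  # tracked here, absent there: removed
    index.clear()
    index.update(snapshot)
    work_tree.update(snapshot)


old_commit = {"main.py": "v1", "main.pyc": "<bytecode>"}
checkout(old_commit)

work_tree["main.pyc"] = "<new bytecode>"
add_all()
still_tracked = index["main.pyc"] == "<new bytecode>"  # ignore rule had no effect

rm_cached("main.pyc")
add_all()                         # now untracked and ignored, so not re-added
new_commit = dict(index)
kept_in_work_tree = "main.pyc" in work_tree

checkout(old_commit)              # the file is tracked again for this checkout
checkout(new_commit)              # and removed when switching back
gone_after_round_trip = "main.pyc" not in work_tree
print(still_tracked, new_commit, kept_in_work_tree, gone_after_round_trip)
```

The last step is the one to watch for: after the round trip through the old commit, the work-tree copy of main.pyc is gone, even though git rm --cached had originally left it in place.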
Submodules
I'm only going to touch on this lightly, because submodules can get pretty complicated.
The stuff that's not the work-tree, in a Git repository, is normally all hidden away inside a directory/folder named .git at the top level of your work-tree.
If some sub-directory / sub-folder of your work-tree has a .git directory, your Git will assume that some other Git is controlling that part of your tree. Your Git will then decide that it shouldn't source-control any of that. Instead, your Git will add a reference to a commit in that other Git's repository. This reference takes the form of a gitlink in the index. If you use git ls-files --stage, you can see this as a "file" with mode 160000 (gitlink).
To make Git not do that, make sure none of your work-tree is its own repository: that no sub-folder has a .git folder inside it. When using submodules, Git will migrate the sub-folder's .git folder into the superproject's .git folder, replacing it with a file named .git whose contents are the path to its new location inside the superproject's .git, so if a sub-folder has a .git file, this too can trigger the submodule code.
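One way to check for this is to scan the work-tree for sub-folders that contain either a .git folder or a .git file. The following Python sketch does that with the standard library only. The function name find_nested_repos and the demo folder layout are made up for this example, and the "gitdir:" line written into the demo .git file is just a plausible stand-in for what a migrated submodule would contain.

```python
import os
import tempfile
from typing import List


def find_nested_repos(top: str) -> List[str]:
    """Return sub-folders of top (as relative paths) holding a .git file or folder."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        if dirpath == top:
            # The top-level .git is the repository itself: skip it entirely.
            if ".git" in dirnames:
                dirnames.remove(".git")
            continue
        if ".git" in dirnames or ".git" in filenames:
            found.append(os.path.relpath(dirpath, top))
            dirnames.clear()  # don't descend into a nested repository
    return sorted(found)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, ".git"))                 # the main repository
    os.makedirs(os.path.join(top, "DEMOPROJECT", ".git"))  # nested .git folder
    os.makedirs(os.path.join(top, "lib", "vendored"))
    with open(os.path.join(top, "lib", "vendored", ".git"), "w") as f:
        f.write("gitdir: ../../.git/modules/vendored\n")   # nested .git file
    os.makedirs(os.path.join(top, "src"))                  # ordinary folder
    nested = find_nested_repos(top)
    print(nested)
```

Any folder this reports is one that Git is likely to record as a gitlink rather than as ordinary files.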