TL;DR
Your first instinct was almost certainly correct: you probably want git rm --cached -f path, where path names the path to the sub-repository. This will remove the thing that Git calls a gitlink from the index / staging area / cache.
Long
First, remember that Git does not store directories at all. So this directory is not in Git in the first place. The reason why has to do with what Git calls, variously, the index, the staging area, or (relatively rarely now, but still visible in git rm --cached) the cache.
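You can watch this happen in a scratch repository. This is a minimal sketch that assumes git and mktemp are available. The names empty and placeholder are made up for the demonstration.

```shell
# Work in a throwaway repository so nothing real is touched.
cd "$(mktemp -d)" && git init -q

mkdir empty                  # a directory with nothing in it
git add empty                # exits successfully, but there is nothing to record
before=$(git ls-files)       # empty output: the index has no entry for "empty"

touch empty/placeholder      # hypothetical placeholder file
git add empty
after=$(git ls-files)        # one entry, for the *file* empty/placeholder
printf 'before: [%s]\nafter:  [%s]\n' "$before" "$after"
```

The directory only shows up in a future commit because a file inside it has a name that starts with empty/.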
Second, know that Git never stores another Git repository in a repository. Or, to put it another way, repositories never actually nest. Git implements this by forbidding any path name component that consists of .git (including case-insensitive variants such as .GIT or .Git).¹
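One quick way to see the rule enforced is to try placing such a path into the index directly with the plumbing command git update-index. This is a sketch that assumes a reasonably recent Git, which accepts the comma-separated form of --cacheinfo. The blob contents and the foo/ paths are arbitrary.

```shell
cd "$(mktemp -d)" && git init -q

# Store an arbitrary blob so we have a valid object hash ID to point index entries at.
blob=$(printf 'arbitrary content\n' | git hash-object -w --stdin)

# An ordinary nested path is accepted:
git update-index --add --cacheinfo "100644,$blob,foo/HEAD"

# Any path component spelled .git, in any letter case, is refused:
for p in foo/.git/HEAD foo/.GIT/HEAD foo/.Git/HEAD; do
  if git update-index --add --cacheinfo "100644,$blob,$p" 2>/dev/null; then
    echo "accepted: $p"
  else
    echo "rejected: $p"
  fi
done

git ls-files                 # only foo/HEAD made it into the index
```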
What you have here is what Git calls a submodule (or perhaps half a submodule: the half that Git calls, internally, a gitlink).
¹ In very old versions of Git, the authors forgot to account for the case-insensitive file systems used on Windows and macOS, and Git allowed creating repositories with files named, e.g., foo/.GIT/HEAD and the like. This made the "outer" Git treat the foo/.GIT directory as another Git repository, which made it far too easy to set up Trojan-horse repositories as traps for people using these systems.
Commits
Git is ultimately built out of two key-value databases, one of which is copied by cloning. (The other, which holds branch and tag and other such names, is partly copied but modified during cloning.) The main database consists of commits and other internal Git objects. Each of these objects is read-only, because Git finds each object by its key, and the key is itself a cryptographic checksum of the object. If you take an object out of the database, change some of its bits, and then try to put it back, what you get is not a modified object, but rather a new object with a new and different key.²
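To see that the key really is a checksum of the content, you can ask Git to compute one without storing anything. This is a sketch. The sha1sum line assumes GNU coreutils (on macOS, shasum can stand in). The bytes "blob 6\0" are the header Git prepends to a 6-byte blob before hashing it with SHA-1.

```shell
# No repository is needed: hash-object only computes the key Git would use.
printf 'hello\n' | git hash-object --stdin    # ce013625030ba8dba906f756967f9e9ca394464a

# The same key, computed by hand: SHA-1 over "blob <size>\0" followed by the content.
printf 'blob 6\0hello\n' | sha1sum

# Change a single byte and the result is an unrelated key, i.e. a different object.
printf 'hellp\n' | git hash-object --stdin
```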
The most interesting object for our discussion here is the commit. A commit contains a snapshot of all the files that Git knew about at the time the commit was made.
² This assumes that no key will ever repeat unless the value itself is a duplicate. (This "duplicate value = same key" trick is how Git de-duplicates file content.) Git currently uses SHA-1 by default, which is good enough in practical terms but is susceptible to deliberate attacks. Fortunately, the consequences of such an attack are mostly just nuisances. For more about this, see How does the newly found SHA-1 collision affect Git?
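The de-duplication is easy to observe, because two files with identical bytes map to a single blob. This is a sketch in a throwaway repository, and the file names are arbitrary.

```shell
cd "$(mktemp -d)" && git init -q

printf 'same bytes\n' > a.txt
cp a.txt b.txt               # different name, identical content
git add a.txt b.txt

git ls-files --stage         # both index entries carry the same blob hash ID
git rev-parse :a.txt :b.txt  # the same hash ID, printed twice
git count-objects            # only one loose object was written
```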
The index
Git builds new commits by first storing, in something it calls the index,³ a series of records giving path names and hash IDs for Git objects: mostly blob objects that hold those files' contents. There is no record type that can hold a directory, and this is why Git cannot store directories.
The git commit command simply packages up the index's records⁴ and wraps the package in a commit object to make the new commit. So the index's function is to be the staging area: it contains the proposed next commit. Since the index is not itself a Git object, it can be modified in place as needed.
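Git's plumbing commands let you perform those two steps by hand, which makes the index's role concrete. This is a sketch that assumes a throwaway repository and a made-up committer identity. Unlike git commit, it does not move any branch to point at the new commit (that would take git update-ref).

```shell
cd "$(mktemp -d)" && git init -q
# Hypothetical identity, needed by commit-tree.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'one\n' > file.txt
git add file.txt                              # record path name + blob hash ID in the index

tree=$(git write-tree)                        # step 1: package the index records as tree object(s)
commit=$(git commit-tree -m 'first' "$tree")  # step 2: wrap that tree in a commit object
git cat-file -p "$commit"                     # tree line, author, committer, then the message
```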
For concreteness, the actual records—ignoring headers and extensions and just concentrating on the index's normal everyday file entries—consist of:
- a mode based on Unix-style inode mode fields;
- a path name;
- a hash ID giving an internal Git object ID; and
- other cache data I'll ignore here.
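git ls-files --stage prints these records. It also prints a stage number, which is 0 except during a conflicted merge. This is a sketch in a throwaway repository. The file name is arbitrary, and the hash shown is the one for the content "hello" plus a newline.

```shell
cd "$(mktemp -d)" && git init -q
printf 'hello\n' > greeting.txt
git add greeting.txt

# Columns: mode, object hash ID, stage number, then a tab and the path name.
git ls-files --stage
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	greeting.txt

# --debug additionally dumps the cached stat data (timestamps, inode, size, ...).
git ls-files --stage --debug
```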
The mode is 100644 or 100755 for ordinary files (you will see these often in git diff output), with other mode values reserved for symbolic links (120000) and gitlinks (160000). The path name contains any slashes needed: files here can have long names such as path/to/file.txt. That's not a directory named "path" that contains a sub-directory named "to" that contains a file named "file.txt": it's literally a file whose name is path/to/file.txt.
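Here is a small example containing each kind of ordinary entry. This is a sketch that assumes a Unix-like system where symbolic links and the executable bit work, which is the default there (core.symlinks and core.fileMode both true). The names are arbitrary. Note that no entry is listed for path or path/to.

```shell
cd "$(mktemp -d)" && git init -q

mkdir -p path/to
printf 'data\n' > path/to/file.txt                         # ordinary file: 100644
printf '#!/bin/sh\necho hi\n' > run.sh && chmod +x run.sh  # executable: 100755
ln -s path/to/file.txt link                                # symbolic link: 120000
git add link path/to/file.txt run.sh

git ls-files --stage
# 120000 <hash> 0	link
# 100644 <hash> 0	path/to/file.txt
# 100755 <hash> 0	run.sh
```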
Note that checking out an existing commit first fills in Git's index with these records as stored in that commit, then populates your working tree with actual files as needed.
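The two halves of a checkout can be run separately with plumbing commands: git read-tree fills the index from a commit's tree, and git checkout-index writes index entries out as files. This is a sketch in a throwaway repository with a made-up identity. Real checkouts (git checkout, git switch) also update HEAD and do safety checks that this skips.

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'v1\n' > file.txt && git add file.txt && git commit -q -m v1
git rm -q file.txt               # remove it from both the index and the working tree

git read-tree HEAD               # step 1: the index gets the records stored in the commit
git ls-files                     # file.txt is listed again...
test -e file.txt || echo 'file.txt is not in the working tree yet'
git checkout-index -a            # step 2: populate the working tree from the index
cat file.txt                     # v1
```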
³ This is currently a single file, usually named .git/index, but it can itself refer to additional files. This is a bit problematic because these additional files can't be properly protected during Git operations. Very large index files (e.g., millions of records) result in performance problems, hence the notion of a "split index", which this answer doesn't cover at all.
⁴ Git turns the names into one or more internal tree objects, which generally refer to more tree objects, with each slash-separated name component grouped into some sub-tree. If the index could store directory names, these tree objects would allow Git to store an empty directory; but the index can't, so Git can't.
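You can inspect the tree objects built from a single slash-separated index entry. This sketch uses git write-tree in a throwaway repository, so no commit is needed, and the path is arbitrary. Each name component before the last becomes its own tree, listed with mode 040000.

```shell
cd "$(mktemp -d)" && git init -q
mkdir -p path/to && printf 'data\n' > path/to/file.txt
git add path/to/file.txt         # the index holds one entry: path/to/file.txt

tree=$(git write-tree)           # build tree objects from the index
git ls-tree "$tree"              # 040000 tree <hash>	path
git ls-tree "$tree:path"         # 040000 tree <hash>	to
git ls-tree "$tree:path/to"      # 100644 blob <hash>	file.txt
```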
A submodule is a reference to another Git repository
This finally gets us to submodules. We know that:
- a repository is a collection of commits, and
- commits are identified by hash IDs.
What if we could have Git clone some other repository for us, automatically, while we are working, and then git checkout the correct commit in that other repository? This is what submodules are all about.
In order to clone a Git repository, Git needs:
- a URL, and
- a place to deposit the cloned repository: a path relative to this repository.
To get the "outer" or superproject Git to git clone some inner Git, we need to store this information. This stuff goes into a plain-text file, formatted like a Git configuration file, called .gitmodules.
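Here is what such an entry looks like. Normally git submodule add writes it for you. This sketch writes it with git config -f instead, and the submodule name lib, the path vendor/lib, and the URL are all hypothetical.

```shell
cd "$(mktemp -d)" && git init -q

# .gitmodules uses Git's config-file syntax, so git config -f can edit it.
git config -f .gitmodules submodule.lib.path vendor/lib
git config -f .gitmodules submodule.lib.url https://example.com/lib.git
cat .gitmodules
# [submodule "lib"]
# 	path = vendor/lib
# 	url = https://example.com/lib.git
```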
Once the clone is made, though, we need to have the superproject Git enter the submodule and run git checkout hash or git switch --detach hash. This requires two things:
- a path relative to this repository, and
- a commit hash ID.
The superproject Git gets these from Git's index, which, as we already saw, stores both a path name and a Git hash ID. When a commit contains a gitlink (an entity with mode 160000), the checkout operation just reads this gitlink into the index. So now Git has, in the index, a path/to/gitlink or whatever name, along with a stored commit hash ID.
This means the index stores gitlinks
Whenever you are:
- in your superproject working tree (and not down within the submodule working tree), and
- you run git add on a path that is a path to the submodule,
your superproject Git will add to its index, or update in its index, the appropriate gitlink entry. Note that Git does not check, at this time, whether there is an appropriate .gitmodules entry. It just updates or adds the gitlink in the superproject Git's index.
The superproject Git finds the hash ID that goes with this gitlink by cd-ing into the submodule and running git rev-parse HEAD.⁵ That updates the gitlink entry in the index to whatever commit is actually checked out in the submodule.
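Both points can be checked in a throwaway directory. In this sketch the names super and sub and the identity are hypothetical. A new commit inside sub, followed by git add sub in the superproject, moves the gitlink to sub's new HEAD, and no .gitmodules file is ever required.

```shell
cd "$(mktemp -d)" && git init -q super && cd super
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q sub
git -C sub commit -q --allow-empty -m 'first inner commit'
git add sub 2>/dev/null
before=$(git rev-parse :sub)     # hash ID recorded in the gitlink

git -C sub commit -q --allow-empty -m 'second inner commit'
git add sub                      # re-reads sub's HEAD and updates the gitlink
after=$(git rev-parse :sub)

echo "gitlink: $before -> $after"
test -e .gitmodules || echo 'no .gitmodules file, and git add did not mind'
```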
If the .gitmodules file is missing or incomplete, this particular submodule is, well, kind of half-assed: any other clone you make of this repository won't have any idea what URL to use to run git clone to obtain the submodule. Since you mentioned that this is all entirely local, that probably does not matter for your use case.
⁵ Current versions of Git literally do this, and it's not the most efficient process. Newer versions of Git, in the pipeline, have facilities that achieve the same result without starting new sub-commands.
Conclusion
If you don't want a submodule (or a half-assed one that consists only of the saved gitlink, without the information needed to git clone the submodule in the first place), you should remove the gitlink from the index. Using:
git rm --cached -f path/to/gitlink
will do that. Make sure you use the --cached option! (Fortunately, if you forget, it should just error out, I believe.)
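From start to finish, the fix looks like this. The sketch first recreates the problem in a throwaway directory (the names super and sub and the identity are hypothetical) and then removes the gitlink. The inner repository itself is left alone on disk.

```shell
cd "$(mktemp -d)" && git init -q super && cd super
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q sub
git -C sub commit -q --allow-empty -m 'inner'
git add sub 2>/dev/null && git commit -q -m 'accidentally add a gitlink'

git rm --cached -f sub           # remove the gitlink from the index only
git ls-files --stage sub         # prints nothing now
git status --short               # a staged deletion of sub; sub/ now shows as untracked
ls -d sub/.git                   # the inner repository is still there
```

After you commit that removal, a later git add . or git add -A would find the nested repository again and re-create the gitlink. Listing sub/ in .gitignore is one way to prevent that.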
If this was a proper submodule, you may want to do even more: see What is the current way to remove a git submodule? If it was never properly added, though, there's nothing more to do.