Error: "modified content, untracked content"

Question

I got this error when I uploaded my project to GitHub. The solution I found said to delete the .git file in the problematic folder, run git rm -rf --cached [foldername], and then add the folder again. But I couldn't find a .git file in the problem folder, so I couldn't delete it, and if I just run the command to clear the cache and add the folder again, the same problem occurs. There is only a .gitignore file, but no .git file. How do I remove it? I think I'm taking the wrong approach, so I'm asking for help.

Show text, not pictures. Links to external sites can become stale and make your question and a potential answer worthless. — j6t, Jul 13 '22 at 05:29

score 1 · Accepted Answer · answered Jul 13 '22 at 10:46

TL;DR

The message modified content, untracked content (which I've taken from your posting title) means that you are using submodules. Read up on submodules. Then decide whether you want to use submodules, which introduce a lot of pain: people call them sob-modules for a reason.

Long

This is not an error, it's just a fact of life with submodules. When using submodules, you must understand exactly what you're doing. This means you must learn that Git is all about commits—well, you need to learn that to use Git even without submodules—and that when using submodules, the submodule stuff is about a commit in some other Git repository.

Let's start with this: A Git repository is, at its heart, a database or two (I like to split it into two). Both are simple key-value stores, i.e., they're the kind of database where you hand a key (a string of bits or bytes or something) to the database and the database gives you back some value. Very simple ones, like the ones Git uses here, have just one value for each key. The two databases here are Git's all-objects database, and a separate names database.

The "objects" database holds commits and other supporting Git objects. These objects are numbered. In particular, each commit gets a unique number, different from every other commit, ever, in any Git repository anywhere in the universe. This number is big and ugly, looks random, and is unpredictable,¹ and Git absolutely requires it to find the commit in the database.

Due to the way Git computes the number, no part of any commit (or any object at all, in fact) can be changed. So once you make a commit and have Git stick it into the all-objects database, it's there forever, or rather, as long as you can find its hash ID somehow. (If you can't find the key to look it up, is it still in the database?²)
Meanwhile, the "names" database holds, well, names: branch names, tag names, remote-tracking names like origin/develop, and so on. These are the keys, and the values in this database are hash IDs.

The main point of the names database—which also has a feature that lets us iterate through any or all of the keys³—is that humans are bad at hash IDs. We're much better at names like main and develop and v1.2, which internally become refs/heads/main and refs/tags/v1.2 for instance. Git then looks up the name and finds the hash ID for us, so that even though the objects database requires hash IDs, we can use names.
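
Here is a minimal sketch of both lookups, run in a throwaway repository. The temporary directory, the file README.md, and the Demo identity are made up for illustration; the Git commands themselves are standard ones.

```shell
set -eu
# Throwaway repository; the identity below is invented for this demo.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > README.md
git add README.md
git commit -q -m "first commit"

# Names database: the key is a full name, the value is a hash ID.
name=$(git symbolic-ref HEAD)      # full name of the current branch
hash=$(git rev-parse "$name")
echo "$name -> $hash"

# Iterate over every name (key) together with its value.
git for-each-ref --format='%(refname) %(objectname)'

# Objects database: the key is the hash ID, the value is the object.
type=$(git cat-file -t "$hash")
echo "object $hash is a $type"
```

The full name printed first starts with refs/heads/, which is the internal form of the short branch name you normally type.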

¹It is, in fact, the output of a cryptographic checksum algorithm. (The idea is kind of similar to that of cryptocurrencies, but Git's current algorithm is not as good as the ones used for cryptocurrencies.) The extent to which you could predict or break it is the extent to which you could steal all the e-currency, if e-currencies used the weak crypto that Git still uses.

²It's both in and out of Schrödinger's Database.

³The all-objects database can do this too, but it's a little weird about it; in particular, you need to start with a known prefix.

Commits hold snapshots and metadata

Now that we know that a repository holds commits and other objects, which we find by names that Git translates to the hash IDs of the objects, we should look at the commits themselves. Each commit, besides having a unique number and being read-only, stores two things:

Each commit stores a full snapshot of all of its files. The files in the commit are kept in a special, read-only, Git-only, compressed and de-duplicated internal format. (To achieve this, Git secretly stores the files' contents as objects in that objects database, but you don't have to know anything about this part: just that they're stored as snapshots. Git secretly stores the files' names as ... well, this gets complicated, and again you don't need to know all the details.)
Meanwhile, each commit also stores some metadata: information about this particular commit. This includes stuff like the name and email address of the person who made the commit, which Git gets from your user.name and user.email settings when you're the person who made the commit.

The actual metadata are really important, because Git adds, to each commit, a list of previous commit hash IDs. This list is usually just one entry long, which is all Git usually needs. This is what chains commits together to form one of the things Git calls branches.⁴
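
If you want to see the two parts for yourself, the sketch below makes two commits in a throwaway repository and prints the raw second commit. The directory, file name, and Demo identity are assumptions for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"

# The raw commit: a tree line (the snapshot), a parent line (the hash ID
# of the previous commit), author and committer lines, then the message.
git cat-file -p HEAD

# The snapshot: every file in the commit, with the hash ID of its content.
git ls-tree -r HEAD

# The parent recorded in the metadata is the previous commit.
parent=$(git rev-parse HEAD^)
recorded=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```

The first commit has no parent line at all, since there was no previous commit to record.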

This particular answer is not about branches, though, so we won't go into further detail here. Instead, we'll focus on the commit-as-snapshot idea. Each commit holds a full copy of every file. But the files in the commit are read-only, like an archive. We have to extract them to use them! So extracting a commit means:

First, remove all traces of the previous commit. That is, we have a bunch of files we've been looking at (and maybe using with the idea of making new commits), but now we want to be rid of those files. So remove them! (Git will first check to make sure it's "safe" to remove these files, and you'll get an error from git checkout or git switch if you ask it to do something that would lose work.⁵)
Then, extract all the files from the commit we're switching to.

Note that the files in the commit have long names with forward slashes in them, like path/to/file.ext. If your OS demands that you use path\to\file.ext instead, Git knows to convert. If your OS requires that there be a folder named path holding a folder named to holding a file named file.ext, Git knows how to deal with that, too. But the commit just has the files. Commits do not store folders—except, that is, where submodules come in.⁶ But I'm getting ahead of things here: for now let's just think of a commit as holding files. The act of checking out a commit means extract the files (after removing the previous commit's files). Because commits store de-duplicated files, Git can speed this up a lot by not bothering to remove-and-replace files that are the same in both commits, and that's also important later (though not specifically for submodules: it's just something to keep in mind). Git will make folders if and when that's required, to hold the files.

Note that I've rather cavalierly assumed here that you understand how your working tree works. This is a problem, because the working tree is closely tied to Git's index aka staging area, and the index is the source of future commits. It's also the mechanism Git uses to know which files came out of the current commit. But we'll come back to this later.

⁴The word branch is rather heavily overloaded in Git. I am not a fan of this situation, and I do my part to try to reduce the overload by using the phrase remote-tracking name for names like origin/develop, and using the word branch name when I mean the name, rather than some set of commits found by the branch name. But human communication will use the word branch to mean many different things; just remember that Git's terminology is confusing, and if you're lost for a bit with it, you're in the company of many others.

⁵This is full of all kinds of special cases and corner cases, and there are ways that Git can lose something here. They used to be a lot more common, in the bad old days; I lost a lot of stuff due to bugs in early versions of git pull, for instance. It's one of several reasons I still mostly avoid git pull.

⁶There's no fundamental reason Git couldn't store folders in its index / staging-area. It just doesn't, and submodules are one of the reasons it doesn't: if you manage to load a mode 040000 tree object into an index cache-entry slot, it magically changes into a mode 160000 gitlink object that implements most of a submodule. This leads to the empty submodule trick. It shouldn't be required; Git's index should just be allowed to store directories.

One repository cannot hold another Git repository

For security reasons, Git administratively forbids any one repository from holding any other repository. The implementation details here show through: a repository is normally implemented by storing a hidden .git folder in the top level of your working tree. This .git folder holds the two databases and all the other ancillary files that Git needs. That is, the repository is actually inside the working tree. The repository doesn't hold the working tree, the working tree holds the repository!⁷

What this means is that if any folder or sub-folder within your working tree has a .git, that folder is itself the working tree of some other Git repository. Git will refuse to store any files from that other repository in this repository. Crossing repository boundaries is supposed to put you into the "other repository".

That is, we now have two repositories. Let's call one soup for superproject, which is the term Git uses for the "above" repository, and one sub to remember that it's the submodule. The submodule has a path, such as path/to/submod, that lives inside soup's work-tree.

The design here is that Git assumes that README.md is part of soup, and path/file.ext and path/to/file.ext also belong to soup. But all files in the path/to/submod/ folder—which contains a hidden .git—"belong to" the submodule. So path/to/submod/README.md is the top-level README.md file for the repository sub.

Now, it's easy enough to create files path/to/submod/whatever in soup, some time before you create the submodule, but if you do, that's kind of problematic. Here's where the difference between your working tree and your index comes into play.

⁷The .git here can now be a file containing the path to the repository. This might have been a better system from the start, since with a .git folder, removing the working tree also removes the repository. Submodules used to have this very issue; Git now works around the problem with what it calls absorbed submodules.

Git's index

Whenever you go to make a new commit, Git does not build the new commit from the files in your working tree. Instead, Git uses a thing that it calls by three different names: the index, the staging area, or—rarely these days—the cache.

What the index holds are Git-ified, pre-de-duplicated copies of files. Being pre-de-duplicated, and having initially come out of the current commit and therefore necessarily duplicates, they don't take any space: they're non-space-using de-duplicated things.⁸ But they do hold the file paths, with those forward slashes that Git uses. These names are the names of the files.

But what if the index for the working tree for soup says path/to/submod/README.md? Then the superproject repository soup contains a file that must live in the submodule. Meanwhile the submodule, once you create it, will probably hold that file—and you won't be able to put that file into the superproject any more because it's now in the submodule.

What actually happens, if you do this today—it's not that hard to accomplish, perhaps even by accident—is, well, whatever Git "feels like doing". There is no defined way that it's supposed to work: it's not supposed to happen in the first place. So don't do it, because whatever Git does today, even if that's something you like, Git might not do it in the next Git version.

Instead, make sure that before you add a submodule to your repository—thus turning your repository into a superproject like our soup here—you don't have any files in the location that will become the submodule. (This is probably not a problem in your particular setup. I'm just mentioning it here in case someone else has a similar problem later.)

When you're ready to use a submodule, you should use git submodule add to add that submodule. This command creates a file named .gitmodules, if it does not yet exist. We'll see the purpose of this file in a moment. Note that git submodule add takes two parameters:

the URL at which you plan to store the submodule, and
the path in the superproject.

In your case, you'd like the path in the superproject to be client. (I have no idea what URL you want to use.) See the git submodule command's documentation for further details here.

Having created the submodule .gitmodules control file, or updated it if it already existed, git submodule add then creates what Git calls a gitlink if appropriate. We'll talk about that next too.
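
As a rough sketch of what that looks like, the script below adds a submodule at the path client. Everything else is an assumption for the demo: the URL is a local bare repository named client.git standing in for a hosted one, and the -c protocol.file.allow=always setting is there only because recent Git versions refuse local-path URLs in submodule commands by default.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
top=$(mktemp -d)
cd "$top"

# Stand-in for the repository you would normally host somewhere.
git init -q seed
(cd seed && echo v1 > app.txt && git add app.txt &&
  git commit -q -m "client: v1")
git clone -q --bare seed client.git

# The superproject. The two arguments are the URL and the path.
git init -q soup
cd soup
git -c protocol.file.allow=always submodule --quiet add "$top/client.git" client

# The control file now records both pieces of information.
cat .gitmodules
url=$(git config -f .gitmodules --get submodule.client.url)
path=$(git config -f .gitmodules --get submodule.client.path)
```

With a real hosted URL you would drop the -c option and just run git submodule add followed by the URL and the path.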

⁸Technically, the index holds the file's name, mode, and a bunch of other Git-specific information plus an indirect reference to the pre-de-duplicated file data, rather than the actual file data. It's this layer of indirection that provides the de-duplication. The file name and cache data and so on, plus this indirect reference, take roughly 100 bytes or so per file, so if your repository has 10,000 files, the index will be about a megabyte: practically nothing, compared to the 10,000 files. Good thing they're all de-duplicated!

How you clone a repository to your laptop

Suppose there's some interesting project, on GitHub for instance, such as https://github.com/git/git/, and you decide you want to clone this to your laptop. You run:

git clone https://github.com/git/git/

That is, you give Git a URL. Git reaches out to whatever software "answers the Internet-phone" at that URL, and if all goes well, you now have a clone named git in your current directory. Or, you run:

git clone https://github.com/git/git/ my/new/git/clone

and the same cloning process happens but the new repository is named my/new/git/clone.

So, git clone needs two things:

what to clone (the URL), and
where to put the new clone.

These are the two things you told git submodule add, that this command recorded in the .gitmodules file. In other words, these are the instructions that Git can use to clone the submodule: it's really that simple. You clone the superproject, and then Git can clone the submodule for you.

For Git to clone the submodule for you, Git needs these two chunks of information. If you don't use git submodule add, at least one of them will be missing, and Git won't be able to clone the submodule for you. You get a "broken" or "half-assed" submodule, as I like to call it. Technically, you get just the gitlink part of things.

Now that you have two clones—your superproject and your submodule—you have one other problem. The superproject does not contain any files from the submodule. It has only that .gitmodules file that says how to run git clone.

So, what Git does is this: every commit you make, from now on, in the superproject, will have a special index entry that Git calls a gitlink. A gitlink is just a path name like path/to/submod plus a commit hash ID. By some sort of conveniently magic coincidence,⁹ this stuff is exactly the stuff that goes into an index entry for a file. So a gitlink is just a special "file", sort of, that goes into each commit.

Each commit in the superproject contains a path name and hash ID for each gitlink. So when Git is checking out a commit, with git checkout or git switch, Git will fill in the superproject's index entry for that path with the commit hash ID to use in the subproject.
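
You can look at a gitlink directly. The sketch below deliberately uses plain git add on a nested repository, so it produces only the gitlink and no .gitmodules file (the "half" submodule described above). The folder name client, the file app.txt, and the Demo identity are assumptions for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
soup=$(mktemp -d)
cd "$soup"
git init -q
mkdir client
(cd client && git init -q && echo v1 > app.txt &&
  git add app.txt && git commit -q -m "client: v1")

# Git warns about an embedded repository here and records one entry
# for the folder, not the files inside it.
git add client
git commit -q -m "soup: record client"

git ls-files --stage client     # the entry in the index
git ls-tree HEAD client         # the same entry as stored in the commit

mode=$(git ls-files --stage client | cut -d' ' -f1)
link=$(git ls-files --stage client | cut -d' ' -f2)
sub_head=$(git -C client rev-parse HEAD)
```

The mode 160000 is what marks the entry as a gitlink, and the hash ID next to it is the commit that was checked out in client when git add ran.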

So far, this should all make sense. But now let's do some work.

⁹Or perhaps, by Linus Torvalds' design.

Working in the submodule

Let's say that our submodule, path/to/submod or maybe just client, has been cloned or git init-ed so that we have the repository here. We:

cd path/to/submod

or:

cd client

to get into this Git repository. We are no longer doing any work in the superproject. We are doing work in the submodule repository now.

We do our work and run git status and it says:

HEAD detached at a123456...

Uh oh. What the ...??!

What's going on here is that, earlier, the superproject Git read, out of its index, that this particular submodule should use commit a123456.... So the superproject Git software ran (cd client && git checkout a123456...) to check out that particular commit, as a detached HEAD.
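
One way to reproduce this state on purpose is sketched below. It assumes a local bare repository client.git as the submodule's URL (hence the -c protocol.file.allow=always option), moves the submodule one commit ahead, and then runs git submodule update so that the superproject puts the submodule back on the recorded commit.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
top=$(mktemp -d)
cd "$top"
git init -q seed
(cd seed && echo v1 > app.txt && git add app.txt &&
  git commit -q -m "client: v1")
git clone -q --bare seed client.git

git init -q soup
cd soup
git -c protocol.file.allow=always submodule --quiet add "$top/client.git" client
git commit -q -m "add client submodule"
recorded=$(git rev-parse HEAD:client)      # hash ID stored in the gitlink

# Move the submodule ahead by one commit ...
(cd client && echo v2 > app.txt && git commit -q -am "client: v2")

# ... then let the superproject check out the recorded commit again.
git submodule --quiet update --no-fetch

(cd client && git status | head -n 1)      # first line reports the detached HEAD
now=$(git -C client rev-parse HEAD)
```

The v2 commit is not lost: it is still on the branch inside client, but the submodule's HEAD no longer points at that branch.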

What we need to do is add and commit any updates we made in the submodule. But for us to commit and push, we're probably going to want to be "on" a branch. So before we started our work we should have run git switch (or git checkout) to switch to some branch.

We need to do that now, even though we have uncommitted work. See other StackOverflow questions and answers about this. Remember, we're not doing anything fancy here, we are just working in an ordinary repository. The fact that it's some other repository's submodule doesn't matter! We just want to be "on" some branch and make some new commit. We can then git push this new commit to GitHub or wherever it is that we like to publish our commits.
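
A minimal sketch of one way out, assuming it is acceptable to create a new branch at the current commit: the branch name my-fix is made up, and an ordinary repository with a detached HEAD stands in for the submodule.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
echo v1 > app.txt
git add app.txt
git commit -q -m "v1"
git checkout -q --detach        # imitate the state a submodule is left in

echo v2 > app.txt               # work done before noticing the detached HEAD

# Creating a branch at the current commit keeps the uncommitted change.
git switch -q -c my-fix         # older Git: git checkout -b my-fix
git commit -q -am "v2"
branch=$(git symbolic-ref --short HEAD)
```

Switching to an existing branch instead can be refused if that branch has a different version of a file you changed, which is what the other questions and answers cover.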

Once we've made the correct commit in the submodule, it's time to return to the superproject:

cd ..

(or whatever as needed). Then we need to tell our superproject-repository Git software to update its index for the submodule, so that the next commit we make here, in the superproject, contains the correct gitlink hash ID:

git add client

for instance. This does not add any files from within client/. Instead, it notices that client is a Git repository and that client has some commit (let's say fedcba9...) checked out. So it updates the index (for the superproject) to say use commit fedcba9.
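
The sketch below shows that update in isolation. To keep it short it uses a nested repository added with plain git add, so there is no .gitmodules file; the names client and app.txt and the Demo identity are assumptions for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
soup=$(mktemp -d)
cd "$soup"
git init -q
mkdir client
(cd client && git init -q && echo v1 > app.txt &&
  git add app.txt && git commit -q -m "client: v1")
git add client 2>/dev/null      # gitlink only; embedded-repository warning hidden
git commit -q -m "soup: record client"

# A new commit in the submodule: the superproject now sees a mismatch.
(cd client && echo v2 > app.txt && git commit -q -am "client: v2")
git status --short              # reports client as changed

# Stage the new hash ID, then commit it in the superproject.
git add client
git commit -q -m "soup: move client to v2"
recorded=$(git rev-parse HEAD:client)
current=$(git -C client rev-parse HEAD)
```

After the second commit, the gitlink in the superproject and the commit checked out in client agree again, so git status in the superproject is quiet.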

Before and/or after we've git add-ed the submodule, we can git add other files if appropriate. Then we can git commit as usual in the superproject.

Summary

To work in a submodule:

First enter the submodule.
Then put it on a branch with git switch or git checkout, or create a branch, or whatever may be appropriate here, so that you will be able to git push later.
Then, do your work. Add and commit as usual, before and/or after testing as usual. (If you did switch branches, consider testing before you do any work to make sure that the branch tip commit you got still works with your superproject.)
Once you're convinced the submodule is truly ready, you can git push the commit. Note that git push wants to use branch names; that's why we did our work "on" a branch.¹⁰ You can defer this git push for a bit; see below.
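Then, return to the superproject. You can now git add the submodule. Remember that when git status in the superproject annotates the submodule with new commits, the submodule is on a different commit than what the superproject's index says, while modified content and untracked content mean that the submodule's own working tree has uncommitted changes or untracked files. All of this is perfectly normal. The superproject Git has simply done a cd submodule && git status to get this sort of summary.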
When everything is ready in the superproject, and you've already git push-ed in the submodule, you can now git push in the superproject. If you git push in the wrong order, you'll be sending a gitlink that refers to a commit that nobody else can git clone or git fetch yet. So be sure to push the submodule first.
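
Put together, the steps above can be sketched as one script. The two bare repositories client.git and soup.git are local stand-ins for hosted repositories, the branch names feature and main are made up, and -c protocol.file.allow=always is needed only because the submodule URL is a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
top=$(mktemp -d)
cd "$top"

# Stand-ins for the hosted repositories.
git init -q seed
(cd seed && echo v1 > app.txt && git add app.txt &&
  git commit -q -m "client: v1")
git clone -q --bare seed client.git
git init -q --bare soup.git

# Superproject with the submodule already added and committed.
git init -q soup
cd soup
git -c protocol.file.allow=always submodule --quiet add "$top/client.git" client
git commit -q -m "add client submodule"

cd client                           # 1. enter the submodule
git switch -q -c feature            # 2. get on a branch
echo v2 > app.txt                   # 3. do the work, add, commit
git commit -q -am "client: v2"
git push -q origin feature          # 4. push the submodule first
cd ..                               # 5. return to the superproject
git add client                      #    and record the new hash ID
git commit -q -m "use client v2"
git push -q "$top/soup.git" HEAD:refs/heads/main   # 6. push the superproject last
```

Because the submodule was pushed first, anyone who fetches the superproject commit can also fetch the submodule commit its gitlink names.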

Git's submodule facilities are currently improving, but they're not all the way there yet, and lots of old Git versions are still out there today. If you have the latest Git, some of the above steps can be reduced to simpler single commands, especially using the recurse modes, but it's still somewhat messy and painful.

¹⁰You can do your work without using a branch name, and only assign a branch name during the git push step. But if you know how to do that, you may not be reading this answer in the first place.

OMG... I was very surprised by the length of the answer you wrote down. I will read them all slowly and try to execute them one by one. Thank you so much for your advice!! Have a good day!!!! — hyehye, Jul 14 '22 at 04:15