Changing Git Submodule Commit

Question

Upon attempting to 'build' from a repository using Apache Ant, I get an error similar to this:

Fetched in submodule path 'externals/ringojs-fork', but it did not contain 
298e62daa64923b7bc1e4a085233529f907ba7bf. Direct fetching of that commit 
failed.

From what I have read I think I need to change the commit at which one of my submodules is pulling.

As seen in the picture, my ringojs-fork external submodule is pulling from a commit starting with 298e. How do I change this to be the master?

I am on Linux Ubuntu.

What do you mean by "pulling at/from a commit"? In principle, each branch can have an upstream set to a specific remote *branch*, but not to a specific *commit*. Also, you seem to use a UI. If you tell us which one, it might help people help you. — Romain Valeri, Aug 17 '18 at 12:28
The submodule, from what I can tell, is downloading not from the master branch but from a different one. I would like it to download from master. — Jamie, Aug 17 '18 at 12:33
Are you interacting with Git through your UI, or the command line? — Romain Valeri, Aug 17 '18 at 13:01
So `git branch -u <remote>/<branch>` should be fine. For example, for ringojs, if your remote target branch is named notMaster: `git branch -u upstream/notMaster ringojs`. Here `<remote>` is a placeholder for the name of your remote (usually `origin`, but that's only a convention) — Romain Valeri, Aug 17 '18 at 13:20
I was not too sure at start, but it could be a duplicate of [this one](https://stackoverflow.com/questions/520650/make-an-existing-git-branch-track-a-remote-branch/2286030). — Romain Valeri, Aug 17 '18 at 13:23
Apologies if this is obvious. What should I replace notMaster with? It's the ringojs-fork submodule. — Jamie, Aug 17 '18 at 13:36

score 2 · Accepted Answer · answered Aug 17 '18 at 16:11

... From what I have read I think I need to change the commit at which one of my submodules is pulling.

Maybe. It may also be the case that you just need to poke someone to update the upstream submodule. Or, maybe you should just use a different commit in the superproject. Or maybe your submodule is cloned as a shallow repository, but should not be. (I think this last is the most likely, based on the fact that https://github.com/ringo/ringojs/tree/298e62daa64923b7bc1e4a085233529f907ba7bf exists.)

If the problem is in fact a shallow clone, navigating to the submodule repository and running git fetch --unshallow should fix it. You could use git submodule foreach git fetch --unshallow to do that.
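A minimal sketch of the shallow-clone failure and the fix, using throwaway local repositories in a temporary directory. The names upstream and sub, and the demo author identity, are made up for illustration; nothing here contacts the real ringojs-fork.

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-in for the submodule's upstream: three commits, the oldest is "pinned".
git init -q upstream
git -C upstream commit -q --allow-empty -m one
pinned=$(git -C upstream rev-parse HEAD)
git -C upstream commit -q --allow-empty -m two
git -C upstream commit -q --allow-empty -m three

# Shallow clone holding only the newest commit. file:// is used because
# --depth is ignored for plain local-path clones.
git clone -q --depth 1 "file://$PWD/upstream" sub

# Report whether the pinned commit exists in the clone.
has_pinned() {
  if git -C sub cat-file -e "$pinned^{commit}" 2>/dev/null; then
    echo present
  else
    echo missing
  fi
}

before=$(has_pinned)               # the pinned commit was never copied
git -C sub fetch -q --unshallow    # the fix: fetch the omitted history
after=$(has_pinned)
echo "before: $before, after: $after"
```

In a real superproject with several submodules, note that git submodule foreach stops at the first submodule where the command fails, and git fetch --unshallow refuses to run in a repository that is already complete. Appending || : keeps it going: git submodule foreach 'git fetch --unshallow || :' followed by git submodule update.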

Background

You probably already know what a Git repository is: it's a collection (or database) of commits, with each commit representing a complete, intact snapshot of an entire source tree. Some particular commits are especially important, either right now, or always, so these commits have names: a branch name, like master, names the latest commit on that branch. It's important because it's new! or shiny! or whatever. Meanwhile, a tag name, like v1.2, names a commit some person thought was important, such as a stable release.

Each of these names—branch or tag, or really, any other human-readable name you can use in a Git repository like origin/master or whatever—is actually just a name for a raw hash ID. These hash IDs, which include things like 298e62daa64923b7bc1e4a085233529f907ba7bf, are apparently-random, big ugly hexadecimal numbers that are useless to humans. You have just seen an example of how such a number is not useful to you. But they are what Git uses to check out specific commits. When you use a name like master or v1.2, Git translates that name into the correct hash ID, and checks out that commit.
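To see the name-to-hash translation directly, git rev-parse prints the hash ID that a name stands for. A minimal sketch in a throwaway repository (the repository, commit message and demo identity are made up; the branch is forced to be called master to match the text):

```shell
set -e
# Throwaway identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
git commit -q --allow-empty -m "stable release"
git tag v1.2                              # lightweight tag on the same commit

# rev-parse performs the name-to-hash-ID translation Git does internally.
branch_id=$(git rev-parse master)
tag_id=$(git rev-parse v1.2)
echo "master -> $branch_id"
echo "v1.2   -> $tag_id"
```

Both names print the same big ugly hexadecimal number, because here they name the same commit.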

Because a Git repository is, at least normally (this becomes important soon), fully self-contained, Git can make sure that all the names are valid and identify valid commits. When you tell Git to check out master, it's never the case that the name master exists and names commit a123456... and yet that commit doesn't exist. Either the commit does exist, and master can name it, or the commit doesn't exist, and master cannot name it.¹

When you use a branch name to check out one specific commit—by running git checkout master or git checkout develop, for instance—Git turns the name into the hash ID, locates that commit in the database, and extracts that commit into a work-tree where you can use it and/or work on it. The commits inside the database are in a form usable only by Git itself, so without a work-tree, you could not do any work. At the same time that Git extracts the commit, Git remembers the name for you, so that you are now "on a branch".

You can, however, also select any historical commit you like, by its raw hash ID, and run git checkout a123456.... That commit must exist of course, but assuming it does, Git extracts that commit into your work-tree, and remembers that you're not on any branch now. Instead, Git says that you have a "detached HEAD".
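A small sketch of the difference, in a throwaway repository with hypothetical commits: on a branch, HEAD is a symbolic reference to the branch name, while after checking out a raw hash ID, git symbolic-ref finds no branch name in HEAD.

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m older
older=$(git rev-parse HEAD)
git commit -q --allow-empty -m newer

# On a branch, HEAD is a symbolic reference to the branch name.
attached=$(git symbolic-ref -q HEAD)

# Check out a historical commit by raw hash ID: HEAD now holds the hash itself.
git checkout -q "$older"
if git symbolic-ref -q HEAD >/dev/null; then
  state="on a branch"
else
  state="detached HEAD"
fi
echo "$attached, then $state at $(git rev-parse --short HEAD)"
```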

In general, you get a Git repository by cloning:

git clone <url> <directory>

clones the repository found at the given URL, and puts it in a particular directory. The last step of git clone is to git checkout a branch or tag—often the branch name master, but you can add arguments to git clone to say what to check out.
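As a sketch of what that last step checks out, the block below clones a hypothetical local repository twice: once with the default, and once with --branch, which also accepts a tag and then leaves the clone on a detached HEAD at the tagged commit. Local paths stand in for the URL.

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m release
git -C upstream tag v1.2
git -C upstream commit -q --allow-empty -m "work after the release"

# Default: the clone checks out the remote's default branch (master here).
git clone -q upstream on-master
# --branch chooses what to check out; given a tag, HEAD ends up detached.
git clone -q --branch v1.2 upstream at-tag

git -C on-master rev-parse --abbrev-ref HEAD   # prints: master
git -C at-tag rev-parse --abbrev-ref HEAD      # prints: HEAD (detached)
```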

¹This gives rise to the interesting case of a new, completely-empty Git repository. In such a Git repository, there are no commits yet. This means there are no branches! The branch name master does not exist yet, in a new, completely-empty repository. The very first commit you create becomes the latest commit on branch master, and in so doing, also creates the name master. This is because the name must always contain a valid commit hash ID.
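The empty-repository case from the footnote can be observed directly. This sketch (throwaway repository, hypothetical names) checks whether refs/heads/master exists before and after the very first commit:

```shell
set -e
# Throwaway identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q empty
cd empty
git symbolic-ref HEAD refs/heads/master   # HEAD already says "master"...

master_exists() {
  if git rev-parse -q --verify refs/heads/master >/dev/null; then
    echo yes
  else
    echo no
  fi
}

branches_before=$(git branch --list)   # no branches at all yet
before=$(master_exists)                # ...but the branch does not exist
git commit -q --allow-empty -m "very first commit"
after=$(master_exists)                 # the first commit creates the name
echo "master exists before: $before, after: $after"
```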

Submodules

With all that in mind, let's take a moment to describe what a submodule is. The short version is that a submodule is just another Git repository. The only thing special about the submodule is that it exists because some other Git repository—which Git calls the superproject—says: "while I am my own Git repository, I would like to use another Git repository too." The superproject is required to supply the information you would have passed to git clone: the URL, and the path.

If you start working in the submodule, though, you find that it's almost always in the special "detached HEAD" state. That's because instead of checking out a branch or a tag, the superproject also tells the submodule Git: and by the way, after you've cloned or fetched everything, I want one specific commit and here is the hash ID: _______. The superproject supplies the raw hash ID—not a name like master or v1.2, just a raw hash ID.
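To look at what the superproject actually stores, here is a sketch with two throwaway repositories. Instead of git submodule add (which would also clone the submodule and write a .gitmodules file), it uses the plumbing command git update-index --cacheinfo to record the same kind of entry, a "gitlink" with mode 160000, directly. The path externals/ringojs-fork is borrowed from the question; everything else is made up.

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-in submodule repository with one commit.
git init -q sub
git -C sub commit -q --allow-empty -m "submodule commit"
pinned=$(git -C sub rev-parse HEAD)

# Stand-in superproject that records a gitlink without cloning anything.
git init -q super
cd super
git update-index --add --cacheinfo "160000,$pinned,externals/ringojs-fork"
git commit -q -m "use ringojs-fork at a specific commit"

# What the superproject stores: a mode, the type "commit", and a raw hash ID.
git ls-tree -r HEAD
# The hash ID alone, via the <commit>:<path> syntax.
recorded=$(git rev-parse HEAD:externals/ringojs-fork)
echo "superproject asks for $recorded"
```

There is no branch or tag name anywhere in that entry, only the hash ID.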

This is the source of the error you are seeing. The superproject repository you chose, some version of Ant, lists a submodule repository by name (apparently this ringojs-fork thing). You can clone that other Git repository just fine. But then, your superproject tells your Git system: after getting the latest from ringojs-fork, check out commit 298e62daa64923b7bc1e4a085233529f907ba7bf. But commit 298e62daa64923b7bc1e4a085233529f907ba7bf does not exist.
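One way to confirm this diagnosis in your own checkout is to compare each hash ID the superproject's index records for a submodule path against the clone at that path. The function check_pins below is a hypothetical helper, not part of Git, and it assumes submodule paths without spaces. The demo wires up a stand-in superproject whose gitlink names 298e62daa64923b7bc1e4a085233529f907ba7bf while the clone at externals/ringojs-fork lacks it.

```shell
set -e
# Throwaway identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-in superproject and a submodule clone that lacks the pinned commit.
git init -q super
cd super
mkdir externals
git init -q externals/ringojs-fork
git -C externals/ringojs-fork commit -q --allow-empty -m "some other commit"
git update-index --add --cacheinfo \
  "160000,298e62daa64923b7bc1e4a085233529f907ba7bf,externals/ringojs-fork"

# For every gitlink (mode 160000) in the index, check whether the clone at
# that path contains the commit the superproject asks for.
check_pins() {
  git ls-files -s | awk '$1 == "160000" { print $2, $4 }' |
  while read -r hash path; do
    if git -C "$path" cat-file -e "$hash^{commit}" 2>/dev/null; then
      echo "ok      $path $hash"
    else
      echo "MISSING $path $hash"
    fi
  done
}

report=$(check_pins)
echo "$report"
```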

Who's wrong? Is it the superproject, when it says "use commit 298e62daa64923b7bc1e4a085233529f907ba7bf"? Or is it the submodule, when it says "commit 298e62daa64923b7bc1e4a085233529f907ba7bf does not exist (yet)"? Or maybe even both are wrong. Well, sort of. The submodule repository is a repository, so it should be self-contained and have everything.

Where things can go wrong

Git is a distributed version control system, meaning there are many copies of every repository. Every clone is a copy, after all. But some clones might be more up to date than others. Suppose someone controlling this ringojs-fork forgot to run git push to update the GitHub clone. Then that someone might have commit 298e62daa64923b7bc1e4a085233529f907ba7bf, but have never sent it out for everyone else. In that particular case, you just need to get whoever controls this ringojs-fork to send that commit up, so that you can get it back down.
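The forgot-to-push case can be simulated with three throwaway local repositories: a bare shared.git standing in for the GitHub copy, a maintainer clone, and your clone (all names hypothetical). The new commit only becomes fetchable after the maintainer pushes:

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q --bare shared.git
git clone -q shared.git maintainer 2>/dev/null   # hide the empty-repo warning
git -C maintainer commit -q --allow-empty -m base
git -C maintainer push -q origin HEAD
git clone -q shared.git you

# The maintainer makes a commit a superproject might pin...
git -C maintainer commit -q --allow-empty -m "new work"
wanted=$(git -C maintainer rev-parse HEAD)

in_your_clone() {
  if git -C you cat-file -e "$wanted^{commit}" 2>/dev/null; then
    echo yes
  else
    echo no
  fi
}

git -C you fetch -q
before_push=$(in_your_clone)   # ...but has not pushed, so fetching cannot help

git -C maintainer push -q origin HEAD
git -C you fetch -q
after_push=$(in_your_clone)    # once pushed, a fetch brings it down
echo "before push: $before_push, after push: $after_push"
```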

Or, perhaps 298e62daa64923b7bc1e4a085233529f907ba7bf existed at one time, and was available for everyone to use. It might even be out there in some clones. But something terrible was in 298e62daa64923b7bc1e4a085233529f907ba7bf, so whoever controls the ringojs-fork had it carefully excised. In other words, it was there, but isn't any more, and no one should ever ask for it. (This situation is problematic since other people might depend on it, or have it and put it back, or at least try to put it back. It's rarely good to rip commits out of public repositories like this.) In this particular case, we see that your Ant repository—your superproject—depends on it.

Well, more precisely, at least one specific commit in your superproject depends on this commit in the submodule. Maybe other commits in the superproject don't, in which case, if you switch to one of those other commits in the superproject, maybe that will cure the problem. No commit can ever be changed, so the particular commit in the superproject that you are using right now will always ask for this other particular commit (by hash ID) in the submodule. If the submodule's commit should never be asked-for, this particular commit in the superproject should simply never be used.
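To find superproject commits that ask for a different submodule commit, git log can be limited to the submodule path, and the <commit>:<path> syntax shows the hash ID each one records. A sketch with stand-in repositories (names hypothetical; the pins are recorded with git update-index --cacheinfo rather than a real submodule clone):

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-in submodule with two commits.
git init -q sub
git -C sub commit -q --allow-empty -m first
first=$(git -C sub rev-parse HEAD)
git -C sub commit -q --allow-empty -m second
second=$(git -C sub rev-parse HEAD)

# Stand-in superproject that pinned `first`, then moved the pin to `second`.
git init -q super
cd super
path=externals/ringojs-fork
git update-index --add --cacheinfo "160000,$first,$path"
git commit -q -m "pin first"
git update-index --cacheinfo "160000,$second,$path"
git commit -q -m "pin second"

# Superproject commits that changed the pin, newest first, with the
# submodule hash ID each one asks for.
listing=$(git log --format=%H -- "$path" | while read -r c; do
  echo "$(git rev-parse --short "$c") pins $(git rev-parse "$c:$path")"
done)
echo "$listing"
```

In a real superproject, after picking a commit whose pin the submodule actually has, git checkout <that-commit> followed by git submodule update would check out the matching submodule commit. If you would rather have the superproject use the submodule's master, one common approach (assuming the submodule's remote is called origin) is git -C externals/ringojs-fork fetch origin, then git -C externals/ringojs-fork checkout origin/master, then git add externals/ringojs-fork and git commit in the superproject. That makes a new superproject commit; it does not change the existing one.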

Or, there's one more possibility. I mentioned above that a repository is normally completely self-contained. There's a special exception, though, that's very common, called a shallow clone. A shallow clone is a clone that deliberately omits a lot of commits, so as to make cloning faster.

Omitting lots of commits to make cloning faster is great, up until someone asks for one of those commits. Now, Git is not totally stupid (it is only mostly stupid), so a shallow clone that omits some commit also omits any name for that commit. But superprojects don't ask for commits by name; they ask directly by raw hash ID. This means that if you make shallow clones of submodules, it's really easy for the superproject to call for a commit that was not copied in the clone step.
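A sketch of the name-omitting behaviour, using a hypothetical local upstream with an extra branch old-work on an older commit. git clone --depth 1 implies --single-branch, so the shallow clone gets neither the old commit nor its name, yet a superproject could still ask for that commit's hash ID:

```shell
set -e
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m old
git -C upstream branch old-work        # a second name for the old commit
git -C upstream commit -q --allow-empty -m new
old_commit=$(git -C upstream rev-parse old-work)

# --depth 1 keeps only the tip of master (file:// makes --depth take effect).
git clone -q --depth 1 "file://$PWD/upstream" shallow

shallow_flag=$(git -C shallow rev-parse --is-shallow-repository)
if git -C shallow rev-parse -q --verify refs/remotes/origin/old-work >/dev/null; then
  old_name=copied
else
  old_name=omitted
fi
if git -C shallow cat-file -e "$old_commit^{commit}" 2>/dev/null; then
  old_object=copied
else
  old_object=omitted
fi
echo "shallow: $shallow_flag; name old-work: $old_name; its commit: $old_object"
```

Submodules commonly end up shallow through git submodule update --init --depth 1, git clone --recurse-submodules --shallow-submodules, or a shallow = true setting for the submodule in .gitmodules, so those are worth checking if something in your build clones submodules for you.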

The root of the problem is that a superproject lists a submodule's commit by raw hash ID, and the two Git repositories are only loosely coupled. Anything that causes a commit in the superproject repository to list a hash ID that is not available, for whatever reason, in the submodule repository, will lead to this error. That includes the occasional failure to push, but mostly includes cases of shallow cloning.

(Note that if a submodule moves, from one GitHub URL to another for instance, you may have to adjust the submodule clone, which may list the old URL, in the same way you have to adjust the superproject's URL if the superproject's URL changes.)
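For the submodule clone itself, adjusting the URL is a matter of changing its remote. A sketch with hypothetical local repositories old-home and new-home standing in for the old and new GitHub URLs:

```shell
set -e
# Throwaway identity so the demo commit works anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q old-home
git -C old-home commit -q --allow-empty -m start
git clone -q old-home sub        # the submodule clone, made from the old URL
git clone -q old-home new-home   # hypothetical new location of the repository

# Point the existing submodule clone at the new URL.
git -C sub remote set-url origin "$PWD/new-home"
url=$(git -C sub remote get-url origin)
git -C sub fetch -q origin       # fetches now come from the new location
echo "origin is now $url"
```

In a superproject, the usual route is to update the URL in .gitmodules (for example with git config -f .gitmodules submodule.<name>.url <new-url>, where <name> is the submodule's name there) and then run git submodule sync, which copies it into the superproject's .git/config and the submodule's remote.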
