Can I push both of these commits to the "origin" repo and just the first commit to the "prod" repo?
Yes, but not sustainably long-term.
Here's the first set of things you need to know when using Git:
A Git repository is, at its heart, two databases:
One database contains commits and other internal Git objects. Each commit stores, indirectly, all the files that go with that particular commit; we'll talk about this more in a moment; but this means that the objects database holds the files, in a commit-oriented fashion. The objects in this objects database are read-only: nothing can change any object once it's stored, and the database itself is essentially append-only. Retrieving an object from the database requires knowing its "object number" (a big ugly random-looking hash ID).
This first database is the one that's copied (without any changes: nothing in here can be changed) by cloning. The object numbers (especially those for commits) are universally unique: every Git repository everywhere must use the same number for the same commit, and must allocate a new, never-before-used number for any new commit.[1]
The other database is simpler: it contains names—human readable ones—grouped into things like branch and tag names. Each name stores one and only one object number, which turns out to be all we need. This database is entirely read/write; anyone can change anything in it. (Hosting sites like GitHub always add restrictions so that some random person off the internet can't go into your GitHub account and change or remove all your names. You wouldn't use a hosting site if it didn't do this!)
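To make the two databases a bit more concrete, here is a minimal sketch you can run in a scratch directory. The directory name `demo`, the tag `v0`, and the `Demo` identity are made up for illustration, and the `git symbolic-ref` line is only there so that the first branch is called `main` regardless of your Git version or configuration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version

# Objects database: storing the same content twice yields the same object number.
id1=$(echo 'hello' | git hash-object -w --stdin)
id2=$(echo 'hello' | git hash-object -w --stdin)
echo "object number: $id1"
git cat-file -t "$id1"    # prints the object's type
git cat-file -p "$id1"    # prints the stored content

# Names database: each name stores exactly one object number.
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first commit'
git tag v0
git for-each-ref          # one line per name: hash ID, object type, full name
```

The `git for-each-ref` output is a direct listing of the names database: here it should show one branch name and one tag name, both holding the same commit hash ID.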
This second database is not copied as-is by cloning, but is readable by someone making a clone. A `git clone` command will select some stuff out of it and make changes to that during cloning, so that the new clone can remember the original repository's branch names.
Cloning a Git repository means copying these databases. As noted above, one of them is (necessarily) copied as-is; the other typically gets modified. What were branch names become what I call remote-tracking names.[2] For instance, their `main` becomes your `origin/main`.[3]
That's all that's required of a repository, and some (e.g., server hosted) repositories may have just that (though all hosting servers add a bunch of non-Git features too). But you can't do any new work in such a repository, so when you clone one, you get a bit more than just the two databases.
This means that every clone initially has every commit,[4] but no branches (or at least no branch names; the word branch in Git is so overused as to be practically meaningless without context). Doing work in a repository with no branch names at all, however, is no fun at all. So your clone normally creates one branch name immediately after doing the database setup, and checks that branch out, before returning control to you. (You can skip the checkout step with `--no-checkout`, but there's rarely any reason to do that.)
The branch name you get in your clone selects the same commit as the remote-tracking name in your clone, and the remote-tracking name in your clone is built from their branch name, so your `main` and your `origin/main` name the same commit as their `main`, and hence your `main` and their `main` are in sync. This assumes you're having your Git create your `main`: you get to choose which of their branch names your Git creates, using the `-b` option at `git clone` time, but if you don't use `-b`, your Git software asks their Git software what name they recommend. (On GitHub, you can set this with the web interface; GitHub calls this the "default branch", which is what Git calls it for `git clone` as well.)
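Here is a small sketch of that renaming, using a local directory as "their" repository. The names `theirs` and `mine`, the `commit` helper, and the `Demo` identity are all made up for the demonstration; a real clone would normally use a URL instead of a local path.

```shell
set -eu
root=$(mktemp -d)
cd "$root"
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/main
commit A
git branch develop            # a second branch name in their repository
cd "$root"

git clone -q theirs mine      # copies the objects, rewrites the names
git -C mine branch            # local branch names: only main was created
git -C mine branch -r         # remote-tracking names, including origin/develop
```

Note that their `develop` shows up only as `origin/develop` in the clone: the commit is there, but no local branch name `develop` was created.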
[1] Furthermore, Git has to do this without consulting all the other Git repositories in the universe. This is mathematically impossible and the technique that Git uses to approximate it will someday break, but the size of the hash space is big enough that it's not a problem in practice. (We get a huge break as well by not smushing together unrelated repositories, so that we never notice accidental hash collisions in those unrelated repositories: the uniqueness constraint gets relaxed to apply only to related repositories.)
[2] Git calls these remote-tracking branch names. Old Git documentation rather sloppily called them "remote branches" or other silly phrases. They're not actually branch names at all (they're just your Git repository's memory of some other repository's branch names, transformed), and the word branch is so badly overused in Git that it's better, in my opinion, to just drop the word branch here entirely.
[3] The pattern here is mostly clear: a branch name doesn't have a remote name like `origin` in front, and a remote-tracking name like `origin/main` does. But then you get weirdness like `git pull origin main` and you might start to wonder why this isn't `origin/main` here too. There are historical reasons for that, i.e., no good reason now except that Git has to be compatible with Git-the-way-it-was-used-in-2005, before "remotes" were invented.
[4] Technically, you get only the reachable commits, and you can snip off many of those with a "shallow" clone as well, but we won't get into these details.
The next things to know about Git
I've already mentioned that the true identity of a commit is its hash ID name. For instance, `d420dda0576340909c3faff364cfbd1485f70376` is a particular commit in the Git repository for Git. It has the same hash ID in every clone of the Git repository for Git, so if you have a clone of the Git repository for Git, you will have this commit in your clone, or else your clone is out of date and you need to run `git fetch` to bring it more up to date from a more up-to-date copy.
The `fetch` and `push` commands are how we transfer commits. These commits, like all Git objects, are strictly read-only: once created, they can never be changed, and once they've been distributed Out There to other Git repositories it's generally difficult or even impossible to recall them (there are specific cases where you know how far they've spread and can stamp them out like some evil COVID strain, but given how frequently Git repositories have Git-sex with other related Git repositories, it's usually too late).
The fetch command generally means call up some other Git software and repository and get everything that they have, that I don't. You can limit the fetch somewhat, but that's not the norm. By contrast, the push command generally means call up some other Git software and repository and give them specific commits that I have that they don't. If we push the sex analogy a little too far, this means it's up to the "male" ("push") operations to be responsible.
Now, the thing about commits is that they don't just have a full snapshot of all the files, like a tar or zip or WinRAR archive or whatever. They also carry some metadata, or information about the commit itself. This includes the name and email address of the person who made the commit, for instance. But it also includes stuff that Git relies on internally: specifically, every commit has a list of raw hash IDs for earlier commits.
This list of earlier (or parent) commit hash IDs, stored in each commit, is how commits form history. Most commits contain exactly one hash ID: one parent commit. This forms a simple, backwards-looking, linear chain, and this explains how Git works and why the answer above is "yes but not sustainably".
Let's draw a picture of a tiny, three-commit repository, using uppercase letters to stand in for the commits' hash IDs (which in reality are big and ugly and impossible for humans to work with). We'll call the first commit `A`, the second one `B`, and the third one `C`, and draw them like this:
A <-B <-C
Commit `C` stores commit `B`'s raw hash ID. We say that `C` points to `B`, and draw that as an arrow sticking out of `C`, pointing to `B`. What this means is that if we can somehow memorize the hash ID of commit `C`, and give that to Git, Git can retrieve commit `C` from its all-objects database, and that gives Git the hash ID of commit `B`, so that Git can retrieve commit `B` too.
Having retrieved both commits—and the two snapshots—Git can go on to compare the two snapshots, to see what changed between them. Git essentially plays Spot the Difference here.
Equally important, now that Git has commit `B` in hand, it can use the metadata in `B` to get `A`'s hash ID, from which Git can get commit `A`. (Commit `A` is special: its list of parents is empty. Git can now stop going backwards.) So by memorizing one hash ID, that for commit `C`, we had Git find all the commits.
The second database, holding branch names and other names, is how we have Git do the memorizing for us:
A--B--C <-- main
The name `main` holds `C`'s hash ID, so that it is easy to find commit `C`.
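The backwards walk can be done by hand, as a sketch. Everything here is a stand-in: the commits are empty, their messages `A`, `B`, and `C` play the role of the letters in the drawing, and `commit` is a hypothetical helper, not a Git command.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit A
commit B
commit C

c=$(git rev-parse main)     # the name main stores C's hash ID
b=$(git rev-parse "$c^")    # C's metadata lists B as its parent
a=$(git rev-parse "$b^")    # B's metadata lists A as its parent

git cat-file -p "$c"            # raw commit: tree, parent, author, committer, message
git log --format='%h %s' main   # Git repeats the parent-following for us
```

The `git cat-file -p` output shows the `parent` line that holds `B`'s raw hash ID; running it on `$a` instead would show no `parent` line at all, which is how Git knows to stop.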
If we like, we can now create a second name, such as `develop`. We must pick one of the existing commits so that the name `develop` selects that commit. The most obvious candidate is the newest (presumably latest = greatest, right?) commit, `C`, so that's the one we will probably pick:
A--B--C <-- develop, main
Now we need a way, in our drawing, to know which name we're going to use. This will be our current branch name. To draw which one is current, we'll attach the special all-caps name `HEAD` to just one branch name, like this:
A--B--C <-- develop, main (HEAD)
This means we are using `main` and therefore using commit `C`.
The current branch and commit, and your working tree
Git shares a problem that all version control systems have: if previous versions are frozen for all time (and they are), how do we get any new work done? Git's answer is the usual one: there's an additional area, which Git calls your working tree or work-tree, that holds copies of all the files from the commit you've selected.
So: we use `git switch` or the older `git checkout` command to pick some particular branch name, and thereby some particular commit, and Git extracts all the files from that commit and puts them into our working tree. We now have all the files from commit `C`:
A--B--C <-- develop, main (HEAD)
as we're "on" branch `main` and the name `main` selects commit `C`.
If we run `git switch develop`, we get:
A--B--C <-- develop (HEAD), main
Git needs to attach the name `HEAD` to `develop`, remove all the commit-`C` files, and plug in, instead, all the commit-`C` files. Wait, those are the same files. Git doesn't need to bother to do anything to the files. So it doesn't, and this is (later) an important thing to know: if we switch branch names without switching commits, nothing at all happens in the working tree.
Most version control systems stop here, with the two copies of each file: the frozen one in the current commit, and the usable one in your working tree. Git goes on to add a third copy, in an area that Git gives three names, perhaps because it's so important, or perhaps because the original first name for it was so awful: this is the index or cache or staging area. All three names mean the same thing here; staging area refers to how you use it and is arguably the best name, but I tend to use index because it does extra stuff at various times. We won't get into any of the details here, but I always try to mention it since the index / staging-area is the key to making new commits.
Anyway, let's just assume for now that you know how to make a new commit, and let's go about making a new commit now, which we'll call `D`, since that's the next letter. In reality it will get a new, unique hash ID (it will depend on, among all the other details, the exact second at which you make the new commit, so I'd have to know that to predict it), so, well, `D`. New commit `D` will point backwards to existing commit `C` because `C` is the commit we're using when we make `D`. So let's draw commit `D`:
A--B--C
       \
        D
Oh dear, I left out the branch names. What happened to those? Well, we're not "on" branch `main` now, so nothing happens to that name. We are "on" branch `develop` now, so Git shoves the new commit's hash ID into that name. The result looks like this:
A--B--C <-- main
       \
        D <-- develop (HEAD)
Our new commit `D` links back to existing commit `C`, and our branch name `develop` locates commit `D`. If we make another new commit `E` we get:
        D--E <-- develop (HEAD)
       /
A--B--C <-- main
where I have for some reason drawn `develop` up top this time.
Now let's run `git switch main`:
        D--E <-- develop
       /
A--B--C <-- main (HEAD)
This time, we are changing commits, so Git removes all the commit-`E` files and swaps in all the commit-`C` files.[5] If we look at what we have in our working tree, it's gone back in time to commit `C`. (That's also the last commit in the repository we cloned, assuming we cloned a repository to get `A-B-C` originally.)
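The sequence so far can be replayed in a scratch repository. This is only a sketch: the `save` helper is hypothetical, and it makes each commit add one file named after the commit's letter so that the effect on the working tree is visible.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
# Hypothetical helper: each commit adds one file named after the commit.
save() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"
}
save A
save B
save C
git switch -q -c develop    # new name, same commit C: working tree untouched
save D
save E                      # develop now names E; main still names C
git switch -q main          # changes commits: D.txt and E.txt are removed
ls                          # only the files that belong to commit C
git log --all --format='%s %d'
```

After the final `git switch main`, the files from `D` and `E` are gone from the working tree but remain safely stored in those commits; `git switch develop` would bring them back.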
Let's make another new branch name, `feature`, now, and switch to that name. We can do this in two steps:
git branch feature
git switch feature
or one:
git switch -c feature
In either case we get:
        D--E <-- develop
       /
A--B--C <-- feature (HEAD), main
If we now make another two new commits, we get:
        D--E <-- develop
       /
A--B--C <-- main
       \
        F--G <-- feature (HEAD)
This is really what commits and branches are about in Git. We make new commits, and Git makes the new commits link backwards to the old ones; as we make each commit, Git stuffs its new, unique hash ID into the current branch name, which is how we remember which commit is the latest.
[5] Git actually "cheats" here. Git de-duplicates files within and across commits, and because of the way it does that internally, Git knows, instantly, which files are identical in the two commits, when switching. So Git doesn't bother to swap out the files that haven't changed. This is a more general form of the "if we aren't changing commits, don't touch anything" case. It's also quite useful, for when we start making changes but forget to switch branch names first.
This in fact is how branch names are defined
The hash ID stored in a branch name is the last commit on that branch:
        D--E <-- develop
       /
A--B--C <-- main
       \
        F--G <-- feature (HEAD)
Here, `G` is the last commit on `feature`. If we force Git to store `D` into `develop` and `F` into `feature` (there are various ways to do that) we get:
          E ???
         /
        D <-- develop
       /
A--B--C <-- main
       \
        F <-- feature (HEAD)
         \
          G ???
It's as if commits `E` and `G` no longer exist. We can't find them, because we find commits using branch names and the names no longer point to them. (If we've memorized their hash IDs, we can use those to find them, for a while at least.[6]) The branch `develop` ends at `D` now, not at `E`. Note that commits `A-B-C` are on all three branches.
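One of the "various ways" is `git branch -f`, which works as long as the branch being moved is not the current one. The sketch below only moves `develop`, uses empty commits labeled with the letters from the drawing, and relies on a hypothetical `commit` helper.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit A
commit B
commit C
git switch -q -c develop
commit D
commit E
git switch -q main

e=$(git rev-parse develop)         # memorize E's hash ID before hiding it
git branch -f develop develop~1    # force the name develop to store D's hash ID

git log --format=%s develop        # history on develop now ends at D
git branch --contains "$e"         # prints nothing: no branch name finds E
git log -1 --format=%s "$e"        # the raw hash ID still works, for now
```

Moving the current branch's name (as with `feature` in the drawing) needs a different command, such as `git reset`, because `git branch -f` refuses to move the branch you are "on".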
Since we've never sent commits `E` and `G` to any other Git repository, we can be sure that they won't be in any other Git repository. Neither are `D` and `F` yet, but we can now use `git push` to send either or both of `D` and `F` to some other Git:
git push origin develop
sends commit `D` to the Git repository over at `origin`, whatever URL that might be. Then it asks that Git software to create or update its branch name `develop` to point to commit `D`. Or, if we decide `develop` is the wrong name, we can run:
git push origin develop:feature-1
to send `D` to them, but ask them to create or update their branch name `feature-1`. There's no need to use the same name on both sides (well, except, perhaps, to retain your own sanity). (If you're going to rename this `feature-1`, it might be better to do that locally first, then use `git push origin feature-1`.)
(Note that we don't have to be "on" any particular branch to `git push` it either, and, for reasons we won't get into here, the first time we `git push` a branch name we probably want to add the `-u` option.)
If we like, we can actually send both `D` and `F` in one `git push`:
git push origin develop feature
which sends the commits we have that they don't, that they'll need (i.e., `D` and `F`, since they gave us `A-B-C` in the first place) and asks them to create or update their `develop` and `feature` names to remember `D` and `F` specifically.
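These pushes can be tried without a hosting site by letting a local bare repository play the part of `origin`. That substitution, the temporary paths, and the `commit` helper are assumptions made for this sketch; with a real remote only the URL changes.

```shell
set -eu
root=$(mktemp -d)
cd "$root"
git init -q --bare origin.git          # local stand-in for the hosting site
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
git remote add origin "$root/origin.git"
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit A
commit B
commit C
git push -q origin main                # they now have A-B-C

git switch -q -c develop
commit D
git switch -q -c feature main          # start feature at C, not at D
commit F

git push -q origin develop:feature-1   # send D, ask them to set feature-1
git push -q origin develop feature     # same names on both sides, one push
git ls-remote origin                   # their names, as seen from our side
```

The second push sends only `F`, since the first already delivered `D`; it still asks them to create both `develop` and `feature`.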
[6] If you can't find a commit in your repository, and you leave things that way long enough, Git may decide that you don't really want it back after all, and can delete it. Any other repository that has it can send it back to you later, after all, and you've indicated your lack of interest now by making it invisible. This is technically a garbage collection operation, done by `git gc`. When and whether any particular Git software does this GC is up to that particular Git software.
Multiple remotes
You can have more than one remote. All a remote is, is a short name for a URL, by which your Git:
- remembers that URL for you, and
- when you use `git fetch remote`, calls up that other Git software, gets all the commits they have that you don't, and updates your `remote/*` names.
So you can:
git remote add prod <url>
and then `git fetch prod` to get from that other Git any commits they have that you don't, and to create or update your `prod/*` remote-tracking names.
But: if the repository at that URL is related to the repository at your existing `origin` URL, and in fact got all its commits from the repository at your `origin` URL, you will already have all those commits. You got them when you got all the commits from `origin`. So the set of commits you need is empty. Your Git will still create or update all your `prod/*` names, and since this is the first time you've done this with the remote name `prod`, that means your Git will create them all.
The thing is, all those commits will literally be the same commits that were on `origin`.
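A sketch of that situation, with two local bare repositories standing in for the `origin` and `prod` URLs. Seeding `prod.git` by cloning `origin.git` is my assumption about how such a production repository might have been populated; the paths and the `commit` helper are made up.

```shell
set -eu
root=$(mktemp -d)
cd "$root"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
git remote add origin "$root/origin.git"
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit A
commit B
git push -q origin main
git fetch -q origin                    # make sure origin/main exists locally

# prod is seeded entirely from origin, as in the scenario above.
git clone -q --bare "$root/origin.git" "$root/prod.git"

before=$(git rev-list --all --count)
git remote add prod "$root/prod.git"
git fetch -q prod                      # creates prod/main, transfers no commits
after=$(git rev-list --all --count)
git branch -r
echo "commits before: $before, after: $after"
```

The fetch adds a new name, `prod/main`, to the names database, but the commit count does not change, because every commit `prod` has was already here.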
This sort of thing means that there's no point in having a separate repository for the production system. Just have one repository, and use different names to select the "most recent" development / test commit, and the "most recent" production commit.
It's certainly possible to do what you're describing. There's just no real reason to bother.
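For completeness, here is one way the original request could be carried out, as a sketch rather than a recommendation. It assumes both remotes already share commit `A`, that the two new commits are the last two on `main`, and it again uses local bare repositories and a hypothetical `commit` helper in place of real URLs.

```shell
set -eu
root=$(mktemp -d)
cd "$root"
git init -q --bare origin.git
git init -q --bare prod.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
git remote add origin "$root/origin.git"
git remote add prod "$root/prod.git"
# Hypothetical helper: make an empty commit with the given message.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit A
git push -q origin main
git push -q prod main

commit B                                   # the first of the two new commits
commit C                                   # the second
git push -q origin main                    # origin receives both B and C
git push -q prod main~1:refs/heads/main    # prod receives only B
```

The left side of the refspec, `main~1`, names the commit to send, and the right side names the branch to update over on `prod`. This works only while the commit you want on `prod` is an ancestor-or-self of what you have locally and a descendant of what `prod` already has, which is part of why it isn't sustainable long-term.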