Git track two histories at the same time for multiple origins

Question

Basically, what I want is a repo for my dev team where they can push their code and make as many commits as they want. Let's call this the dirty repo. I want another, clean repo where, maybe once a week, I pull their code and push all the changes made so far as one commit with a message of my choosing.

I do not want their history, but I do want to keep these weekly changes as a separate history in the clean repo. I looked into shallow repos, but I don't think that's the same thing. So far I've added a second origin to my project; I do git pull origin master and then git push clean master. But that pushes the history too. I want the same thing, but with the clean repo tracking a separate history.

Does this answer your question? Building on top of a repo with dirty and clean origins, you can squash-merge multiple commits from dirty into clean as a single commit. [Merge (with squash) all changes from another branch as a single commit](https://stackoverflow.com/questions/3697178/merge-with-squash-all-changes-from-another-branch-as-a-single-commit) — blami, Jan 20 '21 at 07:27

score 0 · Answer 1 · answered Jan 20 '21 at 08:35

[...] I've added a second origin to my project and I do git pull origin master and then git push clean master.

I like git pull origin master because then you always get the complete history from your team.

Instead of going straight to git push clean master, you could check out a separate branch locally, let's call it production, which tracks master on the clean remote. Then you git merge master to get all new changes, and git rebase -i clean/master to rebase all new commits on top of what you pushed to clean last time.

While interactively rebasing, you see exactly the commits you are about to squash into a single commit, and you can include their commit messages in your single squashed commit.

After rebasing, you can go ahead and push the production branch to clean's master with git push clean production:master.

score 0 · Answer 2 · answered Jan 20 '21 at 09:40

The pieces of Git that are at odds with your desires are these:

Git commits are universally unique. Each commit has a unique hash ID, different from that of every other commit.
A Git repository is, primarily, a collection of commits, which may be shared with other repositories; if two repositories have commits that have the same hash ID, those are literally the same commits.¹
A repository also has branch names: these names are not necessarily shared with any other repository. Each branch name holds one (1) hash ID. These serve to find some particular commit. Git calls these the tip commits of the branches. By definition, such a commit is the last commit in some chain of commits.
History, in a Git repository, is the commits in the repository. This comes about because one of the unalterable components in each commit is the raw hash ID of some earlier commit or commits.
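To see these pieces in a real repository, here is a minimal throwaway shell sketch (assuming Git is installed; the directory and file names are made up for the demo). It makes two commits, prints the raw second commit object so you can see its parent line holding the first commit's hash ID, and shows that the branch name resolves to exactly one hash ID, the tip.

```shell
#!/bin/sh
# Demo: a commit stores its parent's hash ID; a branch name stores one hash ID.
set -eu
# Throwaway identity so the demo works without a global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -a -m "second"
second=$(git rev-parse HEAD)

# The raw commit object names its tree and its parent commit by hash ID.
git cat-file -p "$second"

# The branch name "master" simply holds the hash ID of the tip commit.
echo "master -> $(git rev-parse master)"
```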

What this means for you is that when you obtain commits from origin, those commits are inviolable and are history; their place in history is wherever they are in the chain that ends at some commit found by some name. Let's say, for concreteness, that you've obtained a commit whose hash ID is a123456 (actual hash IDs are larger and random-looking but this will serve here).

Any repository that has commit a123456 therefore has all the history that leads up to a123456.² If you send commit a123456 to any other repository, it's your responsibility to send all the commits leading up to it, as well.³ That's why you end up sending the entire history to your remote named clean.
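A small sketch reproducing what happens with git push clean master, using throwaway local repositories (dirty and clean.git are hypothetical stand-ins for the two remotes in the question). After the push, the clean side has the same tip hash ID and every commit behind it.

```shell
#!/bin/sh
# Demo: pushing one tip commit also transfers all of its ancestors.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q --bare clean.git      # stand-in for the "clean" remote
git init -q dirty                 # stand-in for the dev team's work
cd dirty
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "dev commit $n"
done
git remote add clean ../clean.git

# Same as the question's "git push clean master".
git push -q clean master

# clean now holds the same tip commit and all three commits of history.
git -C ../clean.git rev-list --count master
```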

What this means for you is that if you want to have a different history in the repository you're calling clean, you must make a different set of commits. These commits, being different, are different history. There's a fly in this ointment, which is that because these histories are unrelated, management of this new set of commits becomes very painful. You can do it, it's just that Git won't be terribly helpful. (You'll probably want to set up some naming scheme that keeps track of which commits you have already copied, and hence which ones you'll need to copy into clean as different commits. Once they're copied, you will update these "copy-tracking names", whatever you end up calling them, perhaps refs/copied/name to go with refs/heads/name.)
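One way to make that different set of commits, sketched under assumptions: the helper names dev_commits and publish_snapshot are hypothetical, refs/copied/master is the copy-tracking name suggested above, and the clean side only ever receives snapshots. Each time, the sketch takes the tree (the file contents) of dirty master, wraps it in a brand-new commit whose parent is the previous clean commit using git commit-tree, pushes that commit, and then moves refs/copied/master to the dirty commit that was just copied. Because it copies whole snapshots rather than replaying or merging diffs, the two unrelated histories never have to be merged.

```shell
#!/bin/sh
# Sketch: publish squashed weekly snapshots of dirty master to clean.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q --bare clean.git
git init -q dirty
cd dirty
git symbolic-ref HEAD refs/heads/master
git remote add clean ../clean.git

# Hypothetical helper: simulate the team's throwaway commits.
dev_commits() {
  for n in "$@"; do
    echo "change $n" >> notes.txt
    git add notes.txt
    git commit -q -m "wip $n"
  done
}

# Hypothetical helper: copy dirty master's tree into one new clean commit.
publish_snapshot() {
  title=$1
  git fetch -q clean                       # learn clean's current tip, if any
  tree=$(git rev-parse 'master^{tree}')    # snapshot of the files only
  if git rev-parse -q --verify refs/copied/master >/dev/null; then
    range=refs/copied/master..master       # dirty commits not yet copied
  else
    range=master                           # first run: everything so far
  fi
  # Message: the chosen title plus the subjects of the squashed dirty commits.
  msg=$(printf '%s\n\n' "$title"; git log --format='* %s' "$range")
  if git rev-parse -q --verify refs/remotes/clean/master >/dev/null; then
    new=$(git commit-tree -p refs/remotes/clean/master -m "$msg" "$tree")
  else
    new=$(git commit-tree -m "$msg" "$tree")   # first clean commit is a root
  fi
  git push -q clean "$new:refs/heads/master"
  # Copy-tracking name: remember which dirty commit this snapshot matches.
  git update-ref refs/copied/master master
}

dev_commits 1 2 3
publish_snapshot "Week 1"
dev_commits 4 5
publish_snapshot "Week 2"

git -C ../clean.git log --format='%h %s' master
```

The resulting clean master should have one commit per publish_snapshot call, none of which exist in dirty. Note that refs/copied/master lives only in the local repository doing the copying (Git does not push refs outside refs/heads and refs/tags by default), so run the weekly copy from the same clone, or push that ref somewhere explicitly.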

¹This is why two commits that are different, but have the same hash ID, are verboten in a repository. I like to call such commits doppelgängers, with the evil connotation. Git will refuse to add one, modulo any bugs. See also How does the newly found SHA-1 collision affect Git?

²As a soft kind of exception, a shallow clone has some of the history leading up to here, with a cut-off "graft point" at which the repository just sort of shrugs its shoulders and says: I know the commit that comes before this has hash ID H but I have been told not to get H so I don't have it. But any repository that does have the commit with hash ID H has the one that has that place in history, so it's the same history, it's just not viewable from here.
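A quick way to watch the graft point in action, as a sketch with made-up repository names: clone with --depth 1 (over file://, since Git ignores --depth for plain local-path clones), then inspect the tip commit. It still names its parent H, but the object for H is not in the clone, and .git/shallow lists the boundary commit whose parents were not fetched.

```shell
#!/bin/sh
# Demo: a depth-1 clone records its parent's hash ID but lacks that commit.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q full
cd full
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "$n" >> f.txt
  git add f.txt
  git commit -q -m "c$n"
done
cd ..

# file:// makes --depth take effect for a local source repository.
git clone -q --depth 1 "file://$work/full" shallow
cd shallow

tip=$(git rev-parse HEAD)
# The parent hash ID is still written inside the tip commit object...
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "tip $tip, recorded parent $parent"

# ...but the parent object itself was never fetched.
if git cat-file -e "$parent" 2>/dev/null; then
  echo "parent present"
else
  echo "parent missing: this clone is shallow"
fi
cat .git/shallow   # boundary commit(s) treated as having no parents
```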

³This rule still applies to a shallow clone that is sending commits: if the receiver wants the history and the sending clone doesn't have it, the sending clone can't complete the push. However, when the sender says I'll give you commit a123456, it's possible for the receiver to say ok, I'll take that one; I already have all its parents, so no need to send any more history. So you can push from a shallow clone, as long as the repository you're pushing to already has the right history.
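Footnote 3 can be checked the same way. This sketch (throwaway repositories, hypothetical names) makes a commit in a depth-1 clone and pushes it twice: once to the repository it was cloned from, which already has all the parents, and once to an empty repository, which would need the missing history. With Git's default receive.shallowUpdate setting of false, the second push is expected to be refused.

```shell
#!/bin/sh
# Demo: pushing from a shallow clone works only if the receiver has the history.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A full repository with three commits of history.
git init -q --bare origin.git
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "$n" >> f.txt
  git add f.txt
  git commit -q -m "c$n"
done
git push -q ../origin.git master
cd ..

# Shallow clone, then one new commit on top of the boundary commit.
git clone -q --depth 1 --branch master "file://$work/origin.git" shallow
cd shallow
echo 4 >> f.txt
git commit -q -a -m "c4"

# Case 1: origin already has every parent, so only c4 needs to be sent.
git push -q origin master

# Case 2: an empty repository would need history this clone does not have.
git init -q --bare ../empty.git
if git push -q ../empty.git master 2>/dev/null; then
  empty_push=accepted
else
  empty_push=refused
fi
echo "push to empty repository: $empty_push"
```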
