Copy a snapshot of git repo to another repo w/o history

Question

Scenario: I have a private development Git repo with messy history. I have an external Git repo available to customers. I don't want customers to see the complex branches and history. Instead, I want a mostly linear set of commits per release. That way they can see the difference between releases, but not see the craziness it took to get there.

I created 2 repos: private & public. When I reach a release on private, I copy it over to a new directory. Then I copy the .git directory from public repo into this directory. git status says nearly every file has changed. I commit as a massive change from v1.1.1 to v1.1.2. This is all the customers see in the public repo.

This works but is clearly not "the git way". Is there another way to accomplish this task? I want to keep the public repo as clean and simple as possible.

This is what [forks are for](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow) — Liam, Apr 26 '21 at 10:52

score 2 · Answer 1 · answered Apr 25 '21 at 21:37

Instead of keeping two separate repositories, you can keep a "clean" branch and create a new dev branch whenever you work on a new version. Then, you can squash the commits from the dev branch into the clean branch, which will merge the messy history into a single commit, and then push the clean branch to the public repo.

git checkout clean
git merge --squash dev
git commit -m "v1.1.2"
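
For the last step (pushing only the clean branch to the public repo), here is a rough end-to-end sketch in a throwaway directory. The remote name `public`, the file `app.txt`, and all paths are made up for illustration; only the `clean` and `dev` branch names come from the commands above.

```shell
#!/bin/sh
# Sketch only: paths, file names, and the remote name "public" are assumptions.
set -eu
work=$(mktemp -d)

# Stand-in for the customer-facing repository.
git init -q --bare "$work/public.git"

# Stand-in for the private repository, starting on the "clean" branch.
git init -q "$work/private"
cd "$work/private"
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/clean
echo "release 1.1.1" > app.txt
git add app.txt
git commit -q -m "v1.1.1"

# Messy work happens on dev.
git checkout -q -b dev
echo "half-done idea" >> app.txt
git commit -q -am "wip"
echo "actually finished" >> app.txt
git commit -q -am "fix the wip"

# Squash everything from dev into one release commit on clean.
git checkout -q clean
git merge -q --squash dev
git commit -q -m "v1.1.2"

# Publish only the clean branch; dev never leaves the private repo.
git remote add public "$work/public.git"
git push -q public clean

# The public repo now holds just the two release commits.
git --git-dir="$work/public.git" log --oneline clean
```

Naming the branch explicitly in `git push` matters here: pushing with `--all` would publish `dev` and its history too.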

score 2 · Accepted Answer · answered Apr 26 '21 at 10:47

In Git, "history" and "commits" are synonyms. The only tricky part here is that a commit is its hash ID, in a sense, and the hash ID is the checksum (SHA-1 currently; SHA-256 soon) of the complete contents of the commit, including its parent hashes.
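
To see this for yourself, the following sketch (throwaway repo, made-up file names) prints a raw commit object and then re-hashes those same bytes. The recomputed ID should match the commit's ID, and the `parent` line is part of what gets hashed.

```shell
#!/bin/sh
# Sketch only: the repository and file names are assumptions for illustration.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -am "second"

# The raw commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD

# Hashing those same bytes as a commit object reproduces the commit's ID.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
actual=$(git rev-parse HEAD)
echo "recomputed: $recomputed"
echo "actual:     $actual"
```

Since the parent hash is inside the hashed content, a commit with a different parent is necessarily a different commit, even if its snapshot is identical.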

What this means is that to build your "clean" public-release history, you must make separate commits. The effect will be the same as what you are doing now, no matter how you go about doing it.

You can however simplify the process. Here is how it would work if you were starting from scratch:

In the initial repository, create two root commits. They might well have identical snapshots (e.g., just a README.md and/or LICENSE.md file, copyright notice, etc.), but they will have different initial timestamps so that they have different metadata, despite both having no parent commit.

Having two separate roots is not technically required, but it helps to make sure nobody accidentally merges the two disjoint subgraphs. It also matches what you will get if you decide to use this process on what you have today (not starting from scratch).

One branch name (master or main) points to one of these root commits, on which you'll build software as usual. Use a second, separate branch name (e.g., for-user-releases-only) for the other root commit.

Now, on one of these root commits—the one named by master or main—build your internal commit graph. Make branches as you see fit. When something is ready for release, make a specific release branch and a specific release tag whose names indicate that these are the internal release commits.

Then, once there's a tagged release ready to go, check out the other graph-so-far. For the first release that's the other initial (root) commit; for subsequent releases, that's the branch you use for building these. In other words, you'll check out (or maybe even have a git worktree add-ed work-tree that uses) the for-user-releases-only name. Then use git commit-tree, perhaps through a script, to build a commit with:
- the appropriate parent;
- the tree (i.e., snapshot) from the release; and
- an appropriate log message.

Then make a tag, and maybe even a branch, for this new commit, and update the for-user-releases-only branch to point to this commit (this will be a fast-forward operation).
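
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the tag names `internal-v1.1.2` and `v1.1.2`, the file names, and the commit messages are invented, and instead of checking out for-user-releases-only it moves the branch with `git update-ref`, which only works this simply because that branch is not the one currently checked out.

```shell
#!/bin/sh
# Sketch only: tag names, file names, and messages are assumptions.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Root commit 1: start of the internal graph, on main.
git symbolic-ref HEAD refs/heads/main
echo "project" > README.md
git add README.md
git commit -q -m "Internal root"

# Root commit 2: start of the public graph, with no parent.
git checkout -q --orphan for-user-releases-only
git commit -q -m "Public root"

# Internal development, as messy as you like, then an internal release tag.
git checkout -q main
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "messy work 1"
echo "fix" >> feature.txt
git commit -q -am "messy work 2"
git tag internal-v1.1.2

# Build the public release commit: public parent + internal snapshot + message.
parent=$(git rev-parse for-user-releases-only)
tree=$(git rev-parse 'internal-v1.1.2^{tree}')
new=$(git commit-tree "$tree" -p "$parent" -m "Release v1.1.2")

# Fast-forward the public branch (only if it still points at $parent), then tag.
git update-ref refs/heads/for-user-releases-only "$new" "$parent"
git tag v1.1.2 "$new"

git log --oneline for-user-releases-only
```

The new commit reuses the tree from the internal release, so no files are copied around; only a new commit object is created.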

You can now make a single-branch clone of the internal repository whenever you like, using the for-user-releases-only branch, and give that clone out. This should pick up just the tags that go with commits in this branch.
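
A sketch of that clone step, again in a throwaway directory. The setup part is a shortened stand-in for the release-building steps, and the tag names and the `handout` directory are made up. It clones through a `file://` URL on the assumption that you want Git's normal transport rather than the local-path shortcut, which copies the whole object store.

```shell
#!/bin/sh
# Sketch only: tag names, file names, and directory names are assumptions.
set -eu
work=$(mktemp -d)

# Minimal internal repo holding both graphs.
git init -q "$work/internal"
cd "$work/internal"
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
echo "project" > README.md
git add README.md
git commit -q -m "Internal root"
echo "messy" > feature.txt
git add feature.txt
git commit -q -m "messy work"
git tag internal-v1.1.2

# Shortcut for the demo: one parentless public commit with the same snapshot.
git checkout -q --orphan for-user-releases-only
git commit -q -m "Release v1.1.2"
git tag v1.1.2
git checkout -q main

# Clone only the public branch.
git clone -q --single-branch --branch for-user-releases-only \
    "file://$work/internal" "$work/handout"

cd "$work/handout"
git branch -a
git tag -l
git log --oneline --all
```

It is worth checking the output of `git tag -l` and `git log --all` in the clone before handing it to anyone, to confirm no internal commits or tags came along.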

(This is all untested.)

To do this with your existing scheme, just add one of the release repositories as a remote, in any clone of your internal-use-only repository. Use git fetch from that remote to add the (presumably single) branch from which all releases are made as a remote-tracking name. Create one branch name to go with that one remote-tracking name, then ditch the remote-tracking name and the remote (with git remote remove).
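
In command form, that retrofit might look like the sketch below. The remote name `releases`, the public branch name `main`, and the paths are assumptions; substitute whatever your existing public repo actually uses.

```shell
#!/bin/sh
# Sketch only: remote name, branch names, and paths are assumptions.
set -eu
work=$(mktemp -d)

# Stand-in for the existing public release repository.
git init -q "$work/public"
cd "$work/public"
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
echo "release 1.1.1" > app.txt
git add app.txt
git commit -q -m "v1.1.1"

# Stand-in for the existing internal repository.
git init -q "$work/internal"
cd "$work/internal"
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
echo "work in progress" > app.txt
git add app.txt
git commit -q -m "messy internal work"

# Bring the public history in as a second, unrelated graph.
git remote add releases "$work/public"
git fetch -q releases

# Give it a local branch name, then drop the remote and its tracking names.
git branch --no-track for-user-releases-only releases/main
git remote remove releases

git log --oneline for-user-releases-only
```

The `--no-track` option keeps the new branch from being configured against a remote that is about to be removed.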

If you're currently using multiple for-users branches, you will need to fancy this up a bit, but the principles still apply. You'll just be storing two independent commit graphs in a single repository, vs what you're doing now: storing two independent commit graphs in two repositories.

Thanks! This is very helpful! It gave me enough info to search for more info. I found this (https://devblogs.microsoft.com/oldnewthing/20190506-00/?p=102478) which does the same thing you recommended. — projectshave, Apr 27 '21 at 18:46
