This—combining a non-clone with an actual clone—is difficult in general.
Let's write up a theoretical example, using git://github.com/repo.git as the original. Let's assume ssh://example.com/repo.git will represent the repo you set up using the following command sequence:
<download tarball or zip file from github.com/repo>
<extract tarball or zip file into directory D>
$ cd D
$ git init
$ git add .
$ git commit -m initial -m "" -m "imported from github.com/repo.git"
after which you created the --bare repository that lives at ssh://example.com/repo.git from this independent repository.
It's now some time later and you have realized that you would like to be working with an actual clone of github.com/repo.git. Alas, your ssh://example.com/repo.git has no shared history (no commits in common) with git://github.com/repo.git. Running:
$ git clone ssh://example.com/repo.git combine
$ cd combine
$ git remote add public git://github.com/repo.git
$ git fetch public
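gets you all of the public commits, but trying to merge public/master with your own private master is a mess: the two histories have no merge base, so recent versions of Git refuse the merge unless you pass --allow-unrelated-histories, and with that option every file that exists on both sides with different contents becomes a conflict.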
In some very specific cases, it's actually not too hard to fix this. The trick lies in comparing the root commit now sitting in your combine repository, reachable from your master, to all the commits in your combine repository reachable from all the public/* remote-tracking names. If you are lucky, exactly one commit's tree exactly matches your own root commit's tree because the tarball-or-zip-file you got produced an identical tree.
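The tree comparison described above can be sketched as a small shell function. This is only an illustration: the function name find_tree_matches and its two arguments are made up for this example, and it assumes your branch has a single root commit and that the other repository was added as a remote (here called public) and fetched.

```shell
# Sketch: list the commits reachable from a remote's tracking names
# whose tree is identical to the tree of the local branch's root commit.
# find_tree_matches and its arguments are hypothetical names.
find_tree_matches() {
    branch=${1:-master}   # local branch that started from the tarball
    remote=${2:-public}   # remote that holds the real history

    # Root (parentless) commit reachable from the branch; if there is
    # more than one, this sketch just takes the first one listed.
    root=$(git rev-list --max-parents=0 "$branch" | head -n 1)
    # Hash ID of the tree recorded in that root commit.
    tree=$(git rev-parse "$root^{tree}") || return 1

    # Print "<commit> <tree>" for every commit reachable from
    # refs/remotes/<remote>/*, keeping commits whose tree matches.
    git log --format='%H %T' --remotes="$remote" |
        awk -v want="$tree" '$2 == want { print $1 }'
}
```

Running find_tree_matches master public inside the combine repository prints zero or more commit hash IDs. No output means there is no exact match, and more than one line means several public commits share that tree, so you would have to pick one.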
If you are not lucky, there is no such commit. In this case, you can perhaps find a commit that's "sufficiently close". But let's assume that you did find a commit, reachable from public/master, that exactly matches your own root commit:
    A--B--...--o--o   <-- master (HEAD), origin/master
        \
         ... (there may be other branches)

    C--...--R--...--o   <-- public/master
Here, the uppercase letter A stands in for the actual hash ID of your own root commit (the one you made from the downloaded tarball or zip file) and B is the commit just after that one. C stands for the (or some) root commit reachable from public/master and is mainly in the drawing just for illustration: all we know for certain is that there is at least one more such root (parentless) commit. The letter R stands in for the commit that exactly matches your commit A and this is the most interesting commit at the moment.
What we would like to do now is pretend that the parent of the second-most interesting commit, B, is commit R rather than commit A. We can do this! Git has a facility called git replace. What git replace does is to copy an object while making some change. In our case, what we want is to copy commit B to some new commit B' that looks almost exactly like B, but has one thing changed: its parent. Instead of listing the hash ID of commit A as B''s parent, we want B' to list the hash ID of commit R.
In other words, we will have:
    A---------B--...--o--o   <-- master (HEAD), origin/master

              B'
             /
    C--...--R--...--o   <-- public/master
Now all we have to do is convince Git that when it looks up commit B, it should notice that there's this replacement commit, B', and quickly avert its eyes from B to look instead at B'. That's the rest of what git replace does. So having found commits R and B, we run:
git replace --graft <hash-of-B> <hash-of-R>
and now Git pretends that the graph reads:
              B'-...--o--o   <-- master (HEAD), origin/master
             /
    C--...--R--...--o   <-- public/master
(well, Git pretends this unless we run git --no-replace-objects to see the reality).
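As a rough sketch of the graft step, the function below looks up commit B and grafts it onto a commit you supply. The name graft_onto and its arguments are invented for this example, and it assumes the branch has at least two commits and a linear history near its root, so that the second-to-last hash ID listed really is B.

```shell
# Sketch: make the second commit of a branch (B) appear to have
# new_parent (R) as its parent. graft_onto is a hypothetical name.
graft_onto() {
    branch=$1       # e.g. master
    new_parent=$2   # hash ID of commit R

    # rev-list prints the root commit (A) last, so the line just
    # before it is the commit right after the root (B).
    second=$(git rev-list --topo-order "$branch" | tail -n 2 | head -n 1)

    # Create replacement commit B' with new_parent as its only parent
    # and record it under refs/replace/<hash-of-B>.
    git replace --graft "$second" "$new_parent"
}
```

After graft_onto master "$R" (with R set to the hash ID found earlier), git log master should walk from your commits into the public history, while git --no-replace-objects log master still stops at A.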
The big, or maybe small, drawback
Aside from the rather tough job of locating commit R (finding A and B is very easy: they are the last two hash IDs listed by git rev-list --topo-order master), this git replace trick has a flaw. The replacement commit B' exists in our repository now, but it is located via a special name, refs/replace/hash, where hash is the hash ID of the original commit B. This replacement object (and its name) is not sent to new clones by default.
You can make clones that do have the replacement object and its name, and work with them, and everything works. But this means that every time someone clones your combine repository, they must run:
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
or similar (this particular rule just slaves your clone's refs/replace/ namespace to origin's, which is crude but effective).
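For illustration, the whole per-clone sequence might look like the sketch below. The function name clone_with_replacements is made up; the refspec is the one shown above. The extra git fetch is there because the refspec is added only after the initial clone has finished, so the replacement refs arrive on the next fetch.

```shell
# Sketch: clone a repository and also mirror its refs/replace/ names.
# clone_with_replacements is a hypothetical helper name.
clone_with_replacements() {
    url=$1   # e.g. the URL of the combine repository
    dir=$2   # directory to clone into

    git clone "$url" "$dir" &&
    # Add a second fetch rule next to the default one for branches.
    git -C "$dir" config --add remote.origin.fetch \
        '+refs/replace/*:refs/replace/*' &&
    # Fetch again so the replacement refs and objects are copied.
    git -C "$dir" fetch origin
}
```

Once this has run, git for-each-ref refs/replace in the new clone should list the same replacement names as the origin repository.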
Alternatively, you can declare a flag day and run git filter-branch or similar to cement the replacement in place. I have described this elsewhere, though the best I can find at the moment is my answer to How can I attach an orphan branch to master "as-is"? Essentially, you make a new repository that has B' instead of B, does not have A, and has new copies of every commit that is a descendant of B' (with the same contents except for the parent hash ID). Then you have all of your users switch from the old repo.git to the new one. This is painful, but only one time.
If you don't plan to keep using the combined repository very long, this may not matter.
Besides the above, you can also use the grafted history to produce merges—Git commands in general will follow the replacements—after which you may not need the replacement graft commit. In this case, the drawback is short-lived: it lasts only until you get your code merged.