214

Let's say I've got a setup that looks something like

phd/code/
phd/figures/
phd/thesis/

For historical reasons, these all have their own git repositories. But I'd like to combine them into a single one to simplify things a little. For example, right now I might make two sets of changes and have to do something like

cd phd/code
git commit 
cd ../figures
git commit

It'd now be nice to just run

cd phd
git commit

There seem to be a couple of ways of doing this using submodules or pulling from my sub-repositories, but that's a little more complex than I'm looking for. At the very least, I'd be happy with

cd phd
git init
git add [[everything that's already in my other repositories]]

but that doesn't seem like a one-liner. Is there anything in git that can help me out?

Will Robertson
  • Also consider this great approach: http://stackoverflow.com/questions/1425892/how-do-you-merge-two-git-repositories – Johan Sjöberg Oct 01 '13 at 18:19
  • Also consider: https://saintgimp.org/2013/01/22/merging-two-git-repositories-into-one-repository-without-losing-file-history/ – ptim Feb 27 '17 at 16:30
  • The [join-git-repos.py](https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py) script does a nice job if you have separate repositories, each with master branches that you want to combine. – Mark Feb 06 '19 at 19:01

14 Answers

159

Here's a solution I gave here:

  1. First do a complete backup of your phd directory: I don't want to be held responsible for your losing years of hard work! ;-)

     $ cp -r phd phd-backup
    
  2. Move the content of phd/code to phd/code/code, and fix the history so that it looks like it has always been there (this uses git's filter-branch command):

     $ cd phd/code
     $ git filter-branch --index-filter \
         'git ls-files -s | sed "s#\t#&code/#" |
          GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
          git update-index --index-info &&
          mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
    
  3. Same for the content of phd/figures and phd/thesis (just replace code with figures and thesis).

Now your directory structure should look like this:

    phd
      |_code
      |    |_.git
      |    |_code
      |         |_(your code...)
      |_figures
      |    |_.git
      |    |_figures
      |         |_(your figures...)
      |_thesis
           |_.git
           |_thesis
                |_(your thesis...)
  4. Then create a git repository in the root directory, pull everything into it and remove the old repositories:

     $ cd phd
     $ git init
    
     $ git pull code
     $ rm -rf code/code
     $ rm -rf code/.git
    
     $ git pull figures --allow-unrelated-histories
     $ rm -rf figures/figures
     $ rm -rf figures/.git
    
     $ git pull thesis --allow-unrelated-histories
     $ rm -rf thesis/thesis
     $ rm -rf thesis/.git
    

Finally, you should now have what you wanted:

    phd
      |_.git
      |_code
      |    |_(your code...)
      |_figures
      |    |_(your figures...)
      |_thesis
           |_(your thesis...)

One nice side to this procedure is that it will leave non-versioned files and directories in place.
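
As a rough sketch of steps 2 to 4 run end to end, the script below builds three throwaway repositories in a temporary directory and combines them. Several things here are assumptions that are not part of the procedure above: the `make_repo` helper, the sample file names and the `demo@example.com` identity are made up; the tab is produced with `printf` so that `sed` does not need GNU's `\t` escape; `--no-rebase` and `--no-edit` are passed to `git pull` so that recent Git versions neither ask how to reconcile the histories nor open an editor; and `FILTER_BRANCH_SQUELCH_WARNING` only skips the filter-branch deprecation notice.

```shell
#!/bin/sh
set -e

# Made-up identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip the filter-branch deprecation pause

WORK=$(mktemp -d)
mkdir "$WORK/phd"

# make_repo NAME FILE: hypothetical helper, creates phd/NAME with one commit.
make_repo() {
    git init -q "$WORK/phd/$1"
    ( cd "$WORK/phd/$1" &&
      echo "sample" > "$2" &&
      git add "$2" &&
      git commit -q -m "add $2" )
}
make_repo code main.py
make_repo figures plot.svg
make_repo thesis thesis.tex

# A literal tab from printf, so sed does not need GNU's \t escape.
TAB=$(printf '\t')
export TAB
FILTER='git ls-files -s | sed "s#${TAB}#&${SUBDIR}/#" |
  GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
  mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"'

# Steps 2 and 3: rewrite each history so its files live under NAME/.
for SUBDIR in code figures thesis; do
    export SUBDIR
    ( cd "$WORK/phd/$SUBDIR" &&
      git filter-branch --index-filter "$FILTER" HEAD >/dev/null 2>&1 )
done

# Step 4: pull the rewritten histories into one new repository.
cd "$WORK/phd"
git init -q
for SUBDIR in code figures thesis; do
    git pull -q --no-rebase --no-edit --allow-unrelated-histories "$SUBDIR"
    rm -rf "$SUBDIR/$SUBDIR" "$SUBDIR/.git"
done

git log --oneline --graph
git ls-files
```

Trying the procedure on disposable repositories like this first is a cheap way to check that your `sed` and Git versions behave as expected before touching the real `phd` directory.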


Just one word of warning though: if your code directory already has a code subdirectory or file, things might go very wrong (same for figures and thesis of course). If that's the case, just rename that directory or file before going through this whole procedure:

$ cd phd/code
$ git mv code code-repository-migration
$ git commit -m "preparing the code directory for migration"

And when the procedure is finished, add this final step:

$ cd phd
$ git mv code/code-repository-migration code/code
$ git commit -m "final step for code directory migration"

Of course, if the code subdirectory or file is not versioned, just use mv instead of git mv, and forget about the git commits.

starball
MiniQuark
  • 13
    Thanks for this snippet -- it did exactly what I needed (once I accounted for Mac OS X sed not processing "\t"; I had to use ^V^I instead). – Craig Trader May 13 '10 at 22:48
  • Yup, this is exactly the kind of approach I was hoping someone could explain! – Filip Dupanović Apr 19 '11 at 09:54
  • 6
    I couldn't get this to work at first and ultimately found the solution to the problem on another old message board. On the last line, I had to put quotes around the file names like so: `mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD` and then it worked great! – Jorin Aug 18 '11 at 14:24
  • Filenames with non-ASCII characters create problems for git-ls-files. The folder containing such a file will get a double quote inserted before the folder name. Forcing git-ls-files to use a UTF-8 version of sed might work. – Sharken Oct 12 '11 at 13:30
  • 3
    The funky filter-branch command is from git's filter-branch man pages. You should say that as: a) it should be attributed correctly b) I won't run such a command just because someone, even with high reputation, posted it on StackOverflow. Knowing it's from man pages I will. – tymtam Nov 14 '11 at 02:51
  • If your "old repositories" are nested further down you will need to recurse upward. Suppose your code was initially set up as `phd\code\proj1\.git` instead of `phd\code\.git`. In this case, you would do step 2 inside `phd\code\proj1`. Then you would go to `phd\code` to do step 4, as in: `cd phd\code ; git init ; STEP 4`. After that, you would have `phd\code\.git` and you would repeat STEP 2 and STEP 4 as directed in the answer. – CatShoes Jul 19 '13 at 13:11
  • You rock! This has revealed even more things I didn't know about `git`. I knew about `git-filter-branch`, but this is a very interesting use. The `git pull path/to/dir` is also very interesting. Most examples show you how to do that by adding an additional remote, which is actually unnecessary. Thanks! – Haralan Dobrev May 12 '14 at 22:37
  • 2
    @CraigTrader Consider installing `gnu-sed` (e.g. with `brew`). It smooths out the incompatibilities :) – Haralan Dobrev May 12 '14 at 22:38
  • 5
    WATCH OUT! MacOS X does not use the GNU extension of sed, so it does not know the sequence \t. The result is a messed up history! My solution was to paste the code into a script file and write a real tab character in it. From the Terminal, a tab can be entered by pressing ctrl+v and then the Tab key. I haven't tried Craig's solution – Gil Vegliach May 19 '14 at 09:32
  • 5
    WATCH OUT (2)! Also notice that if some files or directories contain hyphens ('-') the sed command will fail. In that case you can substitute it with something like 's~\t~&code/~'. Here, applying the same logic, watch out for '~' in names – Gil Vegliach May 19 '14 at 10:10
  • Unable to proceed with Step 2, I get the following message "fatal: can not move directory into itself,..." – SDS Aug 07 '14 at 16:08
  • With the information given, this seems to only work on a very specific case described by OP. It'd be nice to see a little bit clearer information on how to apply this to other people's directory structures. – Translunar Sep 24 '14 at 12:10
  • I needed to upgrade git to > 1.7.7.1 because of http://stackoverflow.com/questions/6463963/git-filter-branch-says-working-tree-is-dirty-when-it-is-not – Clintm Sep 14 '15 at 03:42
  • Another option for Mac OS X `sed` not processing `"\t"` is to use `$(printf '\t')` to insert a tab, as explained here: http://stackoverflow.com/a/28059344/341929 – Marius Butuc Nov 25 '16 at 23:37
  • Since git 2.9, the second `git pull` step will result in `fatal: refusing to merge unrelated histories`. You need to use `--allow-unrelated-histories`, as per [here](https://stackoverflow.com/questions/37937984/git-refusing-to-merge-unrelated-histories). I'll try to edit it in. – Sparhawk Oct 30 '17 at 11:42
  • @GilVegliach I also faced the issue with `sed` on Mac OS. I performed `filter-branch` inside an Ubuntu container: `docker run -it -v "$(pwd)":/repo filiosoft/git bash` – Kirill Nov 03 '17 at 07:09
  • 2
    Stuck on step 2, how to move files into subdir? mv or git mv? Should I commit changes after, because I can't pull it because it says branch has uncommited changes. – m1ld Sep 28 '19 at 07:37
  • @m1ld try performing the `git filter-branch` over a bare repository which you can create with `git clone --bare code/ bare_code/`. – Jaime Hablutzel Sep 07 '20 at 00:06
  • 1
    For optimal results, use [git-filter-repo](https://github.com/newren/git-filter-repo): Replace the long `git filter-branch --index-filter ...` by `git filter-repo --force --to-subdirectory-filter yoursubdirectory` ([reference](https://github.com/newren/git-filter-repo/blob/main/Documentation/converting-from-filter-branch.md#moving-the-whole-tree-into-a-subdirectory)). – Victor Aug 17 '22 at 15:49
  • Those who are getting a warning saying `You have divergent branches and need to specify how to reconcile them.` after running `git pull ...` need to also set the `--no-rebase` flag for the `git pull` command. For more info, see [this answer](https://stackoverflow.com/a/62653694/7734384). – Arad Alvand Oct 18 '22 at 10:32
79

git-stitch-repo will process the output of git-fast-export --all --date-order on the git repositories given on the command-line, and create a stream suitable for git-fast-import that will create a new repository containing all the commits in a new commit tree that respects the history of all the source repositories.

Aristotle Pagaltzis
  • Ah, thanks! I was hoping there'd be a command to do this, but, well, it's sometimes hard to know the full extent of git's features :) – Will Robertson Nov 10 '08 at 05:23
  • 34
    Uh, it’s a third-party tool, not part of git… :-) – Aristotle Pagaltzis Nov 10 '08 at 05:33
  • 1
    Indeed, now you tell me :) Oh well, I suppose I had to learn how to install CPAN packages one day… – Will Robertson Nov 10 '08 at 05:36
  • 1
    Thanks for pointing that command out. Just been using it to help in moving a few repos from SVN to Git. – signine Aug 20 '10 at 13:23
  • 1
    WARNING may not work if you have branches/merges! From the [git-stich-repo](http://p3rl.org/git-stitch-repo) page: "git-stich-repo works perfectly with repositories that have a linear history (no merges). .. The improvements to the stitching algorithm added in version 0.06 should make is suitable to work with repositories having branches and merges." – Bryan P Sep 09 '13 at 15:52
  • How should I use it in Windows? – Jaime Hablutzel Mar 20 '14 at 06:31
  • 6
    This is an external script, the answer is too short and not really helpful, this script has problems with merge commits, not many people would handle Perl or CPAN and this is not well explained in the answer. So... -1, sorry. – Haralan Dobrev May 12 '14 at 22:41
  • 2
    UPDATE 2018: `git-stitch-repo` handles branches and merges fine now. To install it on Mac do `sudo cpan install Git::FastExport` and if the executable is not in your path try looking for it at `/usr/local/bin/git-stitch-repo` – Noah Sussman Aug 03 '18 at 21:20
  • 1
    In 2021, this still works, even though git-stitch-repo seems to have had no updates since 2019. However, I noticed that some of the local repos which I merged into a base repo (i.e. a repo containing the code and history of several others) got pushed to origin as submodules instead of folders as part of the new base repo. – hamx0r Feb 04 '21 at 17:32
20

You could try the subtree merge strategy. It will let you merge repo B into repo A. The advantage over git-filter-branch is it doesn't require you to rewrite your history of repo A (breaking SHA1 sums).
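
A minimal sketch of what such a subtree merge can look like, following the `merge -s ours` plus `read-tree --prefix` recipe from Git's subtree-merge howto. The repositories `A` and `B`, the file names and the `demo@example.com` identity are made up for illustration, and `--allow-unrelated-histories` assumes Git 2.9 or later.

```shell
#!/bin/sh
set -e

# Made-up identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)

# Repo A: the repository whose existing history must stay untouched.
git init -q "$WORK/A"
cd "$WORK/A"
echo "a" > a.txt
git add a.txt
git commit -q -m "initial commit in A"
A_HEAD_BEFORE=$(git rev-parse HEAD)

# Repo B: the repository to bring in as a subdirectory.
git init -q "$WORK/B"
( cd "$WORK/B" &&
  echo "b" > b.txt &&
  git add b.txt &&
  git commit -q -m "initial commit in B" )

# Fetch B's HEAD, record a merge without taking any of B's files yet,
# then read B's tree into the subdirectory B/ and conclude the merge.
git fetch -q "$WORK/B" HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=B/ -u FETCH_HEAD
git commit -q -m "Merge repo B into subdirectory B/"

git log --oneline --graph
```

The commit that was A's tip before the merge is still the first parent of the new merge commit, which is the sense in which A's history is not rewritten.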

Leif Gruenwoldt
20

Perhaps more simply (similar to the previous answer, but using simpler commands): in each of the separate old repositories, make a commit that moves the content into a suitably named subdirectory, e.g.:

$ cd phd/code
$ mkdir code
# This won't work literally, because * would also match the new code/ subdir, but you understand what I mean:
$ git mv * code/
$ git commit -m "preparing the code directory for migration"

and then merge the three separate repos into a new one, by doing something like:

$ cd ../..
$ mkdir phd.all
$ cd phd.all
$ git init
$ git pull ../phd/code
...

That way you keep your histories, but carry on with a single repo.
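
Since `git mv * code/` is not meant literally, here is one hedged way to spell out the move step. It assumes that no tracked top-level entry is itself called `code` and that file names are plain enough for `git ls-tree` to print them unquoted; the sample repository, its files and the `demo@example.com` identity are made up.

```shell
#!/bin/sh
set -e

# Made-up identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway repository standing in for phd/code.
WORK=$(mktemp -d)
git init -q "$WORK/code"
cd "$WORK/code"
mkdir src
echo "print('hi')" > src/main.py
echo "notes" > README
git add .
git commit -q -m "initial commit"

# Move every tracked top-level entry into the new code/ subdirectory.
# The list comes from HEAD, so the freshly created code/ is not in it.
mkdir code
git ls-tree --name-only HEAD | while IFS= read -r entry; do
    git mv -- "$entry" code/
done
git commit -q -m "preparing the code directory for migration"

git ls-files
```

Untracked files stay where they are, and older commits keep their original paths; only the new commit records the move.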

imz -- Ivan Zakharyaschev
  • This is OK, but if you are merging one repo into another (i.e. phd was an existing, non-empty repo) and phd has folders with the same names as the subfolders in the code directory, you will hit problems, because 'git pull ../phd/code' pulls all the commits with the original paths and only applies the mv commit at the end. – tymtam Nov 14 '11 at 02:56
  • 1
    @Tymek: but this will still work in that situation, without problems. The thing that won't be nice is that the paths in the history won't be "correct" (correspond to the new paths). – imz -- Ivan Zakharyaschev Nov 14 '11 at 14:01
9

The git-filter-branch solution works well, but note that if your git repo comes from an SVN import it may fail with a message like:

Rewrite 422a38a0e9d2c61098b98e6c56213ac83b7bacc2 (1/42)mv: cannot stat `/home/.../wikis/nodows/.git-rewrite/t/../index.new': No such file or directory

In this case you need to exclude the initial revision from the filter-branch - i.e. change the HEAD at the end to [SHA of 2nd revision]..HEAD - see:

http://www.git.code-experiments.com/blog/2010/03/merging-git-repositories.html

Gareth
7

git-stitch-repo from Aristotle Pagaltzis' answer only works for repositories with simple, linear history.

MiniQuark's answer works for all repositories, but it does not handle tags and branches.

I created a program that works the same way as MiniQuark describes, but it uses one merge commit (with N parents) and also recreates all tags and branches to point to these merge commits.

See the git-merge-repos repository for examples of how to use it.

Community
robinst
5

@MiniQuark's solution helped me a lot, but unfortunately it doesn't take into account tags in the source repositories (at least in my case). Below is my improvement to @MiniQuark's answer.

  1. First create a directory which will contain the composed repo and the merged repos, then create a directory for each merged one.

    $ mkdir new_phd
    $ mkdir new_phd/code
    $ mkdir new_phd/figures
    $ mkdir new_phd/thesis

  2. Pull each repository and fetch all its tags. (Instructions are shown only for the code subdirectory.)

    $ cd new_phd/code
    $ git init
    $ git pull ../../original_phd/code master
    $ git fetch ../../original_phd/code refs/tags/*:refs/tags/*

  3. (This is an improvement on point 2 in MiniQuark's answer.) Move the content of new_phd/code to new_phd/code/code and add a code_ prefix to each tag:

    $ git filter-branch --index-filter 'git ls-files -s | sed "s-\t\"*-&code/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' --tag-name-filter 'sed "s-.*-code_&-"' HEAD

  4. After doing so there will be twice as many tags as there were before running filter-branch. The old tags remain in the repo and new tags with the code_ prefix are added.

    $ git tag
    mytag1
    code_mytag1

    Remove old tags manually:

    $ ls .git/refs/tags/* | grep -v "/code_" | xargs rm

    Repeat points 2, 3 and 4 for the other subdirectories.

  5. Now we have the same directory structure as in point 3 of @MiniQuark's answer.

  6. Do as in point 4 of MiniQuark's answer, but after doing a pull and before removing the .git dir, fetch the tags:

    $ git fetch catalog refs/tags/*:refs/tags/*

    Continue..
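
One caveat on the manual tag removal in point 4: Git can store tags in `.git/packed-refs` instead of as loose files under `.git/refs/tags`, in which case deleting files there misses them. A hedged alternative is to let `git tag -d` do the deletion, as sketched below on a throwaway repository. The tag names mirror the example above, `code_mytag1` simply stands in for a tag created by `--tag-name-filter`, and the `demo@example.com` identity is made up.

```shell
#!/bin/sh
set -e

# Made-up identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/code"
cd "$WORK/code"
echo "x" > file.txt
git add file.txt
git commit -q -m "initial commit"

git tag mytag1
git tag code_mytag1   # stands in for the tag created by --tag-name-filter
git pack-refs --all   # move the tags into .git/packed-refs

# Delete every tag that lacks the code_ prefix, wherever it is stored.
git tag -l | grep -v '^code_' | xargs git tag -d >/dev/null

REMAINING=$(git tag -l)
echo "$REMAINING"
```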

This is just another solution. Hope it helps someone, it helped me :)

MichK
3

Actually, git-stitch-repo now supports branches and tags, including annotated tags (I found a bug, reported it, and it got fixed). Where I found it most useful is with tags. Tags are attached to commits, and some of the solutions (like Eric Lee's approach) fail to deal with them: if you create a branch off an imported tag, it undoes any git merges/moves and leaves you with a consolidated repository that is nearly identical to the repository the tag came from. There are also issues if the same tag is used across several of the repositories that you merged/consolidated. For example, say repos A and B both have a tag rel_1.0, and you merge repo A and repo B into repo AB. Since the rel_1.0 tags are on two different commits (one for A and one for B), which tag will be visible in AB? Either the tag from the imported repo A or the one from the imported repo B, but not both.

git-stitch-repo helps to address that problem by creating rel_1.0-A and rel_1.0-B tags. You may not be able to checkout rel_1.0 tag and expect both, but at least you can see both, and theoretically, you can merge them into a common local branch then create a rel_1.0 tag on that merged branch (assuming you just merge and not change source code). It's better to work with branches, as you can merge like branches from each repo into local branches. (dev-a and dev-b can be merged into a local dev branch which can then be pushed to origin).

3

I have created a tool that does this task. The method used is similar (internally it does something like git filter-branch) but it is friendlier to use. It is licensed under GPL 2.0.

http://github.com/geppo12/GitCombineRepo

2

The sequence you suggested

git init
git add *
git commit -a -m "import everything"

will work, but you will lose your commit history.
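
One detail worth checking if you go this way: while the old `.git` directories are still inside `code`, `figures` and `thesis`, `git add` records those directories as embedded repositories (gitlink entries) instead of adding their files. A hedged sketch on throwaway repositories, which parks the old `.git` directories outside the tree first; the `old-git-dirs` location, the sample files and the `demo@example.com` identity are made up for illustration.

```shell
#!/bin/sh
set -e

# Made-up identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)
mkdir "$WORK/phd" "$WORK/old-git-dirs"

# Three throwaway repositories standing in for code, figures and thesis.
for name in code figures thesis; do
    git init -q "$WORK/phd/$name"
    ( cd "$WORK/phd/$name" &&
      echo "sample" > "$name.txt" &&
      git add "$name.txt" &&
      git commit -q -m "add $name.txt" )
done

cd "$WORK/phd"

# Park the old .git directories outside the tree; this keeps the old
# histories around as a backup instead of deleting them.
for name in code figures thesis; do
    mv "$name/.git" "$WORK/old-git-dirs/$name.git"
done

git init -q
git add code figures thesis
git commit -q -m "import everything"

git ls-files -s
```

Naming the directories explicitly in `git add`, rather than using `*`, also makes it easier to leave out anything you do not want versioned.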

Patrick_O
  • Losing the history isn't so bad, but since the repository is for my own work (i.e., it's private) there's a lot of stuff in there that I don't want versioned or that isn't versioned yet. – Will Robertson Nov 10 '08 at 05:20
1

To merge a secondProject within a mainProject:

A) In the secondProject

git fast-export --all --date-order > /tmp/secondProjectExport

B) In the mainProject:

git checkout -b secondProject
git fast-import --force < /tmp/secondProjectExport

In this branch, do all the heavy transformations you need and commit them.

C) Then back to the master and a classical merge between the two branches:

git checkout master
git merge secondProject
  • This would merge all of the files and folders at the root of both git projects into one project. I doubt _anyone_ would want this to happen. – Clintm Sep 11 '15 at 22:20
0

I'll throw my solution in here too. It's basically a fairly simple bash script wrapper around git filter-branch. Like other solutions it only migrates master branches and doesn't migrate tags. But the full master commit histories are migrated and it is a short bash script so it should be relatively easy for users to review or tweak.

https://github.com/Oakleon/git-join-repos

chrishiestand
0

This bash script works around the sed tab character issue (on MacOS for example) and the issue of missing files.

export SUBREPO="subrepo"; # <= your subrepository name here
export TABULATOR=`printf '\t'`;
FILTER='git ls-files -s | sed "s#${TABULATOR}#&${SUBREPO}/#" |
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
  git update-index --index-info &&
  if [ -f "$GIT_INDEX_FILE.new" ]; then mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"; else echo "git filter skipped missing file: $GIT_INDEX_FILE.new"; fi'

git filter-branch --index-filter "$FILTER" HEAD

This is a combination of miniquark, marius-butuc and ryan's posts. Cheers to them!

bue
0

I combined 3 git repositories into one manually with the help of Git integration in IntelliJ IDEA Community Edition.

  1. Create a new repo and add a new commit to the master branch with an empty README.md file.
  2. Add three remotes to the new repo, named after the 3 repositories and pointing at their respective remote URLs. Run Git Fetch.
  3. Create a new local branch named temp based on the master branch, so we can start over without polluting the master branch. Check out the temp branch.
  4. Select to show only the commits of one remote branch (one repository).
  5. Select all the commits and right-click to Cherry-Pick them.
  6. Create the directory structure for this repository, then move the files into it and commit.
  7. Repeat steps 4 to 6 for the other 2 remote branches (repositories).
  8. When everything is OK, merge all the changes in the temp branch into the master branch.

Then add the origin remote URL for master branch and push to it.

Abelardo