
I have a web application that explores other web applications in a particular way. It contains some web demos in a demos folder, and one of the demos should now have its own repository. I would like to create a separate repository for this demo application and make it a submodule of the main repository without losing its commit history.

Is it possible to take the files in one folder of a repository, keep their commit history, create a new repository from them, and use that repository as a submodule instead?

danronmoon
GabLeRoux
  • I have been searching for how to move directory 1 from Git repository A to Git repository B. +1 for the link to the article. – eQ19 Apr 09 '15 at 15:09
  • Duplicate? http://stackoverflow.com/questions/12514197/convert-a-git-folder-to-a-submodule-retrospectively – naught101 Nov 26 '15 at 02:25
  • Yes this is indeed very similar, solutions differ a little, thanks for sharing this – GabLeRoux Nov 26 '15 at 13:18

4 Answers


Detailed Solution

See the note at the end of this answer (last paragraph) for a quick alternative to git submodules using npm ;)

This answer shows how to extract a folder from a repository, make a new git repository from it, and then include it as a submodule instead of a folder.

Inspired by Greg Bayer's article "Moving Files from one Git Repository to Another, Preserving History".

At the beginning, we have something like this:

<git repository A>
    someFolders
    someFiles
    someLib <-- we want this to be a new repo and a git submodule!
        some files

In the steps below, I will refer to someLib as <directory 1>.

At the end, we will have something like this:

<git repository A>
    someFolders
    someFiles
    @submodule --> <git repository B>

<git repository B>
    someFolders
    someFiles

Create a new git repository from a folder in another repository

Step 1

Get a fresh copy of the repository to split.

git clone <git repository A url>
cd <git repository A directory>

Step 2

The current folder will be the new repository, so remove the current remote.

git remote rm origin

Step 3

Extract the history of the desired folder. This rewrites the existing commits; there is nothing extra to commit afterwards.

git filter-branch --subdirectory-filter <directory 1> -- --all

You should now have a git repository with the files from directory 1 in your repo's root with all related commit history.
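To try Step 3 safely before touching a real repository, here is a minimal sketch that builds a throwaway repository in a temporary directory and runs the same --subdirectory-filter on it. The file names (other.txt, lib.txt) and the demo identity are placeholders I made up for illustration; only the filter-branch command itself comes from the step above.

```shell
#!/bin/sh
set -e

# Newer Git versions print a deprecation notice (and pause) before
# filter-branch runs; this variable silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository standing in for <git repository A>.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Commit 1 touches only a file outside the folder.
echo "root file" > other.txt
git add other.txt
git commit -q -m "Add root file"

# Commits 2 and 3 touch the folder we want to extract.
mkdir someLib
echo "v1" > someLib/lib.txt
git add someLib
git commit -q -m "Add someLib"

echo "v2" > someLib/lib.txt
git commit -q -am "Update someLib"

# Same command as Step 3, with someLib as <directory 1>.
git filter-branch --subdirectory-filter someLib -- --all

# Only the folder's content and the commits that touched it remain.
git ls-tree --name-only HEAD
git --no-pager log --oneline

# filter-branch keeps a backup of the old refs under refs/original/.
git for-each-ref --format='%(refname)' refs/original
```

The last command lists the backup refs mentioned in the comments below: as long as they exist, the old history is still reachable locally, so the repository will not shrink until they are deleted.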

Step 4

Create your online repository and push your new repository!

git remote add origin <git repository B url>
git push

You may need to set the upstream branch for your first push

git push --set-upstream origin master

Clean <git repository A> (optional, see comments)

We want to delete traces (files and commit history) of <git repository B> from <git repository A> so history for this folder is only there once.

This is based on GitHub's article "Removing sensitive data".

Go to a new folder and run:

git clone <git repository A url>
cd <git repository A directory>
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch <directory 1> -r' --prune-empty --tag-name-filter cat -- --all

Replace <directory 1> with the folder you want to remove. The -r flag removes everything inside the specified directory recursively. Now push to origin/master with --force:

git push origin master --force
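Because this step rewrites history and is followed by a forced push, it may be worth rehearsing it on a disposable repository first. The sketch below does that under the same assumptions as any toy example: the repository, file names, and identity are invented for the demo, and someLib plays the role of <directory 1>. Keep in mind the caveat from the comments that, after this rewrite, older commits no longer contain the folder.

```shell
#!/bin/sh
set -e

# Silence the deprecation notice that newer Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository standing in for <git repository A>.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

echo "root file" > other.txt
git add other.txt
git commit -q -m "Add root file"

# This commit only adds the folder, so it becomes empty once the
# folder is filtered out and --prune-empty drops it.
mkdir someLib
echo "v1" > someLib/lib.txt
git add someLib
git commit -q -m "Add someLib"

echo "more" >> other.txt
git commit -q -am "Update root file"

# Same filter as above, with someLib as <directory 1>.
git filter-branch --force --index-filter \
  'git rm -r --cached --ignore-unmatch someLib' \
  --prune-empty --tag-name-filter cat -- --all

# The folder is gone from every remaining commit.
git --no-pager log --oneline
git ls-tree -r --name-only HEAD
```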

Boss Stage (See Note below)

Create a submodule from <git repository B> into <git repository A>

git submodule add <git repository B url>
git submodule update
git commit

Verify if everything worked as expected and push

git push origin master
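As a sketch of the gentler route suggested in the comments (plain git rm of the folder, then adding the submodule at the same path), the script below uses two throwaway local repositories. Everything named here (repoA, repoB, demos/someLib, lib.txt) is hypothetical. Passing a path after the URL to git submodule add records that location in .gitmodules, so the submodule is always checked out in that sub-directory without anyone having to remember it.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Stand-in for <git repository B>: the already extracted library.
git init -q repoB
cd repoB
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "Library history lives here"
cd ..

# Stand-in for <git repository A>, which still embeds the folder.
git init -q repoA
cd repoA
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p demos/someLib
echo "v1" > demos/someLib/lib.txt
git add .
git commit -q -m "Main repo with embedded demos/someLib"

# Remove the folder from the current tree only; older commits keep it.
git rm -r -q demos/someLib
git commit -q -m "Remove embedded demos/someLib"

# Add repository B at the same path. The -c option is needed only
# because this demo uses a local directory as the submodule URL.
git -c protocol.file.allow=always submodule add "$work/repoB" demos/someLib
git commit -q -m "Add demos/someLib as a submodule"

# The path is stored in .gitmodules and travels with the repository.
cat .gitmodules
git submodule status
```

With a real remote, the -c protocol.file.allow=always part can be dropped and "$work/repoB" replaced by <git repository B url>.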

Note

After doing all of this, I realized that in my case it was more appropriate to use npm to manage my own dependencies instead. npm lets us specify git URLs and versions; see "git URLs as dependencies" in the package.json documentation.

If you do it this way, the repository you want to use as a requirement must be an npm module so it must contain a package.json file or you'll get this error: Error: ENOENT, open 'tmp.tgz-unpack/package.json'.

tldr (alternative solution)

You may find it easier to use npm and manage dependencies with git urls:

  • Move folder to a new repository
  • run npm init inside both repositories
  • run npm install --save git://github.com/user/project.git#commit-ish where you want your dependencies installed
GabLeRoux
  • Step "Clean <git repository A>" should be avoided. If you do this, you cannot fully restore or check out older versions/commits from your history. You should just git rm the folder and add the submodule. That way you still get a fully working copy when checking out older commits. – Cybot Feb 03 '14 at 09:44
  • Shouldn't you do `cd someLib` before Step 2? You say "The current folder will be the new repository" but actually it will not; the new repository (submodule) is *inside* that folder. – Jago Oct 13 '15 at 18:34
  • Not exactly, as `someLib` is actually `<directory 1>`; the next commands pass `<directory 1>` where needed – GabLeRoux Oct 13 '15 at 19:51
  • confirming: yes, it works for more than one submodule. Thanks a lot for the detailed answer. Also, didn't have to use npm. – Breno Inojosa Oct 20 '15 at 14:32
  • Glad to see it helped and yes indeed, extracting everything into more than one submodules shouldn't be much different. npm was more a personal suggestion as it turned out to be easier to use for my team at the time, I did extract history but used npm instead of submodules. – GabLeRoux Oct 20 '15 at 18:14
  • I would add [information](http://stackoverflow.com/a/7654880/1218980) about the `refs/original/...` which is created at step 3. – Emile Bergeron Aug 04 '16 at 13:48
  • Could be a git-lfs issue, where are you stuck? Any error? – GabLeRoux Sep 10 '16 at 23:53
  • GitHub made an article on how to achieve the extraction of a folder into a new repository: https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/ – jrobichaud Feb 13 '18 at 16:15
  • is there an easy or common way how to handle the outsourced folder if it's in a sub-path and not directly in root? Question is related to the new repo but primary to the old one where the new one has to be loaded now. – David Oct 22 '20 at 13:27
  • @David I'm not sure what you mean by _outsourced folder_, but the solutions shared above should apply no matter where the folders are. Just input the relative path to each of them each time. The `<directory 1>` can be a directory in a directory. – GabLeRoux Oct 22 '20 at 16:52
  • By the way, this question relates to a similar problem: converting a folder into a dependency. Maybe you should stick with the tools that comes with your programming language to solve this. I've added an alternative solution with `npm`, but I'd suggest having a look at how this can be done with your language's package manager. – GabLeRoux Oct 22 '20 at 16:54
  • Thanks @GabLeRoux I used your description already twice for a PHP-project and it was working quite well. The only problem is how to configure that the subrepository is loaded then in the correct sub-directory (hope with these words it's clearer than in my question). I've to adjust one and create another composer.json-file, so probably I've to configure it there. – David Oct 22 '20 at 19:15
  • Just for git I thought there is perhaps a way 'to tell' git that the sub-repository never should be loaded in the root of the parent too. I know for cli there exists the option to assign a directory as parameter but that has to be entered each time and be known by every user. Is there perhaps the option to configure it for git permanent? – David Oct 22 '20 at 19:15
13

The solution by @GabLeRoux squashes the branches, and the related commits.

A simple way to clone and keep all those extra branches and commits:

1 - Make sure you have this git alias

git config --global alias.clone-branches '! git branch -a | sed -n "/\/HEAD /d; /\/master$/d; /remotes/p;" | xargs -L1 git checkout -t'

2 - Clone the remote, pull all branches, change the remote, filter your directory, push

git clone git@github.com:user/existing-repo.git new-repo
cd new-repo
git clone-branches
git remote rm origin
git remote add origin git@github.com:user/new-repo.git
git remote -v
git filter-branch --subdirectory-filter my_directory/ -- --all
git push --all
git push --tags
oodavid
  • it works fine, except LFS (see answer from ls below) and also Tags: in my case, it recreated whole parent directory in new repository, as tags were created for the whole parent directory. I don't need that – YaP Oct 11 '20 at 23:47
7

GabLeRoux's solution works well unless you use git lfs and have large files under the directory you want to detach. In that case, after step 3 all the large files will remain pointer files instead of real files. My guess is that this happens because the .gitattributes file is removed during the filter-branch process.

Realizing this, I found that the following works for me. Before running filter-branch:

cp .gitattributes .git/info/attributes

This copies .gitattributes, which git lfs uses to track large files, into the .git/ directory so that it is not deleted.

When filter-branch is done, don't forget to put .gitattributes back if you still want to use git lfs in the new repository:

mv .git/info/attributes .gitattributes
git add .gitattributes
git commit -m 'added back .gitattributes'
David
ls.
1

filter-branch has been superseded by filter-repo.

The procedure for splitting a subfolder out using filter-repo is documented here:

https://docs.github.com/en/get-started/using-git/splitting-a-subfolder-out-into-a-new-repository

In step 3 of GabLeRoux's answer, use the following instead (FOLDER-NAME is the folder to extract):

git filter-repo --path FOLDER-NAME/
glennr