I've been following this excellent answer to extract a subdirectory of my git repository into its own repository, while retaining the complete history.

My repository looks like:

src/
    http/
    math/
tests/
    http/
    math/

I want to create a new branch that only contains the src/math and tests/math directories.

If I run the following command:

git subtree split -P src/math -b math

It creates a branch that contains the contents of the src/math directory, but discards the src/math/ prefix.

If I try the same command with two directories:

git subtree split -P src/math -P tests/math -b math

It only extracts the contents of tests/math, ignoring src/math, and also discarding the tests/math prefix.

To summarize, I would like my final repository to look like:

src/
    math/
tests/
    math/

That is, keeping the original directory structure but discarding everything that's not explicitly mentioned on the command line.

How can I do that?

BenMorel
  • I guess Downvoter did not understand the question. – Agnel Kurian Oct 08 '14 at 10:05
  • this is not exactly a dup of http://stackoverflow.com/questions/2982055/detach-many-subdirectories-into-a-new-separate-git-repository but it does ask for the same result. It's just that here the question is specific to `git subtree split`. I followed the procedure in the first answer and it works like a charm – Hilikus Mar 25 '15 at 21:06
  • It's a bit more work, but could you preserve history by moving each of the subdirectories to a new directory and then splitting that new common parent directory? You would have the extra commit from moving the files, but is that such a bad thing? No changes to prior commit hashes... – cowboydan Aug 10 '15 at 15:01
  • @cowboydan: Does `git subtree` maintain the rename history from outside of that directory? I’m not certain, though I know that e.g. `git filter-branch` doesn’t. If not, that would effectively cause this to eliminate the history. – Jeremy Caney Dec 20 '19 at 12:06
  • @cowboydan: FYI: I just confirmed that `git subtree split` does _not_ maintain the rename history from outside of that directory, so this approach would be as effective as just copying the files to a new repository. It’s also worth noting that `git subtree split` itself rewrites your history, generating new hashes. – Jeremy Caney Dec 20 '19 at 22:49
  • Does this answer your question? [Detach many subdirectories into a new, separate Git repository](https://stackoverflow.com/questions/2982055/detach-many-subdirectories-into-a-new-separate-git-repository) – Josh Correia Oct 05 '20 at 21:33

3 Answers

Use git-subtree add to split-in

# First create two *split-out* branches
cd /repos/repo-to-split
git subtree split --prefix=src/math --branch=math-src
git subtree split --prefix=tests/math --branch=math-test

# Now create the new repo
mkdir /repos/math
cd /repos/math
git init

# This approach has a gotcha:
# You must commit something so "revision history begins",
# or `git subtree add` will complain.
# In this example, an empty `.gitignore` is committed.
touch .gitignore
git add .gitignore
git commit -m "add empty .gitignore to allow using git-subtree"

# Finally, *split-in* the two branches
git subtree add --prefix=src/math ../repo-to-split math-src
git subtree add --prefix=tests/math ../repo-to-split math-test

This worked for me with git 2.23.0. Note that you can also choose different prefixes at split-in time, e.g. add the contents of src/math/ under src/ and those of tests/math/ under tests/.

Side note: run git log in the new repo before pushing to a remote, to check that the resulting history is good enough for you. In my case some commits have duplicated messages because my repo history was so messy, but that's fine for me.
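One way to do that check, as a rough sketch: list the history one line per commit, then print any commit subject that occurs more than once. The throwaway repository, file names and commit messages below are made up for illustration; in practice you would run only the last two commands inside the new repo.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for the newly assembled one (hypothetical data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Two commits share a subject, as can happen after two separate splits.
echo 1 > a.txt && git add a.txt && git commit -q -m "fix rounding"
echo 2 > b.txt && git add b.txt && git commit -q -m "fix rounding"
echo 3 > c.txt && git add c.txt && git commit -q -m "add docs"

# One line per commit, newest first.
git log --oneline

# Commit subjects that appear more than once in the history.
duplicates=$(git log --format=%s | sort | uniq -d)
echo "duplicated subjects: $duplicates"
```

An empty result from the last command means no subject is repeated; it says nothing about whether the commits themselves are otherwise what you expect.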

Source

laconbass
  • You will get duplicates no matter how clean your commit history is because each time you call `git subtree split` it is rewriting your commit history for that branch. In this example, if you had ten commits that touched both `src/math` _and_ `tests/math`, those will now become _twenty_ commits. Worse, each of those commits will, of course, be limited to modifications in one folder. That’s why it’s desirable instead to use something like `git filter-branch` (or, better yet, the third-party `git filter-repo`) so that you can include multiple folders in a single rewrite operation. – Jeremy Caney Dec 20 '19 at 22:40
  • If you don't care about duplicate commit history as @JeremyCaney pointed out above, this is by far, the simplest correct answer. – LoRdPMN Nov 06 '20 at 19:53
  • This is the best answer for me. Other approaches are less intuitive, or inefficient. – Sridhar Sarnobat Sep 22 '21 at 23:17

Depending on your needs you might get away with git filter-branch.

I'm not entirely sure what you are trying to achieve, but if you merely want to have a repository with two directories removed (in the history?) this is probably your best shot.

See also Rewriting Git History.

$ git filter-branch --tree-filter 'rm -rf tests/http src/http' --prune-empty HEAD

This will go through each commit and remove the two directories from it. Be aware that this rewrites history (i.e. it changes your commit SHAs) and will cause headaches if you share history with another repository.
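When it is easier to list what to keep than what to delete, a possible variant (a sketch, not part of the answer above) is to empty the index in every commit and then restore only the wanted paths from the original commit. It uses `--index-filter`, which avoids checking out each commit. The demo repository and file names are made up and mirror the layout from the question. `FILTER_BRANCH_SQUELCH_WARNING` only silences the deprecation notice that newer git versions print.

```shell
#!/bin/sh
set -e

# Throwaway demo repository mirroring the question's layout (hypothetical files).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p src/http src/math tests/http tests/math
echo a > src/http/a.txt
echo b > src/math/b.txt
echo c > tests/http/c.txt
echo d > tests/math/d.txt
git add .
git commit -q -m "initial layout"

# A commit that only touches an unwanted directory; it should end up pruned.
echo more >> src/http/a.txt
git commit -q -am "http only"

# For every commit: clear the index, then restore only the paths to keep.
# $GIT_COMMIT is set by filter-branch to the commit being rewritten.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty --index-filter '
    git rm -r -q --cached --ignore-unmatch -- . &&
    git reset -q "$GIT_COMMIT" -- src/math tests/math
' HEAD

# Inspect the result: remaining files and number of commits.
kept_files=$(git ls-tree -r --name-only HEAD | sort | paste -sd' ' -)
commit_count=$(git rev-list --count HEAD)
echo "files: $kept_files"
echo "commits: $commit_count"
```

As with the command above, this rewrites history, so it is best tried on a disposable clone first.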

Alexander Oh
  • Actually, my example was an oversimplification, and I have many more folders than just `http` and `math`. Is there a way I can just specify which ones to keep, and not which ones to delete? – BenMorel Aug 29 '14 at 21:34
  • in essence you can put a bash script in there. can `rm` deal with what you'd like to keep -> nope. can bash do, yes. try to figure out how to use `find` to match your requirements. http://www.gnu.org/software/findutils/manual/html_mono/find.html – Alexander Oh Aug 29 '14 at 21:44
  • furthermore you can use regular expressions to match your directories. it's **simpler** and **faster** to tell `rm` what to delete. – Alexander Oh Aug 29 '14 at 21:46
  • http://www.linuxjournal.com/content/bash-extended-globbing this might also be helpful to have more powerful file globbing in bash. – Alexander Oh Aug 29 '14 at 22:06
  • I tried your solution by listing all the directories I want to delete. While it works fine, it keeps all the commits in the history, most of them being empty (no modified files). So it's very different from `git subtree split`, and unfortunately it's not what I'm looking for! – BenMorel Aug 30 '14 at 11:07
  • have you tried to read the manual of `git filter-branch` ? There is a `--prune-empty` to get rid of those commits. – Alexander Oh Aug 30 '14 at 11:13
  • http://stackoverflow.com/questions/5324799/git-remove-commits-with-empty-changeset-using-filter-branch also related SO question. – Alexander Oh Aug 30 '14 at 11:25
  • I've tried `--prune-empty`, which does 90% of the job, but there are still some unrelated commits left: those which were only deleting files outside of the relevant directories! – BenMorel Aug 30 '14 at 11:29
  • you could use `git rebase --interactive` and drop the 10%. I suppose this is a one time thing anyway. – Alexander Oh Aug 30 '14 at 11:33
  • Seems possible to achieve this without destroying history. See `git-subtree add` – laconbass Oct 06 '19 at 03:59

Use git-filter-repo. It is not part of git as of version 2.25. It requires Python 3 (>= 3.5) and git 2.22.0 or later.

git filter-repo --path src/math --path tests/math 

For my repo, which contained ~12000 commits, git-filter-branch took more than 24 hours and git-filter-repo took less than a minute.

Kishore A