54

I have a Git repo that was converted from SVN to Mercurial to Git, and I want to extract just one source file from it. Some filenames also contain spaces and weird characters (an encoding mismatch corrupted a Unicode ä).

How can I extract one file from a repository and place it at the root of the new repo?

mkrieger1
peterhil
  • 1
    It's all I need. And btw, http://stackoverflow.com/questions/5998987/splitting-a-set-of-files-within-a-git-repo-into-their-own-repository-preserving is not a clone of any subdirectory-filtering question. Extracting files requires both --subdirectory-filter step and a --index-filter or --tree-filter. – peterhil Sep 11 '11 at 00:44
  • 1
    Or rather all I want, because I'll make a package of the single file which provides a trie. I want to use it in other projects too, and publish in Github and I have some code in the repo which I don't want to make open source (at least yet). – peterhil Sep 11 '11 at 00:54
  • 1
    as of 2.24 (03/07/2019) git-filter-repo is the recommended replacement for git-filter-branch – Stephen Jul 06 '20 at 16:21
  • Related: [How can I split a single file from a git repo into a new repo?](https://stackoverflow.com/questions/39479154/how-can-i-split-a-single-file-from-a-git-repo-into-a-new-repo/). But doesn't cover unicode details. – idbrii Aug 28 '22 at 18:51

6 Answers

59

A faster and easier-to-understand filter that accomplishes the same thing:

git filter-branch --index-filter '
                        git read-tree --empty
                        git reset $GIT_COMMIT -- $your $files $here
                ' \
        -- --all -- $your $files $here
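
The `$your $files $here` placeholders above are unquoted, so a path containing a space would be split into separate words. Below is a minimal sketch of the same filter with the path quoted, run against a throwaway repository. The repository, the file names (`src/my trie.lisp`, `src/private.lisp`), and the commit messages are invented for the demo; `FILTER_BRANCH_SQUELCH_WARNING` only silences the startup warning that recent Git versions print.

```shell
#!/bin/sh
# Sketch: keep only one file (whose name contains a space) in all of history.
# Everything below runs in a temporary demo repo; all names are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning delay

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

mkdir -p src
echo 'keep me' > 'src/my trie.lisp'
echo 'secret'  > src/private.lisp
git add .
git commit -q -m 'add both files'
echo 'more' >> src/private.lisp
git commit -q -am 'touch only the private file'
echo 'v2' >> 'src/my trie.lisp'
git commit -q -am 'touch the file we want'

# Quote the path inside the filter so the space survives. The trailing
# pathspec limits the walk to commits that touched the file.
git filter-branch --index-filter '
        git read-tree --empty
        git reset -q "$GIT_COMMIT" -- "src/my trie.lisp"
    ' -- --all -- 'src/my trie.lisp'

git ls-tree -r --name-only HEAD   # files left in the rewritten tip
git rev-list --count HEAD         # commits left on the current branch
```

Run it on a clone rather than on the only copy of a repository, since the rewrite replaces the branch history.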
jthill
  • 2
    This worked perfectly for me. I added a `--prune-empty` argument to remove any empty commits. – Aaron Jensen Jun 26 '17 at 21:05
  • 1
    @AaronJensen The `--all -- $your $files $here` on the last line gets passed to the `git rev-list` that `filter-branch` runs, so the commits filter-branch sees have already been pruned. That's much faster than making filter-branch pointlessly load the index and run the filter and make new trees and a commit before throwing it all away for commits that didn't touch those files. Still, it doesn't hurt to add it. – jthill Jun 26 '17 at 21:12
  • 3
    How can I apply this to a single branch ? Replacing `-- --all` with `-- branchname` ? – Mr_and_Mrs_D Jan 05 '18 at 13:36
  • @Mr_and_Mrs_D I just renamed the branch I was on to master: `git checkout my_branch && git branch -d master && git branch -m master` – Potherca Jan 19 '18 at 14:46
  • 3
    For me, this kept the commits that touched the file in question but they were all empty, and the file itself was added in its *present* state in the commit that first created the file (i.e. not in the state it actually was at the time). – Mahmoud Al-Qudsi Jun 07 '18 at 18:58
  • Try to avoid `CMD.EXE` for real work, [even Microsoft admits its backslash handling "has raised eyebrows at times. [… and\] Quotation marks are even more screwed up."](https://blogs.msdn.microsoft.com/oldnewthing/20100917-00/?p=12833). @MahmoudAl-Qudsi – jthill Aug 14 '18 at 11:15
  • 2
    I’m not sure why you assumed I used cmd. This was in fact under fish on Linux. – Mahmoud Al-Qudsi Aug 16 '18 at 21:58
  • That would be because cmd.exe is the only shell I'd ever encountered that botches escapes and quoting that badly. Now I know two shells that do it. – jthill Feb 05 '19 at 21:31
  • 1
    ATTENTION! Not following renamings. First you have to pass the list of files to `git --no-pager log --name-only --format='' -- $your $files $here | sort -u` to get every name of the files - you have to use the results as the file list of the command. – bimlas Feb 05 '19 at 22:34
  • 1
    @Mr_and_Mrs_D Replacing `--all` with `branchname` worked for me; I made a new branch just for the purpose. – rjmunro Feb 27 '19 at 11:37
  • I found this link: https://gist.github.com/cyberang3l/6012c82266122e05db33f4cb8dcf598b to extract a folder from a repo. What do you think? Does it make sense to extend this answer to folders? – Dirk Jul 30 '19 at 08:39
  • @Dirk that is a separate question, I would say. After some searching it doesn't seem to exist yet. Please make it and let me know - there is a much simpler answer than that gist. – TamaMcGlinn Aug 07 '20 at 09:03
13

It seems this is not particularly easy, which is why I'm answering my own question despite the many similar questions about git filter-branch's --index-filter, --subdirectory-filter, and --tree-filter: I needed all of them to achieve this!

First, a quick note: even a spell like the one in a comment on Splitting a set of files within a git repo into their own repository, preserving relevant history

SPELL='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v "trie.lisp" | tr "\n" "\0" | xargs -0 git rm --cached -r --ignore-unmatch'
git filter-branch --prune-empty --index-filter "$SPELL" -- --all

will not help with files named like imaging/DrinkkejaI<0300>$'\302\210'.txt_74x2032.gif. The aI<0300>$'\302\210' part once was a single letter: ä.

So in order to extract a single file, in addition to filter-branch I also needed to do:

git filter-branch -f --subdirectory-filter lisp/source/model HEAD

Alternatively, you can use --tree-filter. The test is needed because the file lived in another directory earlier; see: How can I move a directory in a Git repo for all commits?

MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'
git filter-branch --tree-filter "$MV_FILTER" HEAD --all

To see all the names a file has had, use:

git log --pretty=oneline --follow --name-only git-path/to/file | grep -v ' ' | sort -u

As described at http://whileimautomaton.net/2010/04/03012432
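
Note that the `grep -v ' '` step above also discards any file name that contains a space. A possible variant, sketched here on a made-up repository, drops the commit header with an empty `--format=` instead of filtering on spaces, and sets `core.quotePath=false` so that non-ASCII names are printed unescaped. The directory `old dir` and the file `trie.lisp` are hypothetical.

```shell
#!/bin/sh
# Sketch: list every name a file has had, including names with spaces.
# The demo repo and its paths are invented for illustration.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

mkdir -p 'old dir'
printf 'line1\nline2\nline3\n' > 'old dir/trie.lisp'
git add .
git commit -q -m 'add file in old location'
git mv 'old dir/trie.lisp' trie.lisp
git commit -q -m 'move file to the root'

# --format= prints no commit header, so only path names remain;
# sed removes any blank separator lines before sorting.
names=$(git -c core.quotePath=false log --format= --name-only --follow \
            -- trie.lisp | sed '/^$/d' | sort -u)
printf '%s\n' "$names"
```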

Afterwards, also run these cleanup steps:

$ git reset --hard
$ git gc --aggressive
$ git prune
$ git remote rm origin # Otherwise changes will be pushed to where the repo was cloned from
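
One caveat: `git filter-branch` keeps the pre-rewrite history reachable under `refs/original/`, and the reflogs reference it too, so `git gc` may not free anything until both are cleared. The sketch below shows one way to do that, similar to the repository-shrinking steps in the git filter-branch documentation. The demo repository and the file names (`trie.lisp`, `private.lisp`) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: drop filter-branch's backup refs and the reflogs so that gc can
# discard the old objects. Demo repo and file names are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning delay

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo
echo 'keep'   > trie.lisp
echo 'secret' > private.lisp
git add .
git commit -q -m 'initial'
secret_blob=$(git rev-parse HEAD:private.lisp)

git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch private.lisp' -- --all

# Delete the backup refs that filter-branch left behind.
git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done

# Expire all reflog entries, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$secret_blob" 2>/dev/null; then
    echo 'old blob still present'
else
    echo 'old blob pruned'
fi
```

Only do this once the rewritten history has been checked, because it removes the way back to the original commits.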
mkrieger1
peterhil
  • 5
    I'm not sure how to follow these instructions, the text of this answer seems to pose several possible routes. I see no procedure. – ThorSummoner Feb 20 '15 at 18:35
  • Maybe you should see the git documentation about filter-branch command and about rewriting history: - http://git-scm.com/docs/git-filter-branch - http://git-scm.com/book/en/v2/Git-Tools-Rewriting-History – peterhil Mar 19 '15 at 19:02
11

Note that things get much easier if you combine this with the additional step of moving the desired file(s) into a new directory.

This might be a quite common use case (e.g. moving the desired single file to the root dir).
I did it (using git 1.9) like this (first moving the file(s), then deleting the old tree):

git filter-branch -f --tree-filter 'mkdir -p new_path && git mv -k -f old_path/to/file new_path/'
git filter-branch -f --prune-empty --index-filter 'git rm -r --cached --ignore-unmatch old_path'

You can even easily use wildcards for the desired files (without messing around with grep -v).

I'd think that both steps ('mv' and 'rm') could also be done in a single filter-branch run, but that didn't work for me.

I didn't try it with weird characters, but I hope this helps anyway. Making things easier always seems like a good idea to me.

Hint:
This is a time-consuming operation on large repos. So if you want to perform several actions (like extracting a bunch of files and then rearranging them in 'new_path/subdirs'), it's a good idea to do the 'rm' part as early as possible to get a smaller and faster tree.

Roman
  • I also tried it on ubuntu 12.04 and git 1.7.x with the following results: * the permission-denied problem also appears on ubuntu * git 1.7.x didn't do well with the commands I mentioned above (as only 1 file matched it always was renamed to the directory it should be moved in. So I recommend git 1.9.x which I'm running on my windows machine – Roman May 15 '14 at 06:10
  • reworked my post because most of my problems seem to be caused by my non existent bash skills -> using '&&' instead of '|' to combine commands now – Roman May 23 '14 at 06:33
  • The first step does not work for me in git 2.2.1. There is no change to the repo. – xixixao Sep 12 '15 at 21:25
  • I sometimes had this problem too, that the 1st step changed nothing. It always turned out that the mv command didn't move any files because the path didn't match any (note that git doesn't keep any information about empty directories) – Roman Sep 15 '15 at 05:58
  • 1
    The second step removed all files for me (old path was `.`) Why not: `git filter-branch -f --subdirectory-filter new_path -- --all` ? – jan-glx Nov 03 '16 at 23:59
  • Yes, `--subdirectory-filter` would be a good option in general. But it wasn't helpful for me in this case because I had to 1) rearrange the directory structure and 2) select files by wildcard – Roman Dec 06 '16 at 14:13
  • As I understand it, @jan-glx's solution would still work with the wildcard since it applies to the *new* path, i.e. after files have already been moved by the wildcard glob? – Mahmoud Al-Qudsi Jun 07 '18 at 19:01
8

I've found an elegant solution using git log and git am here: https://www.pixelite.co.nz/article/extracting-file-folder-from-git-repository-with-full-git-history/

In case it goes away, here's how you do it:

  1. in the original repo,

    git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > /tmp/patch
    
  2. if the file was in a subdirectory, or if you want to rename it

    sed -i -e 's/deep\/path\/that\/you\/want\/shorter/short\/path/g' /tmp/patch
    
  3. in a new, empty repo

    git am < /tmp/patch
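
The three steps above can be tried end to end on a throwaway repository. This is only a sketch: the paths (`deep/path/trie.lisp`, `other.txt`) and commit messages are invented, and the `sed` call writes to a new file instead of using `-i` so that it also works with non-GNU sed. Keep in mind that the `sed` substitution rewrites every occurrence of the old path in the patch file, including any in commit messages.

```shell
#!/bin/sh
# Sketch: export one file's history as patches and replay it in a new repo,
# moving the file to the root on the way. All names are hypothetical.
set -e

work=$(mktemp -d)

# 1. A small "original" repo with the wanted file in a subdirectory.
git init -q "$work/old"
cd "$work/old"
git config user.email demo@example.com
git config user.name Demo
mkdir -p deep/path
echo one > deep/path/trie.lisp
echo x   > other.txt
git add .
git commit -q -m 'first'
echo two >> deep/path/trie.lisp
git commit -q -am 'second'

git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- deep/path/trie.lisp > "$work/patch"

# 2. Strip the directory prefix so the file lands at the root.
sed -e 's|deep/path/||g' "$work/patch" > "$work/patch.root"

# 3. Replay the patches in a new, empty repo.
git init -q "$work/new"
cd "$work/new"
git config user.email demo@example.com
git config user.name Demo
git am --quiet < "$work/patch.root"

git log --oneline --name-only
```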
    
Marius Gedminas
4

The following will rewrite the history and keep only commits that touch the list of files you give. You probably want to do that in a clone of your repository to avoid losing the original history.

FILES='path/to/file1 other-path/to/file2 file3'
git filter-branch --prune-empty --index-filter "
                        git read-tree --empty
                        git reset \$GIT_COMMIT -- $FILES
                " \
        -- --all -- $FILES

Then you can merge that new branch into your target repository, via normal merge or rebase commands according to your use-case.

PowerKiKi
2

There is a newer tool nowadays, git filter-repo. It offers more options and better performance than git filter-branch.

See the man page for details and the project page for installation.

Remove everything except src/README.md and move it to the root:

git filter-repo --path src/README.md
git filter-repo --subdirectory-filter src/

--path selects the single file and --subdirectory-filter moves the contents of that directory to root.
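
The two calls can probably be folded into one by pairing `--path` with `--path-rename`, which the git filter-repo documentation describes as what `--subdirectory-filter` does internally. The sketch below assumes git filter-repo is installed (it is a separate download, not part of Git itself) and uses a made-up repository. `--force` is needed here only because the demo repository is not a fresh clone.

```shell
#!/bin/sh
# Sketch: keep src/README.md only and move it to the root in a single run.
# Demo repo and file names are hypothetical; requires git filter-repo.
set -e

if git filter-repo --version >/dev/null 2>&1; then
    demo=$(mktemp -d)
    cd "$demo"
    git init -q .
    git config user.email demo@example.com
    git config user.name Demo
    mkdir -p src
    echo 'readme' > src/README.md
    echo 'other'  > src/other.txt
    git add .
    git commit -q -m 'initial'

    # --path keeps just this file; --path-rename src/: strips the prefix.
    git filter-repo --force --path src/README.md --path-rename src/:

    git ls-tree -r --name-only HEAD
else
    echo 'git filter-repo not found; install it to run this sketch'
fi
```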

Roman