
I found a solution to my question for Windows, but I'm using Ubuntu: How to copy a directory structure but only include certain files using Windows batch files?

As the title says, how can I recursively copy a directory structure but only include some files? For example, given the following directory structure:

folder1
  folder2
    folder3
      data.zip
      info.txt
      abc.xyz
    folder4
    folder5
      data.zip
      somefile.exe
      someotherfile.dll

The files data.zip and info.txt can appear everywhere in the directory structure. How can I copy the full directory structure, but only include files named data.zip and info.txt (all other files should be ignored)?

The resulting directory structure should look like this:

copy_of_folder1
  folder2
    folder3
      data.zip
      info.txt
    folder4
    folder5
      data.zip

Could you tell me a solution for Ubuntu?

Asked by Chau Than
  • I'm not entirely sure why this shouldn't be closed as a duplicate of [Bash: copy named files recursively preserving folder structure](https://stackoverflow.com/questions/1650164/) (from 2009), other than the fact that there's currently a bounty running on this one. – Jonathan Leffler Nov 03 '15 at 04:09

4 Answers

$ rsync --recursive --include="data.zip" --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1

To exclude dir3 regardless of where it is in the tree (even if it contains files that would match the --includes):

--exclude 'dir3/' (before `--filter`)

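To exclude dir3 only at a specific location in the tree, give a path with a leading slash. Despite the slash, this is not an absolute filesystem path: it is anchored at the root of the transfer, so when the source is given without a trailing slash, the first element is the source dir's own name: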

--exclude '/dir1/dir2/dir3/' (before `--filter`)

To exclude dir3 only when it's in dir2, but regardless of where dir2 is:

--exclude 'dir2/dir3/' (before `--filter`)

Wildcards can also be used in the path elements: * matches any single name (it never crosses a slash), while ** matches anything including slashes, so it can span several nested directories.

To include only specific files and specific dirs, run two rsyncs: one for the files and one for the dirs. The problem with doing it in a single rsync is that when a dir is not included, rsync won't enter it, and so won't discover any files in that branch that match your include filters. So you start by copying the files you want, without creating any dirs that would be empty, and then copy the dirs you want.

$ rsync --recursive --prune-empty-dirs --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1
$ rsync --recursive --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1

You can combine these into a single command if you don't mind that the dirs you specified are not copied when they're empty:

$ rsync --recursive --prune-empty-dirs --include="*.txt" --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1

The --filter="-! */" is necessary because rsync includes every file and folder that matches none of the filters (think of it as an invisible --include filter at the end of the list). rsync checks each item to be copied against the list of filters, in order, and includes or excludes the item according to the first match it finds. If nothing matches, the item hits that invisible --include and is copied. We want the default to be "exclude" instead, so we add an exclude filter (the - in -! */), negate its match (!) and make it match all dirs (*/). Because the match is negated, the rule excludes everything that is not a directory, while directories fall through to the default include. That lets rsync enter every directory, which, as mentioned earlier, is what allows it to find the files we want.

We use --filter instead of --exclude for the final filter because --exclude does not allow specifying negated matches with the ! operator.

Answered by Roger Dahl
  • Thanks Roger Dahl, this is working fine. However, I need a scalable solution: in the real world we can't list so many folder names in an exclude flag. The files to be included should be parameterised, as in the Windows solution above. How can I include just the directories and files I need and exclude all other folders and files, while preserving the folder structure? Secondly, please explain why a filter rule is needed here when we are already including and excluding the required files. – ayniam Nov 03 '15 at 05:53

I don't have a beautiful one-liner, but since nobody else has answered, you can always use:

find . -name 'file_name.extension' -print | cpio -pavd /path/to/receiving/folder

Run this once for each file name you want, after copying the directories.

(Make sure you're in the original folder first, of course! :) )
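A sketch that folds both steps into one script, assuming GNU find and GNU cp (standard on Ubuntu). It swaps cpio for `cp --parents`, which recreates each file's relative path under the destination, and matches both names in a single `find` by OR-ing the `-name` tests, so one pass covers every file name.

```shell
cd "$(mktemp -d)" || exit 1

# Recreate the sample tree from the question (empty placeholder files).
d=folder1/folder2
mkdir -p "$d/folder3" "$d/folder4" "$d/folder5"
touch "$d/folder3/data.zip" "$d/folder3/info.txt" "$d/folder3/abc.xyz" \
      "$d/folder5/data.zip" "$d/folder5/somefile.exe" "$d/folder5/someotherfile.dll"

dest="$PWD/copy_of_folder1"
(
  cd folder1 || exit 1
  # Step 1: recreate every directory, including empty ones such as folder4.
  # GNU find replaces {} even when it is part of a larger argument.
  find . -type d -exec mkdir -p "$dest/{}" \;
  # Step 2: copy only the wanted names; GNU cp --parents appends each
  # file's relative path (./folder2/folder3/...) to $dest.
  find . -type f \( -name data.zip -o -name info.txt \) \
       -exec cp -p --parents {} "$dest" \;
)

find copy_of_folder1 | sort
```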

Answered by user962158

Here is a one-liner using rsync:

 rsync -a -f"+ info.txt" -f"+ data.zip" -f'-! */' folder1/ copy_of_folder1/

If you already have a file list and want a more scalable solution:

 xargs -I{} rsync -a -f"+ {}" -f'-! */' folder1/ copy_of_folder1/ < file.list
Answered by Adam
cp -pr folder1 copy_of_folder1 && find copy_of_folder1 -type f ! \( -name data.zip -o -name info.txt \) -exec rm -f {} \;
  • First: copy all of folder1 to copy_of_folder1.
  • Second: delete every file other than data.zip and info.txt.
  • Result: the complete directory structure, containing only the data.zip and info.txt files.
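A sketch of the same two steps on the question's sample tree, with a preview pass before anything is removed. It swaps `-exec rm -f {} \;` for find's `-delete` (supported by GNU find), which avoids starting one rm process per file. The copy step is unchanged, so it still copies everything first, which is the cost raised in the comments.

```shell
cd "$(mktemp -d)" || exit 1

# Recreate the sample tree from the question (empty placeholder files).
d=folder1/folder2
mkdir -p "$d/folder3" "$d/folder4" "$d/folder5"
touch "$d/folder3/data.zip" "$d/folder3/info.txt" "$d/folder3/abc.xyz" \
      "$d/folder5/data.zip" "$d/folder5/somefile.exe" "$d/folder5/someotherfile.dll"

# Step 1: full copy (stop here if it fails).
cp -pr folder1 copy_of_folder1 || exit 1

# Preview: list the files that would be removed.
find copy_of_folder1 -type f ! \( -name data.zip -o -name info.txt \) -print

# Step 2: remove them; directories are left alone, so empty ones survive.
find copy_of_folder1 -type f ! \( -name data.zip -o -name info.txt \) -delete
```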
Answered by V. Michel
  • Hi Michel, this is not scalable, as we need to copy 70 GB of data and then erase most of it. Thanks for the answer. – ayniam Nov 06 '15 at 05:35
  • Hi ayniam, you are right, 70 GB is a bit too much for this kind of procedure. But I don't understand why Chau Than needs the full structure even if directories are empty. Have a good day. – V. Michel Nov 07 '15 at 19:17