I use the following bash script to copy only files with a certain extension (in this case *.sh); however, it still copies all the files. What's wrong?

from=$1
to=$2

rsync -zarv  --include="*.sh" $from $to
user881480
  • While not strictly speaking related, I would suggest quoting $from/$to. Not doing so may give you unexpected results if positional arguments 1/2 include spaces. – Kjetil Joergensen Jun 20 '12 at 01:53
  • did you get an understanding why your command wouldn't just work? – Charlie Parker Jun 14 '18 at 00:46
  • @CharlieParker: Do you have to use `rsync`? This can very well be achieved with shell built-ins. – Inian Jun 14 '18 at 04:08
  • What this question and its answers also lack is how to craft the command if I have nested directories from which I want to send only one type of file. It seems it only does it for the target directory... – Charlie Parker Jun 14 '18 at 18:29
  • side note: the `-r` is redundant because `-a` implies `-r` – wisbucky Nov 24 '20 at 04:10

6 Answers

I think --include is used to include a subset of files that would otherwise be excluded by --exclude, rather than to include only those files. In other words: you have to think of include as meaning "don't exclude".

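Try instead:

rsync -zarv --include="*/" --include="*.sh" --exclude="*" "$from" "$to"

The order of the rules matters in every rsync version, not only in 3.0.6 or higher: rsync acts on the first rule that matches, so --include="*.sh" must come before the catch-all --exclude="*". With the order --include "*/" --exclude="*" --include="*.sh", every .sh file hits --exclude="*" first and only the directory tree is copied (the comments below confirm this on versions 2.6.9, 3.0.6, 3.0.7 and 3.0.9).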

Adding the -m (--prune-empty-dirs) flag avoids creating empty directories in the destination. Tested with version 3.1.2.

So if we only want *.sh files, we have to include all directories (--include="*/") so that rsync descends into them, include all *.sh files (--include="*.sh"), and exclude everything else (--exclude="*").
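To see why the order matters, here is a small bash sketch (not part of the original answer) that builds a throwaway tree and runs both orders against it. It assumes rsync is installed; `make_demo_tree` and `sync_sh_only` are hypothetical helper names, and -z/-v are dropped because they don't change which files are selected.

```shell
#!/usr/bin/env bash
# Sketch comparing the two rule orders on a scratch tree.
# Assumes rsync is installed; make_demo_tree and sync_sh_only are
# hypothetical helper names made up for this example.

# make_demo_tree DIR: two .sh files (one nested) and one unrelated .txt file
make_demo_tree() {
    mkdir -p "$1/bin" "$1/docs"
    touch "$1/top.sh" "$1/bin/build.sh" "$1/docs/readme.txt"
}

# sync_sh_only SRC DST [ORDER]: ORDER is "working" (default) or "broken"
sync_sh_only() {
    local src=$1 dst=$2 order=${3:-working}
    if [[ $order == working ]]; then
        # Directories first (so rsync descends), then *.sh, then drop the rest.
        rsync -a --include='*/' --include='*.sh' --exclude='*' "$src/" "$dst/"
    else
        # "--exclude=*" matches every file before "--include=*.sh" is reached,
        # so only the (empty) directories are copied.
        rsync -a --include='*/' --exclude='*' --include='*.sh' "$src/" "$dst/"
    fi
}

# Usage:
#   src=$(mktemp -d) dst=$(mktemp -d)
#   make_demo_tree "$src" && sync_sh_only "$src" "$dst"
#   (cd "$dst" && find . -type f | sort)
```

Run against the same source, the working order leaves bin/build.sh and top.sh (plus an empty docs/ directory), while the broken order leaves only the empty bin/ and docs/ directories.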

You can find some good examples in the section Include/Exclude Pattern Rules of the man page.

FuePi
chepner
  • While it'll get you all subdirectories, if there are any .sh files in subdirectories you want to rsync, chances are you'll want to use --include="*/" too. – Kjetil Joergensen Jun 20 '12 at 01:51
  • I tried this on rsync version 3.0.7, which I got long ago from macports, and it didn't work with this ordering of includes/excludes. This is what I ended up with that worked for me (adapted for OP): `rsync -zarv --include="*/" --include="*.sh" --exclude="*" "$from" "$to"`. – Bijou Trouvaille Jun 03 '13 at 09:17
  • I tried with rsync 3.0.9 and it did not work. Bijou is right, the ordering is not proper (first `--include=\*.sh` then `--exclude=\*`) – TrueY Nov 28 '14 at 09:26
  • It doesn't work with your ordering of includes/excludes, but it works with the ordering suggested by Bijou Trouvaille – John Smith Optional Aug 30 '15 at 14:36
  • It doesn't handle `--delete` very well. For example after deleting **a.sh** on source, if I pass `--delete` , it tries to remove everything from destination. How to solve this? – Khurshid Alam Oct 07 '16 at 09:58
  • @BijouTrouvaille's ordering worked for me even with version 3.0.6 and the other ordering does not. – Ben Lindsay Apr 12 '18 at 17:02
  • why do we need so many includes, looks like a really silly command. – Charlie Parker Jun 14 '18 at 00:35
  • I think an explanation of what the command is actually doing would be fantastic and really useful. – Charlie Parker Jun 14 '18 at 00:45
  • does it matter where you run the command? I usually use absolute paths everywhere... – Charlie Parker Jun 14 '18 at 02:29
  • How does one craft the command if I have recursive directories that I want to send only one type of file. It seems it only does it for the target directory. – Charlie Parker Jun 14 '18 at 18:30
  • I want to include only one directory and exclude all other directories in the `/etc/lsyncd/lsyncd.conf.lua` file. Any idea? – Dhaduk Mitesh May 20 '19 at 07:28
  • exclude '*' is super important:D – John Jiang Jan 02 '20 at 00:26
  • `rsync` is a really great tool... *except* for this functionality here. These `--include` and `--exclude` flags are unintuitive and sloppy. – muad-dweeb Apr 13 '20 at 21:27
  • why is it accepted as a correct answer? It also copies all subdirectories even if they don't contain the goal files – Vyachaslav Gerchicov Apr 16 '20 at 09:39
  • @VyachaslavGerchicov that can be fixed with `--prune-empty-dirs`. Why that's not a default action I'm not entirely sure, but it isn't so we (almost always) need to include it. – roaima Jul 14 '20 at 12:53
  • This doesn't work if I want to copy just a certain folder, i.e. if the pattern represents a folder – Eular Aug 07 '21 at 06:26
  • I am on macos and rsync version 2.6.9, but it still behaves like "rsync version 3.0.6 or higher" – Darko Maksimovic Feb 02 '22 at 23:07

The answer by @chepner will copy all the subdirectories, whether or not they contain matching files. If you need to exclude the subdirectories that don't contain the file and still retain the directory structure, use

rsync -zarv  --prune-empty-dirs --include "*/"  --include="*.sh" --exclude="*" "$from" "$to"
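A minimal sketch of the same command as a reusable bash function, assuming rsync is installed; `sync_ext_pruned` is a hypothetical name. -z/-v/-r are left out because compression and verbose output don't affect which files are selected, and -a already implies -r.

```shell
#!/usr/bin/env bash
# Sketch: keep only files with one extension and drop directories that would
# end up empty. Assumes rsync is installed; sync_ext_pruned is hypothetical.

# sync_ext_pruned SRC DST EXT
sync_ext_pruned() {
    local src=$1 dst=$2 ext=$3
    # --prune-empty-dirs (-m) removes directories that end up with no
    # included files, including nested chains of such directories.
    rsync -a --prune-empty-dirs \
        --include='*/' --include="*.$ext" --exclude='*' \
        "$src/" "$dst/"
}

# Usage:
#   sync_ext_pruned "$from" "$to" sh
```

For a source where docs/ (and docs/old/) hold only .txt files, `sync_ext_pruned "$from" "$to" sh` copies bin/build.sh but creates no docs/ directory at all.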
Gringo Suave
rambalachandran
  • This was a requirement for me: "If you need to exclude the sub-directories that don't contain the file and still retain the directory structure" +1 – Juuso Ohtonen Nov 22 '17 at 11:41
  • I don't understand how you knew what the order of the --includes should be? – Charlie Parker Jun 14 '18 at 02:44
  • How does one craft the command if I have recursive directories that I want to send only one type of file. It seems it only does it for the target directory. – Charlie Parker Jun 14 '18 at 18:33
  • Exactly what I needed. Thanks! – Dinesh Shekhawat Jun 24 '21 at 10:01
  • Also curious about the `--include "*/"` - The FILTER RULES of the rsync man say, "the first matching pattern is acted on" so I can't figure out why this doesn't work if we just use the second include (`--include="*.sh"`) - won't this be the first matching pattern that acts on the file we want? – David Streid Jan 20 '22 at 15:15

Here's the important part from the man page:

As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped.

To summarize:

  • Not matching any pattern means a file will be copied!
  • The algorithm quits once any pattern matches

Also, a pattern ending with a slash matches only directories (like find -type d would).
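To make the first-match rule concrete, here is a toy model in bash. It is my own simplification for illustration, not rsync's actual matcher: it assumes each rule is written "+ PATTERN" or "- PATTERN", that patterns contain no slash except an optional trailing one (so they are compared with the last path component only), and `filter_decision` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model of rsync's "first matching rule wins" filter check.
# Illustration only, NOT rsync's real matcher. Assumptions:
#   * each rule is "+ PATTERN" (include) or "- PATTERN" (exclude);
#   * patterns have no slash except an optional trailing one, so they are
#     compared with the last path component only;
#   * a trailing "/" makes the rule apply to directories only.
# Real rsync also handles anchoring, "**", merge files and more.

# filter_decision PATH IS_DIR RULE...  -> prints "include" or "exclude"
filter_decision() {
    local path=$1 is_dir=$2
    shift 2
    local base=${path##*/} rule action pattern
    for rule in "$@"; do
        action=${rule%% *}     # "+" or "-"
        pattern=${rule#* }     # everything after the first space
        if [[ $pattern == */ ]]; then
            [[ $is_dir == yes ]] || continue   # directory-only rule: skip for files
            pattern=${pattern%/}
        fi
        # Unquoted $pattern on the right-hand side is matched as a glob.
        if [[ $base == $pattern ]]; then
            if [[ $action == "+" ]]; then echo include; else echo exclude; fi
            return 0           # first match decides; later rules are never consulted
        fi
    done
    echo include               # no rule matched: the name is not skipped
}

# Usage:
#   filter_decision src/lib yes '+ */' '+ *.sh' '- *'          # include
#   filter_decision src/lib/run.sh no '+ */' '+ *.sh' '- *'    # include
#   filter_decision src/README no '+ */' '+ *.sh' '- *'        # exclude
#   filter_decision src/README no '+ *.sh'                     # include
#   filter_decision src/run.sh no '+ */' '- *' '+ *.sh'        # exclude
```

The last two calls mirror the problems discussed in this thread: with only `+ *.sh`, nothing ever excludes README, so it is copied (the question's command); with `- *` placed before `+ *.sh`, the exclude matches first and the .sh file is skipped.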

Let's pull apart this answer from above.

rsync -zarv  --prune-empty-dirs --include "*/"  --include="*.sh" --exclude="*" "$from" "$to"
  1. Don't skip any directories
  2. Don't skip any .sh files
  3. Skip everything
  4. (Implicitly, don't skip anything, but the rule above prevents the default rule from ever happening.)

Finally, --prune-empty-dirs keeps the first rule from creating empty directories all over the place.

Giacomo1968
Jim Hunziker
  • Thank you so much for explaining what's going on. Now there's much better chance that I won't forget the command. – MohamedEzz Mar 11 '19 at 19:51
  • "The algorithm quits once any pattern matches": this is key, and none of the higher-rated answers explain it as clearly and as up-front as you did here. Of course this _is_ in the man page somewhere, and if I'd read the whole thing carefully, I'd've seen that. Still, thanks. – TheDudeAbides Nov 28 '19 at 01:08
  • The other key concept is that "when using the --recursive (-r) option (which is implied by -a), every subdir component of every path is visited left to right, with each directory having a chance for exclusion before its content. In this way include/exclude patterns are applied recursively to the pathname of each node". – wisbucky Nov 24 '20 at 04:21
  • "The algorithm quits once any pattern matches": if this is true, then shouldn't `--include "*/"` allow any file in any directory to be sync'd? Or does matching a file require matching on a directory pattern AND matching on a file pattern? – FlexMcMurphy Jan 06 '21 at 23:07
  • @FlexMcMurphy - "`a '*' matches any path component, but it stops at slashes.`" – Jim Hunziker Jan 07 '21 at 16:55
  • @Jim Hunziker - That doesn't make any sense to me at all? I need baby English. I don't understand what `*/` does. I think it matches all directories and does not match any files; this was the second part of my last comment. If that is true, then I think I understand your command. – FlexMcMurphy Jan 08 '21 at 20:51
  • @FlexMcMurphy I think it means that the star eats up parts of the path but if your pattern has a slash after the star, the slash and things after it still need to be in the path. So the first star in `*/foo/*` won't eat up the `foo` directory you're trying to ensure is in there. – Jim Hunziker Jan 09 '21 at 12:57

One more addition: if you need to sync files by extension in one directory only (without recursion), you should use a construction like this:

rsync -auzv --include './' --include '*.ext' --exclude '*' /source/dir/ /destination/dir/

Pay attention to the dot in the first --include. --no-r does not work in this construction.

EDIT:

Thanks to gbyte.co for the valuable comment!

EDIT:

The -uzv flags are not directly related to this question, but I included them because I usually use them.
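Here is the same construction wrapped in a bash function, as a sketch that assumes rsync is installed; `sync_top_level_ext` is a hypothetical name and only -a is kept. Subdirectories match neither './' nor '*.ext' (unless a directory happens to be named like `*.ext`), so they fall through to --exclude '*' and rsync never descends into them.

```shell
#!/usr/bin/env bash
# Sketch of the non-recursive variant: copy *.EXT files from the top level of
# SRC only. Assumes rsync is installed; sync_top_level_ext is hypothetical.

# sync_top_level_ext SRC DST EXT
sync_top_level_ext() {
    local src=$1 dst=$2 ext=$3
    # './' keeps the transfer's top directory (the dot the answer points out),
    # '*.EXT' keeps matching files, '*' drops everything else, including
    # every subdirectory, so nothing below the top level is visited.
    rsync -a --include='./' --include="*.$ext" --exclude='*' "$src/" "$dst/"
}

# Usage:
#   sync_top_level_ext /source/dir /destination/dir ext
```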

Serge Roussak
  • how did you know what the order of the flags had to be and what they needed to include? – Charlie Parker Jun 14 '18 at 02:44
  • @CharlieParker, because the rsync uses the `include` and the `exclude` options in the order which they were specified in. In addition to this, it stops at the first matched one. So, if we specify the `--exclude '*'` at the first place in this example the rsync will do nothing. See the man for more explanations. – Serge Roussak Jun 14 '18 at 15:33
  • can you explain to me what each flag is doing? First flag `--include './'` is saying include everything in the source directory path? Then the next one `--include '.ext'` include the specific file in the source path named `.ext` and then the exclude says don't send anything else `--exclude '*'`? Is that correct? – Charlie Parker Jun 14 '18 at 16:23
  • How does one craft the command if I have recursive directories that I want to send only one type of file. It seems it only does it for the target directory. – Charlie Parker Jun 14 '18 at 18:30
  • @CharlieParker, the first `include` says to process the current directory, the second one, files with the specified extension, and, finally, the `exclude` says to skip all other files and directories. – Serge Roussak Jun 17 '18 at 07:51
  • Thanks for this! Needs to `--include '*.ext'` and not `--include '.ext'` – gbyte Jan 14 '20 at 12:09
  • if the flags `-u`, `-z` or `-v` are not significant to the answer, consider removing them – CervEd Oct 16 '21 at 11:16

I wrote this handy function and put it in my bash scripts or ~/.bash_aliases. Tested syncing locally on Linux with bash and awk installed; it works.

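selrsync(){
# selective rsync to sync only certain filetypes;
# based on: https://stackoverflow.com/a/11111793/588867
# Example: selrsync 'tsv,csv' ./source ./target --dry-run
local types="$1"; shift  # comma-separated list of types; must be the first argument
# Turn "tsv,csv" into the words "--include=*.tsv --include=*.csv" and read
# them into an array (read -a does not glob-expand the "*").
local -a includes
read -r -a includes <<< "$(echo "$types" | awk -F',' \
    'BEGIN{OFS=" ";}
    {
    for (i = 1; i <= NF; i++ ) { if (length($i) > 0) $i="--include=*."$i; } print
    }')"
# Pass the remaining arguments (source, target, extra options) through unchanged;
# quoted arrays instead of eval keep the patterns literal and paths with spaces intact.
echo Command: rsync -avz --prune-empty-dirs --include="*/" "${includes[@]}" --exclude="*" "$@"
rsync -avz --prune-empty-dirs --include="*/" "${includes[@]}" --exclude="*" "$@"
}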

Advantages:

Short, handy, and extensible when one wants to add more arguments (e.g. --dry-run).

Example:

selrsync 'tsv,csv' ./source ./target --dry-run
Paul Rooney
biocyberman

If someone is looking for this: I wanted to rsync only specific files and folders and managed to do it with this command (plus the source and destination): rsync --include-from=rsync-files

With rsync-files:

my-dir/
my-file.txt

- /*
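A sketch of that setup as bash functions, assuming rsync is installed and a recursive transfer (-a); `write_rules` and `sync_listed` are hypothetical names. In an --include-from file, a line without a "+ " or "- " prefix is an include rule, and the anchored "- /*" only excludes entries at the top level, so files inside my-dir/ match no rule and are copied.

```shell
#!/usr/bin/env bash
# Sketch of the --include-from approach. Assumes rsync is installed;
# write_rules and sync_listed are hypothetical helper names.

# write_rules FILE: the rule file from the answer above (blank line omitted)
write_rules() {
    printf '%s\n' 'my-dir/' 'my-file.txt' '- /*' > "$1"
}

# sync_listed SRC DST RULES_FILE
sync_listed() {
    local src=$1 dst=$2 rules=$3
    # -a implies -r, which is needed to copy the contents of my-dir/.
    rsync -a --include-from="$rules" "$src/" "$dst/"
}

# Usage:
#   write_rules rsync-files && sync_listed ./source ./target rsync-files
```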
Pascal Polleunus