
I have the following directory structure on server 1:

  • data
    • company1
      • unique_folder1
      • other_folder
      • ...
    • company2
      • unique_folder1
      • ...
    • ...

I want to duplicate this folder structure on server 2, but copy only unique_folder1 and its subdirectories. The result should be:

  • data
    • company1
      • unique_folder1
    • company2
      • unique_folder1
    • ...

I know rsync is well suited for this, but I've tried its include/exclude options without success.

For example, I've tried:

rsync -avzn --list-only --include '*/unique_folder1/**' --exclude '*' -e ssh user@server.com:/path/to/old/data/ /path/to/new/data/

But the result lists no files or directories:

receiving file list ... done
sent 43 bytes  received 21 bytes  42.67 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

What's wrong? Ideas?


Additional information: I have sudo access to both servers. One idea I have is to use find and cpio together to copy the content I need into a new directory and then run rsync on that. But this is very slow, because there are a lot of files.

Andron

4 Answers


I've found the reason. It wasn't clear to me that rsync works this way.
So the correct command (for the company1 directory only) is:

rsync -avzn --list-only --include 'company1/' --include 'company1/unique_folder1/***' --exclude '*' -e ssh user@server.com:/path/to/old/data/ /path/to/new/data

That is, each parent company directory must be included too; otherwise --exclude '*' excludes it and rsync never looks inside it. And of course we cannot type all these company directories on the command line, so we save the list to a file and use that.


The final steps are:

1. Generate an include file on server 1 with the following content (I used ls and awk):

+ company1/
+ company1/unique_folder1/***
...
+ companyN/
+ companyN/unique_folder1/***

2. Copy include.txt to server 2 and run this command:

rsync -avzn                                        \
      --list-only                                  \
      --include-from '/path/to/new/include.txt'    \
      --exclude '*'                                \
      -e ssh user@server.com:/path/to/old/data/    \
      /path/to/new/data
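Step 1 mentions ls and awk but doesn't show the command. A minimal sketch of one way to produce the same include file, assuming the company directories are the immediate subdirectories of the data directory; make_include_file is a hypothetical helper name, not part of rsync:

```shell
#!/usr/bin/env bash
# make_include_file DATA_DIR OUTFILE  (hypothetical helper)
# Writes two rsync filter rules per company directory that contains
# unique_folder1: one for the company directory itself, and one for
# unique_folder1 plus everything below it.
make_include_file() {
  local data_dir=$1 out=$2 dir name
  : > "$out"                                  # create/truncate the output file
  for dir in "$data_dir"/*/; do               # each $dir ends with a slash
    # Skip companies without unique_folder1; this also skips the literal
    # pattern left behind when the glob matches nothing.
    [ -d "${dir}unique_folder1" ] || continue
    name=$(basename "$dir")
    printf '+ %s/\n+ %s/unique_folder1/***\n' "$name" "$name" >> "$out"
  done
}
```

Usage on server 1: make_include_file /path/to/old/data include.txt. Unlike the list in step 1, this skips companies that have no unique_folder1, so no empty company directories get created on server 2; replace the check with [ -d "$dir" ] to emit rules for every company.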
Richard
Andron
  • Hey Andron, is there a reason you used the triple asterisks? I've experimented with both two and three and I can't tell a difference. And I'm using this technique to back up some files right now, thanks for posting it. – Chad von Nau Jul 11 '13 at 00:01
  • Never mind, I figured it out. I was doing `folder**` instead of `folder/***`. You need the third asterisk when you use a slash after the directory name. The two asterisks and no slash method also works, but is less precise, because it will also match peer folders with the same base name. – Chad von Nau Jul 11 '13 at 00:27
  • @ChadvonNau hmm, not sure why I used `***`. In [RSync docs](http://linux.die.net/man/1/rsync) I see `use '**' to match anything, including slashes`. So maybe 2 asterisks is enough. But I think that 3 is better :) – Andron Jul 11 '13 at 07:48
  • Also consider this http://unix.stackexchange.com/a/42691/37431 if you want to exclude the topmost directory – rofrol Oct 11 '13 at 17:28
  • Are `-n` and `--list-only` for testing here? I'm an rsync newbie and had no idea why the command does nothing but listing. – Kagami Sascha Rosylight Aug 25 '18 at 08:38
  • Regarding three asterisks: the rsync man page says... `trailing "dir_name/***" will match both the directory (as if "dir_name/" had been specified) and everything in the directory (as if "dir_name/**" had been specified). This behavior was added in version 2.6.7` – Dogsbody Jan 20 '19 at 21:54

If the first matching pattern excludes a directory, none of its descendants is ever traversed. When you want to include a deep directory such as company*/unique_folder1/** but exclude everything else (*), you need to tell rsync to include all of its ancestors too:

rsync -r -v --dry-run                       \
    --include='/'                           \
    --include='/company*/'                  \
    --include='/company*/unique_folder1/'   \
    --include='/company*/unique_folder1/**' \
    --exclude='*'                           \
    user@server.com:/path/to/old/data/ /path/to/new/data/

You can use bash’s brace expansion to save some typing. After brace expansion, the following command is exactly the same as the previous one:

rsync -r -v --dry-run --include=/{,'company*/'{,unique_folder1/{,'**'}}} --exclude='*' user@server.com:/path/to/old/data/ /path/to/new/data/
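One way to check what the brace expression expands to, without touching any files, is to collect the words into a bash array and print them. A small sketch; the array name filters is made up for illustration:

```shell
#!/usr/bin/env bash
# Collect the words bash generates from the brace expression (plus the
# final exclude) into an array. The quoted parts ('company*/' and '**')
# are not globbed, so the wildcards reach rsync unchanged.
filters=( --include=/{,'company*/'{,unique_folder1/{,'**'}}} --exclude='*' )

# Print one word per line to inspect the expansion:
#   --include=/
#   --include=/company*/
#   --include=/company*/unique_folder1/
#   --include=/company*/unique_folder1/**
#   --exclude=*
printf '%s\n' "${filters[@]}"
```

The array can then be passed as rsync -r -v --dry-run "${filters[@]}" SRC/ DEST/ (SRC and DEST are placeholders); quoting "${filters[@]}" keeps each rule a single argument.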
yonran
  • Thanks @yonran. As you can see in the accepted answer, the include list is too large; that's why it was placed in a file. And thanks for "bash’s brace expansion", I need to give it a try. – Andron Nov 07 '14 at 06:54
  • This answer is valid, although, if we're using bash features, then we're entering a grey area :) In this case, it's worth noting that a simple `shopt -s globstar; rsync -avn --relative /sourcepath/./**/a destpath` will do. – Marcus Feb 13 '21 at 15:59

An alternative to Andron's answer, which in many cases is simpler both to understand and to implement, is the --files-from=FILE option. For the current problem:

rsync -arv --files-from='list.txt' old_path/data new_path/data

Where list.txt is simply

company1/unique_folder1/
company2/unique_folder1/
...

Note that the -r flag must be given explicitly, because with --files-from the -a flag no longer implies -r. Also, the paths in the list are read relative to the source directory given on the command line (old_path/data here), and any leading slash is removed, so company1/unique_folder1/ matches but /data/company1/unique_folder1/ does not.
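As a comment on this answer points out, find can generate list.txt. A sketch, assuming every unique_folder1 sits exactly two levels below the data directory (data/<company>/unique_folder1); make_files_from_list is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# make_files_from_list DATA_DIR OUTFILE  (hypothetical helper)
# Writes paths relative to DATA_DIR, one per line, e.g. company1/unique_folder1/
make_files_from_list() {
  local data_dir=$1 out=$2
  (
    cd "$data_dir" || exit 1
    # -mindepth/-maxdepth 2: only <company>/unique_folder1, nothing deeper.
    find . -mindepth 2 -maxdepth 2 -type d -name unique_folder1 |
      sed -e 's|^\./||' -e 's|$|/|' |   # strip leading ./, add trailing /
      sort
  ) > "$out"
}
```

Usage: make_files_from_list /path/to/old/data list.txt, then rsync -arv --files-from=list.txt /path/to/old/data/ /path/to/new/data/. Because the list is relative to the source directory, it must be generated from inside that same directory, which is why the helper cd's into it first.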

pip
  • This method was a lot easier for me, as it allowed the use of `find` to generate the list of directories to include. – Sam R May 20 '21 at 11:08

For example, if you only want to sync target/classes/ and target/lib/ to a remote system, use:

rsync -vaH --delete --delete-excluded --include='classes/***' --include='lib/***' \
      --exclude='*' target/ user@host:/deploy/path/

Important things to watch:

  • Don't forget the trailing "/" on the source path (target/ in the example); without it, rsync copies the directory itself into a subdirectory of the destination.
  • The order of the --include and --exclude options matters: rsync acts on the first rule that matches.
  • Contrary to the other answers, starting an include/exclude pattern with "/" is not needed here. A pattern without a leading "/" can match at any depth below the source directory (target/ in the example); a leading "/" anchors it at the source directory.
  • To test what exactly will happen, use the --dry-run flag, as the other answers say.
  • --delete-excluded deletes everything in the target directory except the subdirectories we specifically included, so use it carefully! --delete alone is not enough here: by default it does not delete files on the remote side that match an exclude rule (everything else, yes), which is why --delete-excluded is given alongside the ordinary --delete.
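To see what --delete-excluded changes, a scratch-directory sketch that runs the filters from the example above twice, once with --delete only and once with --delete-excluded added. All directory and file names are made up, and rsync is assumed to be installed locally:

```shell
#!/usr/bin/env bash
# A build tree plus two deploy dirs that still hold a leftover file
# from an earlier deployment.
tmp=$(mktemp -d)
mkdir -p "$tmp/target/classes" "$tmp/target/lib" "$tmp/target/tmp" \
         "$tmp/deploy-a/old" "$tmp/deploy-b/old"
touch "$tmp/target/classes/App.class" "$tmp/target/lib/dep.jar" "$tmp/target/tmp/scratch"
touch "$tmp/deploy-a/old/stale" "$tmp/deploy-b/old/stale"

filters=( --include='classes/***' --include='lib/***' --exclude='*' )

# --delete alone: deploy-a/old matches --exclude='*', so it is protected.
rsync -a --delete "${filters[@]}" "$tmp/target/" "$tmp/deploy-a/"

# With --delete-excluded, excluded leftovers on the receiver are removed too.
rsync -a --delete --delete-excluded "${filters[@]}" "$tmp/target/" "$tmp/deploy-b/"

find "$tmp/deploy-a" "$tmp/deploy-b" -type f | sort
```

Expect deploy-a to keep old/stale next to classes/ and lib/, while deploy-b contains only classes/App.class and lib/dep.jar.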
peterh
  • Thanks, that is a good idea. But in my case, as you can see, the same subdirectory name appears in X different directories, so I'm not sure that is possible with your approach. – Andron Apr 13 '19 at 07:51
  • @Andron That's true. I think the `--include` list would need to be modified, maybe to `--include='***/dirName/'` or similar. I did not test that, but my example is from a real, tested, working deploy script. – peterh Apr 14 '19 at 16:26
  • This answer doesn't actually answer the question. The suggested `--include='***/dirName/'` doesn't work as intended. – Marcus Feb 13 '21 at 15:53
  • @Marcus I used it in early 2019 and I am still using it now, and it works as intended. Could you please explain exactly what doesn't work for you? – peterh Feb 13 '21 at 15:56
  • Sample follows; nothing is synced: ```cd /tmp; mkdir -p data/company{1,2}/{unique_folder1,other_folder}; touch data/company{1,2}/{unique_folder1,other_folder}/testfile; tree data; rsync -vaH --include='***/unique_folder1/' --exclude='*' data/ dest``` – Marcus Feb 13 '21 at 16:10
  • I am getting nothing to sync as well using the exact example above just modified for my directories. Any ideas? – Joseph Astrahan Jul 23 '22 at 12:11
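For comparison, a sketch that reuses the layout from Marcus's sample (in a fresh mktemp directory instead of /tmp) but includes each ancestor directory before the wanted subtree, as the accepted answer and yonran's answer describe. It assumes rsync is installed locally and that the company directory names start with "company":

```shell
#!/usr/bin/env bash
# Same layout as the comment, but under a fresh temporary directory.
tmp=$(mktemp -d)
mkdir -p "$tmp"/data/company{1,2}/{unique_folder1,other_folder}
touch "$tmp"/data/company{1,2}/{unique_folder1,other_folder}/testfile

# First match wins, so the ancestors (company1/, company2/) must be included
# before '*' can exclude them; the leading '/' anchors each pattern at the
# source directory.
rsync -a \
      --include='/company*/' \
      --include='/company*/unique_folder1/***' \
      --exclude='*' \
      "$tmp/data/" "$tmp/dest/"

# Lists only the two company*/unique_folder1/testfile copies.
find "$tmp/dest" -type f | sort
```

The ***/unique_folder1/ rule in the comment never gets a chance to match, because --exclude='*' has already excluded company1 and company2 by the time rsync would look inside them.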