
How can I copy specific files from all directories and subdirectories to a new directory while preserving the original subdirectory structure?

This answer:

find . -name \*.xls -exec cp {} newDir \;

copies all xls files from all subdirectories into the single directory newDir. That is not what I want.

If an xls file is in /s1/s2/, then it should be copied to newDir/s1/s2.

The command above copies the matching files from all folders and subfolders to a new folder, but the original directory structure is lost: everything lands in the same folder, and files with the same name overwrite each other.

len

3 Answers


You can try:

find . -type f -name '*.xls' -exec sh -c \
'd="newDir/${1%/*}"; mkdir -p "$d" && cp "$1" "$d"' sh {} \;

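This runs the small shell script d="newDir/${1%/*}"; mkdir -p "$d" && cp "$1" "$d" once for each xls file: it first creates the target directory, then copies the file into it. Here $1 is the path found by find, and ${1%/*} strips the file name from it, leaving the directory part.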
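To make the ${1%/*} expansion concrete, here is a minimal sketch using a made-up path (the file name report.xls and the variable names f, dir and dest are only for illustration):

```shell
# Hypothetical path, in the form find prints it when started from "."
f='./s1/s2/report.xls'

# ${f%/*} removes the shortest suffix matching "/*", i.e. the file name.
dir=${f%/*}
dest="newDir/$dir"

printf '%s\n' "$dir"     # prints ./s1/s2
printf '%s\n' "$dest"    # prints newDir/./s1/s2
```

The "./" left in the middle of the destination is harmless: mkdir -p and cp treat newDir/./s1/s2 the same as newDir/s1/s2.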

If you have a lot of files and run into performance issues, you can optimize a bit with:

find . -type f -name '*.xls' -exec sh -c \
'for f in "$@"; do d="newDir/${f%/*}"; mkdir -p "$d" && cp "$f" "$d"; done' sh {} +

This second version processes the files in batches and thus spawns fewer shells.
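One thing to watch with either version: newDir sits inside the tree that find searches, so running the command a second time would also find the copies and nest them under newDir/newDir. A possible way around this, sketched here on a throwaway tree, is to prune ./newDir from the search. The file names, the demo variable and the copy_xls helper are made up for the example, and mktemp is assumed to be available:

```shell
# Build a small throwaway tree (hypothetical file names).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p s1/s2 s3
touch s1/s2/a.xls s1/b.xls s3/notes.txt

# Skip ./newDir itself, then copy every .xls file to newDir/<original dir>.
copy_xls() {
    find . -path ./newDir -prune -o -type f -name '*.xls' -exec sh -c '
        for f in "$@"; do
            d="newDir/${f%/*}"
            mkdir -p "$d" && cp "$f" "$d"
        done' sh {} +
}

copy_xls
copy_xls   # second run: ./newDir is pruned, so no newDir/newDir appears

# List what was copied.
find newDir -type f | sort
```

Only directories that actually contain an xls file are created, so s3 does not show up under newDir.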

Renaud Pacalet

This should do:

# Ensure that newDir exists and is empty. Omit this step if you
# don't want it.
rm -rf newDir && mkdir newDir

# Copy the xls files.
rsync -a --include='**/*.xls'  --include='*/' --exclude='*' . newDir

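The trick here is the combination of include and exclude. By default, rsync copies everything below its source directory (. in your case). We change this by excluding everything (--exclude='*'), while first including the xls files and, via --include='*/', every directory, so that rsync still descends into the subdirectories. The order matters, because rsync applies the first rule that matches.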

In your example, newDir is itself a subdirectory of your working directory and hence part of the directory tree that is searched for files to copy. I would reconsider that choice.

NOTE: This not only copies directories whose names end in .xls, but also recreates the whole directory structure of your source tree (even directories that contain no xls files) and populates it only with xls files.

user1934428
  • Are you 100% sure this works? `--exclude='*'` excludes also directories... and so `rsync` does not even traverse them. Testing with `rsync version 3.2.7` confirms this here. Plus, even if it was working, it would also sync a directory named `sheets.xls`. – Renaud Pacalet Nov 30 '22 at 11:53
  • @RenaudPacalet : Good point. Just re-tested it. My original test data had been too simple. I thought that due to the `-a` option, `rsync` would not simply prune an excluded directory tree. – user1934428 Nov 30 '22 at 11:56
  • @RenaudPacalet : Fixed it, I think. At least it seems to work here. Please check. I don't understand why it would not copy a directory `...xls`. – user1934428 Nov 30 '22 at 12:04
  • This creates all directories, even if they do not contain any `xls` file. Probably not what the OP wants. And it **would** copy a directory named `sheets.xls`, even though a directory is not a file. – Renaud Pacalet Nov 30 '22 at 12:06
  • Of course it would also copy a directory named _sheets.xls_. Not sure whether the OP really meant only plain files named that way, or every directory entry. For clarity, I will put this into my answer. – user1934428 Dec 01 '22 at 07:10
  • Well, they asked _How can I copy specific **files** from all directories..._. – Renaud Pacalet Dec 01 '22 at 07:15

Thanks for the solutions.

Meanwhile, I also found:

find . -name '*.xls' | cpio -pdm newDir
len