`find -name` with regex pattern and filename replacement using `cp`

Question

Currently I'm using the following command in cron to copy *.data files from the source path to the target path:

    find /source_path -name \*.data -exec cp {} /target_path \;

The source structure is:

    /source_path/category1/001.data
    /source_path/category1/002.data
    /source_path/category2/003.data
    /source_path/category3/004.data
    /source_path/categorya/005.data
    /source_path/categoryb/006.data

After the above cron command, the target will contain:

    /target_path/001.data
    /target_path/002.data
    /target_path/003.data
    /target_path/004.data
    /target_path/005.data
    /target_path/006.data

I need a one-line solution to replace my current cron command, so that after execution, the target will contain:

    /target_path/category1_001.data
    /target_path/category1_002.data
    /target_path/category2_003.data
    /target_path/category3_004.data
    /target_path/categorya_005.data
    /target_path/categoryb_006.data

That is, the sub-directory name should be prepended to the target filename as a prefix.

Thanks.

`sed -r -e 's/\/source(_path)\/(category.+)\/([0-9]+\.data)/\/target\1\/\2_\3/gm' source_file > target_file` — rock321987, Apr 04 '16 at 07:29
What does this code do? I'm not familiar with sed, but it looks like substitution to me. Does it traverse the source path and copy from source to target? Also, `category` could be anything, containing dashes, underscores, etc., not just names beginning with the word category. Thanks — KDX, Apr 04 '16 at 07:37
You got it right. It is substitution. I am assuming that these paths are in a file, and I am changing them and saving the result in another file. Do you want this or something else? — rock321987, Apr 04 '16 at 07:40
I think you are trying to copy the files and rename them. Is that so? — rock321987, Apr 04 '16 at 07:42
Actually these are real files and paths in the file system. That's why there was a `cp` command involved. I need to keep the original source untouched, while copying each .data file to the target path with the category name prepended as a filename prefix. I'm wondering whether it's possible to store the matching category in a variable and then use it with the `cp` command? — KDX, Apr 04 '16 at 07:48
Possible duplicate of [Looping over pairs of values in bash](http://stackoverflow.com/questions/28725333/looping-over-pairs-of-values-in-bash) — tripleee, Apr 04 '16 at 09:33

Jay jargot · Accepted Answer · 2016-04-04T14:09:46.617

First try this command, which only prints the cp commands it would run:

    find /source_path -name \*.data | while read -r filename; do printf "print version: cp %s %s\n" "${filename}" "$(printf "%s\n" "${filename}" | sed "s/^.*[/]\(category[^/]*\)[/]\(.*[.]data\)$/\/target_path\/\1_\2/")"; done

The find command prints the filenames it finds, one per line.

read -r filename reads one line of text and stores it in the filename variable.

Taken together, find ... | while read -r filename writes a list of filenames, one per line, into the pipe. Only one filename is read at a time, and the commands inside the while block are executed for each filename read.
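As a minimal sketch of how the loop consumes the pipe, the example below replaces find with printf so that it runs anywhere. The function names list_paths and count_lines are made up for illustration, and IFS= is an addition that keeps leading and trailing whitespace in each line.

```shell
# Stand-in for find: print two pathnames from the question, one per line.
list_paths() {
    printf '%s\n' /source_path/category1/001.data /source_path/category2/003.data
}

# The loop body runs once for every line read from standard input.
count_lines() {
    n=0
    while IFS= read -r filename; do
        n=$((n + 1))
        printf 'line %d: %s\n' "$n" "$filename"
    done
}

list_paths | count_lines
```

Because the loop reads line by line, a filename that itself contains a newline would be split into two reads; that is a limitation of this approach.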

The sed command changes a pathname /source_path/category1/001.data into /target_path/category1_001.data.
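To see the substitution on its own, the sketch below feeds single pathnames to the same sed expression. rewrite_path is a hypothetical helper name, and the expression is written in single quotes here so the shell passes it to sed unchanged.

```shell
# Rewrite one source pathname into its target pathname.
# The sed expression is the one used in the answer's command.
rewrite_path() {
    printf '%s\n' "$1" |
        sed 's/^.*[/]\(category[^/]*\)[/]\(.*[.]data\)$/\/target_path\/\1_\2/'
}

rewrite_path /source_path/category1/001.data
rewrite_path /source_path/categoryb/006.data
```

This prints /target_path/category1_001.data and /target_path/categoryb_006.data. A pathname that does not match the pattern is printed unchanged, because sed only rewrites lines that match.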

I tried my best to explain the string argument of sed in the lines below; if you are interested in these topics, you should read more about sed and regular expressions.

s/ is sed's search-and-replace command, and it is followed by three elements: "s/regex pattern/replacement/flag"

^ at the very start means the start of the line.

. means any one char.

* means zero or more occurrences of the char specified just before it.

[/] means one char, the char /. The brackets [] are used to escape /; otherwise it would be interpreted as a delimiter between the regex pattern, the replacement, and the flag.

Altogether, ^.*[/] means a line starting with zero or more chars of any kind, where this starting sequence must end with /.

[^/] means one char; the ^ at the start means "not one of the chars listed". So it means any one char except /.

[abc], with chars between [], means one char: either a, b, or c.

The first \(.*\) encountered in the regex pattern can be referenced with \1 in the replacement. The second \(.*\) encountered in the regex pattern can be referenced with \2 in the replacement, and so on. Without the \ escape char, ( matches a single literal ( char, and the content cannot be referenced.
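The grouping rules can be tried in isolation. In this small sketch, swap_parts and literal_parens are hypothetical helper names and the inputs are made up for illustration; it assumes sed's default basic regular expression syntax, as used in the rest of the answer.

```shell
# \( \) capture two parts; \2 and \1 emit them in swapped order.
swap_parts() {
    printf '%s\n' "$1" | sed 's/^\([^/]*\)[/]\(.*\)$/\2 <- \1/'
}

# Without the backslash, ( and ) match literal parenthesis characters.
literal_parens() {
    printf '%s\n' "$1" | sed 's/(x)/[x]/'
}

swap_parts 'category1/001.data'
literal_parens 'f(x)'
```

The first call prints 001.data <- category1 and the second prints f[x].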

When you are satisfied with the printed output, use cp instead to actually copy the files:

    find /source_path -name \*.data | while read -r filename; do cp "${filename}" "$(printf "%s\n" "${filename}" | sed "s/^.*[/]\(category[^/]*\)[/]\(.*[.]data\)$/\/target_path\/\1_\2/")"; done
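Since the question's comments note that the sub-directory can have any name, one possible variant (not part of the command above) derives the prefix with dirname and basename instead of a regex. This is only a sketch: copy_with_prefix is a hypothetical helper name, -type f is an addition that skips directories, and the demo uses throwaway directories created with mktemp rather than the real paths.

```shell
# Copy every *.data file under $1 into $2, prefixing each file name
# with the name of the directory that contains it.
copy_with_prefix() {
    src=$1
    dst=$2
    find "$src" -name '*.data' -type f | while IFS= read -r filename; do
        category=$(basename "$(dirname "$filename")")
        cp "$filename" "$dst/${category}_$(basename "$filename")"
    done
}

# Demo in temporary directories so that no real data is touched.
demo=$(mktemp -d)
mkdir -p "$demo/src/category1" "$demo/src/my-cat_b" "$demo/dst"
: > "$demo/src/category1/001.data"
: > "$demo/src/my-cat_b/006.data"
copy_with_prefix "$demo/src" "$demo/dst"
ls "$demo/dst"
```

With the real paths this would be called as copy_with_prefix /source_path /target_path. Note that a .data file sitting directly in the source directory would get the source directory's own name as its prefix.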

This solution works for me in the test run that prints strings. I haven't performed the actual run yet. I replaced `\(category[^/]*\)` with `\(.*[^/]*\)` to match any possible character, since the categories actually have different names. Could you give me a quick explanation? It is not that readable to me. What does `while read -r filename` do, and how does it interact with `find` or `sed`? Also, what do the regular expressions `[/]` and `[^/]` mean, and why are you escaping the parentheses? What are the brackets used for? Thanks. — KDX, Apr 04 '16 at 11:26
Thanks for the in-depth explanations. All understood. How about `^.*` at the beginning? Is it matching zero or more non-characters or zero or more characters? Without the \ escape char, what does ( mean? I don't recall the need for escaping parentheses in other programming languages, e.g. Perl / PHP — KDX, Apr 04 '16 at 13:15
The answer has been updated. `^.*[/]` means a line starting with zero or more chars, where this char sequence ends with `/`. `(` without the escape char `\` means a single char: `(`. — Jay jargot, Apr 04 '16 at 14:07
