
I hope you can help me with the following problem:

The Situation

  • I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
  • The filenames contain an ID and a string like: "2022-01-11-02 super important file"
  • The IDs I need to search for are collected in a text file named ids.txt. This file contains only the IDs, not the whole filenames.

What I want to achieve:

  • I want to read out ids.txt line by line.
  • For every line in ids.txt I want to run a find search and cp the result to the destination folder.

So far I tried:

  • for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ;
  • while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt

The output folder remains empty. Not using -exec also gives no (search)results.

I was thinking that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID followed by an arbitrary suffix (a glob star).

  • As argument for -name I also tried "$ids *" "$ids"" *" and so on with no luck.

Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?


Do you have a solution for me to automate this process in a bash script that reads the ids.txt file, searches for the filenames, and copies them to a specified folder?

In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:

my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output 
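A wrapper with that interface could be sketched roughly as follows. This is a hypothetical illustration only: the function name and the sandbox demo paths are made up, and it runs one find per ID.

```shell
#!/bin/bash
# Hypothetical sketch of the desired wrapper, written as a function:
# my_id_search <ids-file> <search-folder> <output-folder>
my_id_search() {
    local ids=$1 src=$2 dest=$3 id
    while read -r id; do
        [ -n "$id" ] || continue                         # skip blank lines
        find "$src" -name "$id*" -exec cp {} "$dest" \;
    done < "$ids"
}

# Throwaway demo sandbox so the sketch can be run as-is
dir=$(mktemp -d)
mkdir "$dir/src" "$dir/out"
touch "$dir/src/2022-01-11-02 super important file" "$dir/src/2020-12-01-62 other file"
printf '2022-01-11-02\n' > "$dir/ids.txt"
my_id_search "$dir/ids.txt" "$dir/src" "$dir/out"
ls "$dir/out"    # only the file matching the ID is copied
```

Whitespace and umlauts in names are unproblematic here because every expansion is quoted; only the glob star itself is left outside the quotes of the -name pattern.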

EDIT: This is some example content of the ids.txt file where only ids are listed (not the whole filename):

2022-01-11-01
2022-01-11-02
2020-12-01-62

EDIT II: Going on with the solution from tripleee:

#!/bin/bash

grep . $1 | while read -r id; do
    echo "The search term is: $id"; echo
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done

In case my ids.txt file contains empty lines, the -name "$id*" becomes -name "*", which in turn matches and copies all files.

Trying to prevent empty lines from being read does not seem to work; they should already be filtered out by the grep . $1 | expression. What am I doing wrong?
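For what it's worth, a defensive pre-filter sidesteps the question entirely: keep only lines that actually look like an ID (the YYYY-MM-DD-NN shape from the examples above) and strip any DOS carriage returns before the loop runs. A sketch with a made-up sample file:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Demo input: one CRLF-terminated ID, one blank line, one clean ID
printf '2022-01-11-01\r\n\n2020-12-01-62\n' > ids.txt

# Keep only ID-shaped lines (YYYY-MM-DD-NN, as in the question's
# examples) and strip DOS carriage returns before the read loop runs
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]+' ids.txt | tr -d '\r' > ids.clean.txt
cat ids.clean.txt
```

Incidentally, a line that `grep .` passes through but `grep ..` rejects contains exactly one invisible character; this filter would remove such a line as well.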

AlexOnLinux
  • The `read` loop sounds like the right track. You are missing `\;` in the exec. Did you try `"$ids*"` or `"*$ids*"`? – dan Feb 02 '22 at 12:18
  • yes, I tried them too. The `\;` was only missing in the posting here; in the original I used it too. – AlexOnLinux Feb 02 '22 at 14:15

2 Answers


If your destination folder is always the same, the quickest and arguably most elegant solution is to run a single find command that looks for all of the files at once.

sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs sh -c 'find /home/alex/testzone \( -false "$@" \) -exec cp {} /home/alex/testzone/output \;' _

The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
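The trick is easier to see in isolation; a tiny demo with fabricated file names:

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/2022-01-11-01 super important file" \
      "$dir/2022-01-11-02 also important" \
      "$dir/2020-12-01-62 unrelated"

# The expression reads: false OR -name 'a*' OR -name 'b*'; the
# always-false head only exists to make the leading -o legal.
# (With an -exec action you would additionally need \( ... \) around
# the OR list, because the implicit AND binds tighter than -o.)
find "$dir" -false -o -name '2022-01-11-01*' -o -name '2022-01-11-02*'
```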

This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.

(Here's a fix for the latter case:

xargs printf -- '-o\n-name\n%s*\n' <ids.txt |
...

Still, the inherent problem with using xargs find like this is that xargs could split the list between -o and -name, or between -name and the actual file name pattern, if it needs to run more than one find command to process all the arguments.

A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:

xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find \( -false ${arr[@]/-o_-name_/-o -name } \) -exec cp {} "$0" \;' /home/alex/testzone/ausgabe

where we temporarily hold the arguments in an array in which each file name pattern and its flags form a single item, and then split the flags back out into separate tokens. This still won't work correctly if the patterns contain whitespace or literal shell metacharacters like *, since the array expansion is deliberately left unquoted.)
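Staying in bash, the same single-find idea can be expressed without xargs at all by accumulating the predicates in an array, at the cost of holding the whole list in memory. A sketch (sandbox paths made up; note the \( \) grouping, which keeps -exec from binding only to the last -name):

```shell
#!/bin/bash
src=$(mktemp -d); out=$(mktemp -d); ids=$(mktemp)
touch "$src/2022-01-11-01 wanted" "$src/2022-01-11-02 also wanted" "$src/2020-12-01-62 other"
printf '2022-01-11-01\n2022-01-11-02\n\n' > "$ids"

# One "-o -name id*" triple per non-empty line of the ID file
predicates=()
while read -r id; do
    [ -n "$id" ] || continue
    predicates+=(-o -name "$id*")
done < "$ids"

# Single traversal; \( \) groups the OR list before -exec applies
find "$src" \( -false "${predicates[@]}" \) -exec cp {} "$out" \;
ls "$out"
```

Because the patterns stay quoted inside the array, spaces and umlauts in IDs or file names are handled correctly, unlike in the xargs variants above.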

A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)

while read -r id; do
   find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
tripleee
  • I don't see the difference between your solution and my question, but your solution does seem to work. The only problem: if there are empty lines in the ids.txt file, they are interpreted as an empty string + *, which in turn copies all files to the destination. Can we clear any empty lines from the ids.txt file beforehand? – AlexOnLinux Mar 01 '22 at 10:56
  • `grep . ids.txt | while read -r id;`... – tripleee Mar 01 '22 at 11:05
  • You mean I should put that line at the top of the script? What about `< ids.txt` in the final line? I tried adding `grep . ids.txt` before `| while [...]` and added two empty lines to the ids.txt file. The result: it still copies all files. – AlexOnLinux Mar 01 '22 at 11:42
  • Yeah, you'd remove the `< ids.txt` in that case. – tripleee Mar 01 '22 at 11:45
  • The main change is that you are running `find` for each item in `ids.txt` which is a huge bottleneck if you have a large disk with many subdirectories. If not, the `while read` variant is probably fine. – tripleee Mar 01 '22 at 11:46
  • I have edited the question (edit II) to show you my bash script. It does not filter empty lines from $1 – AlexOnLinux Mar 01 '22 at 11:52
  • If `grep . ids.txt` returns something, that line is not actually empty. Without access to the actual file, we have to assume that you are telling the truth, but clearly that is not actually the case here. – tripleee Mar 01 '22 at 11:55
  • A common beginner error is corrupting your text files with DOS line feeds. [Are shell scripts sensitive to encoding and line endings?](https://stackoverflow.com/questions/39527571/are-shell-scripts-sensitive-to-encoding-and-line-endings) shows you how to clean it up and turn it into an actual text file (though you might as well then remove the empty lines, I suppose). – tripleee Mar 01 '22 at 11:56
  • Adding new requirements in comments is not really acceptable here. Going forward, probably ask a new question instead, perhaps with a link back to the earlier question if it is relevant for background or context. – tripleee Mar 01 '22 at 11:57
  • yeah well, this is not a forum, true. But the solutions don't work as expected. I followed your link and ran `cat -v`, which should show DOS-style line endings. I also tried `cat -v $1 | while read -r`, but it still copies all files when it reaches an empty line. – AlexOnLinux Mar 01 '22 at 12:08
  • I have found the problem. While `grep . ids.txt` did not do the trick, `grep .. ids.txt` does. Unfortunately I cannot find any explanation under `man grep` or `grep --help` about using dots to exclude empty lines. But a big thank you for the right track! – AlexOnLinux Mar 01 '22 at 12:09
  • A dot simply matches any character. Again, sounds like you have invisible control characters or other junk in the file but without access to the actual file, we can't tell. What does `grep -m 1 -x . ids.txt | od -tx1` display? – tripleee Mar 01 '22 at 16:21
  • Unfortunately I deleted that file during my tests – AlexOnLinux Mar 07 '22 at 11:10
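The invisible-character hypothesis from this thread can be checked mechanically. A sketch with a fabricated file containing one DOS-terminated line:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf '2022-01-11-01\r\n2022-01-11-02\n' > ids.txt   # first line ends in CRLF

od -c ids.txt          # any \r shown here is DOS line-ending residue
tr -d '\r' < ids.txt > ids.unix.txt && mv ids.unix.txt ids.txt
```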

Please try the following bash script, copier.sh:

#!/bin/bash
IFS=$'\n'        # make newlines the only separator
set -f           # disable globbing
file="files.txt" # name of file containing filenames
finish="finish"  # destination directory
while read -r n; do
    du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep $n | sed 's/ *$//g' | xargs -I '{}' cp '{}' $finish
done < $file

which recursively copies all the files named in files.txt from . and its subdirectories to ./finish. This new version works even if there are spaces in the directory or file names.

  • this does work, but not properly. It does not point correctly to every file because of whitespace in folder names. I should have pointed this out in the question: I have spaces in folder and file names, and there are also lots of umlauts to deal with. – AlexOnLinux Feb 13 '22 at 09:52
  • @AlexOnLinux Please try this new improved version, which works even if there are spaces in the directory and file names. Thank you –  Feb 13 '22 at 19:09
  • Using `cat` in a command substitution is inherently broken; there is no way this can work if the strings in the file contain irregular whitespace. See also [don't read lines with `for`](http://mywiki.wooledge.org/DontReadLinesWithFor) – tripleee Feb 13 '22 at 19:14
  • @triplee – thank you, I have modified the reading mechanism and it's working OK –  Feb 13 '22 at 19:25
  • You should still switch to `read -r` to disable the pesky legacy behavior of `read` when it sees a backslash which is a wart from the original Bourne shell which needed to be kept as the default for backwards compatibility. – tripleee Feb 13 '22 at 19:27
  • @triplee flag -r added and the script works as before. I'll have to read up on this. –  Feb 13 '22 at 19:36
  • thx for your reply. Unfortunately it treats IDs/files with similar strings as equal. For example, these IDs all look the same to it: 2020-01-02-3, 2020-01-02-33, 2020-01-02-333. It does not realize that grep should only search for an exact match. The only difference between 2020-01-02-3 and 2020-01-02-33 is the space after the last digit, so grep should be something like `grep $n" "`, which does not work for some reason. The search word has to contain the ID plus a space. – AlexOnLinux Mar 01 '22 at 10:37
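The exact-match problem described in this last comment also affects the find-based approaches above. Assuming, as in the question's examples, that a space always follows the ID in the filename, including that space in the glob pins the match (IDs that form a whole filename with no suffix would then be missed). A small demo with fabricated names:

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/2020-01-02-3 a" "$dir/2020-01-02-33 b" "$dir/2020-01-02-333 c"

id='2020-01-02-3'
find "$dir" -name "$id*" | wc -l     # 3: the prefix also matches the longer IDs
find "$dir" -name "$id *" | wc -l    # 1: the trailing space pins the exact ID
```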