replace pattern with newline in shell variable

Question

A script save.sh uses 'cp' and writes its cp errors to an errors file. Most of these errors occur because the origin filesystem is EXT4 and the destination filesystem is NTFS or FAT, which doesn't accept some special characters.

Another script, onerrors.sh, reads the errors file to handle the files that could not be copied: it copies each one to a crafted filename that is valid on FAT and NTFS, i.e. one where the "bad" characters have been replaced with '_'. That works fine for almost all of the errors, but not 100%.

In this errors file, the special characters in the filenames seem to be escaped multiple times:

Single quotes ' appear as '\''.
\n (a real newline in a filename!) appears as '$'\n'' (7 glyphs!)

I want to unescape these so as to get the filename.

I convert quotes back to ' with line=${line//\'\\\'\'/\'}. That's OK.

But how can I convert the escaped newline back to a real, unescaped newline in the $line variable? In other words, how can I replace '$'\n'' with an unescaped \n in a variable?

The issue is not in recognising the pattern but in inserting a real newline. I've not been able to do it using the same variable expansion syntax. What other tool or approach is advised?

This is an XY question. Parsing `cp` output is a very very very bad idea. What will you do if `cp` changes version and changes its output? How many edge cases have you not handled? `I've not been able to do it using same variable expansion syntax.` Because... it's not possible. You have to _parse_ the string yourself, recognize the tokens `'` and `$` and the escape sequences, and replace them with characters. That is a full program to write, one that reads the string char by char and has an internal state machine. — KamilCuk, Jul 25 '22 at 06:57
? The shell expansion syntax makes it possible to parse the string and recognise both the escaped quote and the escaped newline, but the issue is in replacing the escaped newline with a real newline. I'm aware this is not proof against cp output changes :-/ but how can I do better? — JLuc, Jul 25 '22 at 07:05
`The shell expansion syntax enables` I would say no, string expansion only replaces text with text. It is the shell parsing the line that parses what you type and recognizes quotation and escape sequences. — KamilCuk, Jul 25 '22 at 07:06
You did not post your actual `cp` commands. If you are **not** using the option `-r`, I would simply evaluate the exit code of the `cp` and put all those files for which you have a non-zero exit code into a list ... These are those which could not be copied. You also did not specify, what "special treatment" files undergo which can't be copied. Maybe it would be a better choice in the first place to use `rsync` instead of `cp`, in particular if you copy whole directory trees? — user1934428, Jul 25 '22 at 07:14
Yes, I copy whole trees, but rsync would have the same issue when a filename is not OK for FAT or NTFS. The treatments mostly replace glyphs that are problematic for FAT and NTFS with '_'. E.g. `dest=$(echo "$orgfile" | sed -r "s/[:<>'\"\\\?,;#\|\*\n\r]+/_/g" | sed -r "s/[_ \.]+$//")`. There are also some directory path transforms because the origin disk has various roots, whereas the destination disk only has one. — JLuc, Jul 25 '22 at 07:36

KamilCuk · Accepted Answer · 2022-07-25T07:09:00.777

The question is:

how can i replace the '$'\n'' to unescaped \n in variable ?

That's simple:

var="def'$'\n''abc"
echo "${var//\'$\'\\n\'\'/$'\n'}"

I seem to remember that using ANSI-C quoting inside a variable expansion was buggy in some versions of bash. Use a temporary variable in such cases.
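The temporary-variable approach could look like the minimal sketch below. It assumes bash, and the variable names (`nl`, `sq`, `esc_nl`, `esc_quote`) and the sample value of `line` are made up for illustration. It only undoes the two escapes mentioned in the question, so it is not a general decoder for cp's quoting.

```shell
#!/usr/bin/env bash
# Hypothetical helper variables: keep the awkward characters out of the expansion.
nl=$'\n'                 # a real newline
sq="'"                   # a real single quote
esc_nl="'\$'\\n''"       # the 7 literal glyphs '$'\n''
esc_quote="'\\''"        # the 4 literal glyphs '\''

# Sample input (an assumption): def'$'\n''abc'\''ghi
line="def'\$'\\n''abc'\\''ghi"

# A quoted pattern is matched literally, so no per-character escaping is needed.
line=${line//"$esc_nl"/$nl}
line=${line//"$esc_quote"/$sq}

printf '%s\n' "$line"
```

Quoting the pattern as `"$esc_nl"` makes bash treat it as a plain string rather than a glob, which avoids the long chain of backslashes in the one-line version.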

What other tool is advised or other way of doing it ?

For string replacement in shell, the most popular tools are sed (whose name comes from "stream editor") and awk. Writing a parser is better done in a full-blown programming language, like Python, C, C++ and similar.

The only way to decode cp output correctly is to read the cp source code, see how it quotes filenames, and decode them in the same way. Note that this may change between cp flavors and versions, so such a tool may need to query the cp version and will not be portable.

Note that parsing cp output is a very very very very bad idea. cp output is in no way standardized, may change at any time and is meant for humans to read. Instead, strongly consider rewriting save.sh to copy file by file and, when cp returns a non-zero exit status, write the filename yourself to an "errors file" as a NUL-separated stream.

# save.sh
find .... -print0 |
while IFS= read -d '' -r file; do
     if ! cp "$file" "$dst"; then
          # Append (>>) so earlier failures are not overwritten.
          printf "%s\0" "$file" >> errorsfile
     fi
done

# onerrors.sh
while IFS= read -d '' -r file; do
     echo "do something with $file"
done < errorsfile
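To illustrate why the NUL-separated format avoids the unescaping problem entirely, here is a small round-trip sketch, assuming bash. The sample names and the temporary directory are invented for the demonstration; no cp call is involved.

```shell
#!/usr/bin/env bash
# Hypothetical demo: names containing a newline and a quote survive unchanged.
tmp=$(mktemp -d)
errors="$tmp/errorsfile"

names=($'with\nnewline' "it's quoted" "plain")

# Writer side: one NUL byte after each name, no quoting or escaping.
printf '%s\0' "${names[@]}" > "$errors"

# Reader side: same loop shape as onerrors.sh.
readback=()
while IFS= read -d '' -r file; do
     readback+=("$file")
done < "$errors"

printf 'read back %d names\n' "${#readback[@]}"
rm -rf "$tmp"
```

NUL is the one byte that cannot appear in a path, so it works as a separator without any quoting layer to undo later.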

Yes, it works: the `$'\n'` creates the needed unescaped newline. For now, the save.sh script is very simple and solid. It's a plain set of cp lines. I like that sturdiness. As simple as it is, it covers the vast majority of the files. The complexity is only in the onerrors.sh script. Also, I was afraid that reading and testing each filename, then copying the files one by one, would slow down a process that already takes quite a long time, but I might be wrong here and it may not make a big difference. — JLuc, Jul 25 '22 at 07:31
`Where does that $'\n' come from ?` I do not understand, I have written it. For documentation, see Bash manual under ANSI-C quoting style. — KamilCuk, Jul 25 '22 at 07:52
