
I am trying to remove special characters from specific files in files.txt. I need the mv command to use the full path to write the corrected file to the same location. The source and destination directories both contain spaces.

files.txt

/home/user/scratch/test2 2/capital:lets?.log
/home/user/scratch/test2 2/:31apples.tif
/home/user/scratch/test2 2/??testdoc1.txt

script.sh

#!/bin/bash
set -x
while IFS="" read -r p || [ -n "$p" ]
do
          printf '%s\n' "$p"
          mv "$p" $(echo "$p" | sed -e 's@[^A-Za-z0-9._-/]@_@g')

  done < /home/user/scratch/files.txt

Here is the error that I get:

+ IFS=
+ read -r p
+ printf '%s\n' '/home/user/scratch/test2 2/??testdoc1.txt'
/home/user/scratch/test2 2/??testdoc1.txt
++ sed -e 's@[^A-Za-z0-9._-/]@_@g'
sed: -e expression #1, char 22: Invalid range end
++ echo '/home/user/scratch/test2 2/??testdoc1.txt'
+ mv '/home/user/scratch/test2 2/??testdoc1.txt'
mv: missing destination file operand after '/home/user/scratch/test2 2/??testdoc1.txt'

If I remove the / from the bracket expression, so that the command is sed -e 's@[^A-Za-z0-9._-]@_@g', it tries to write the file like this:

++ sed -e 's@[^A-Za-z0-9._-]@_@g'
++ echo '/home/user/scratch/test2 2/??testdoc1.txt'
+ mv '/home/user/scratch/test2 2/??testdoc1.txt' _home_user_scratch_test2_2___testdoc1.txt

I have tried changing the delimiter in sed to something other than /, but the issue persists. If I instead try mv "$p" "$(echo "$p" | sed -e 's|/[^/]*/\{0,1\}$||;s|^$|/|')", mv fails with an error saying that the source and destination are the same file.

Am I approaching this problem wrong? This feels like it should have been an easier task.

EDIT:

The solution below gives me an issue with the file itself:

' echo '/mnt/data/bucket/Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development/.Memeo 40'\'' flat w:boat plane.xls.plist
/mnt/data/bucket//Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development/.Memeo 40' flat w:boat plane.xls.plist
+ dir='/mnt/data/bucket/Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development'
 = */* ]]/data/bucket/Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development/.Memeo 40' flat w:boat plane.xls.plist
' file='.Memeo 40'\'' flat w:boat plane.xls.plist
+ echo .Memeo '40'\''' flat w:boat $'plane.xls.plist\r'
.Memeo 40' flat w:boat plane.xls.plist
+ echo /mnt/data/bucket/Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development
/mnt/data/bucket/Desktop/For_the_New_Director/Part Number Assignment/__Prod_Development

The actual filename is: .Memeo 40' flat w:boat plane.xls.plist

Why is it changing the filename when trying to do the move?

saleetzo
  • `_-/` in square brackets specifies an (invalid) character range. Make the `-` the last character inside the square brackets so that it is taken as a literal `-`. – M. Nejat Aydin Feb 09 '21 at 01:59
  • What's the point of the `|| [ -n "$p" ]` part? – M. Nejat Aydin Feb 09 '21 at 03:03
  • @M.NejatAydin: it deals with the last line of a file that doesn't have a newline as the last character (see [Shell script read missing last line](https://stackoverflow.com/q/12916352/15168)). – Jonathan Leffler Feb 09 '21 at 03:24
  • @JonathanLeffler Nice trick! – M. Nejat Aydin Feb 09 '21 at 03:32
  • If your directory contains files named `foo?` and `foo:` would you expect to only have 1 file named `foo` after the script runs? If so, which of the 2 original files should be removed? If not how should that situation be handled - a failure with error message, or incremental numbers to make file names unique, or something else? – Ed Morton Feb 09 '21 at 18:33

1 Answer


There are two problems in your substitution:

  • In the bracket expression [^A-Za-z0-9._-/], the last part _-/ is interpreted as a range of characters from _ to /, which is invalid because _ sorts after /. To avoid this, put the hyphen at the beginning (directly after the ^) or at the end of the bracket expression. Note that in sed a backslash does not escape the hyphen inside a POSIX bracket expression; backslash-escaping only works in shell patterns such as ${file//[...]/_}.

  • The directory name test2 2 contains a space, which the bracket expression treats as a special character, so the sed command converts the directory name into test2_2, which does not exist. Assuming you want to change only the filenames and keep the directory names as they are, we need to process the directory name and the filename separately.
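Both points can be seen in a small experiment. This is only an illustrative sketch: `sanitize_path` is a hypothetical helper name, not something from the question, and it applies the corrected bracket expression (hyphen moved to the end) to the whole path.

```shell
#!/bin/bash
# sanitize_path: hypothetical helper for illustration only.
# The hyphen is placed last in the bracket expression, so sed treats it as a
# literal "-" instead of the start of a range.
sanitize_path() {
    printf '%s\n' "$1" | sed -e 's@[^A-Za-z0-9._/-]@_@g'
}

# The sed error is gone, but the space in the directory name is replaced too:
sanitize_path '/home/user/scratch/test2 2/??testdoc1.txt'
# prints /home/user/scratch/test2_2/__testdoc1.txt
```

The result points into test2_2, a directory that does not exist, which is why the directory part and the filename part have to be handled separately.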

Then would you please try the following:

set -x
while IFS= read -r p || [ -n "$p" ]; do
    echo "$p"
    dir=${p%/*}                 # extract directory name
    [[ $p = */* ]] || dir="."   # in case $p does not contain "/"
    file=${p##*/}               # extract filename
    mv -- "$p" "$dir/${file//[^-A-Za-z0-9._]/_}"
done < /home/user/scratch/files.txt
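
A possible variant of the same loop, offered as a sketch rather than as part of the tested code above. It assumes three extra behaviours that may or may not be wanted: a trailing carriage return on each line (shown as `$'...\r'` in `set -x` output when files.txt has Windows line endings) is dropped, runs of special characters collapse into a single underscore, and an existing destination is never overwritten. The names `clean_path` and `rename_from_list` are hypothetical.

```shell
#!/bin/bash
# clean_path: hypothetical helper; prints the sanitized destination for one path.
clean_path() {
    local p=${1%$'\r'}      # drop a trailing carriage return, if any
    local dir file
    if [[ $p == */* ]]; then
        dir=${p%/*}         # directory part, left untouched
    else
        dir=.               # bare filename: use the current directory
    fi
    file=${p##*/}           # filename part
    # Collapse each run of special characters into one "_", then squeeze
    # repeated underscores. The hyphen comes first, so it is literal.
    file=$(printf '%s\n' "$file" |
        sed -e 's/[^-A-Za-z0-9._]\{1,\}/_/g' -e 's/_\{2,\}/_/g')
    printf '%s\n' "$dir/$file"
}

# rename_from_list: hypothetical helper; renames every path listed in file $1.
rename_from_list() {
    local list=$1 p src dest
    while IFS= read -r p || [ -n "$p" ]; do
        src=${p%$'\r'}
        dest=$(clean_path "$src")
        [ "$src" = "$dest" ] && continue    # name is already clean
        if [ -e "$dest" ]; then
            # Assumed policy: skip rather than overwrite on a name collision.
            printf 'skipping %s: %s already exists\n' "$src" "$dest" >&2
            continue
        fi
        mv -- "$src" "$dest"
    done < "$list"
}
```

It would be called as `rename_from_list /home/user/scratch/files.txt`. Skipping on collision is only one option; appending a counter to the new name would be another.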
tshiono
  • I wonder if adjacent special characters should be converted to a single underscore? And whether two underscores in the output should be mapped to one. I'd be inclined to use `s@[^-A-Za-z0-9._/]\{1,\}@_@g` for adjacent specials, and `s/_\{2,\}/_/g` for adjacent underscores. And if your variant of `sed` supports some variation on ERE (extended regular expressions), you can modify that more (`sed -E` is more widely portable than `sed -r` which applies to GNU `sed` only, AFAIK). – Jonathan Leffler Feb 09 '21 at 02:32
  • Alternatively: `mv -- "$p" "$dir/${file//[^A-Za-z0-9._-]/_}"`, instead of `sed` and subprocesses. – M. Nejat Aydin Feb 09 '21 at 02:42
  • This code won't work if the line read doesn't contain a `/` character (a bare filename). – M. Nejat Aydin Feb 09 '21 at 03:00
  • Thank you guys. I've corrected my code accordingly. @JonathanLeffler thank you for the suggestion. That's a good point. Please let me include your idea based on the OP's response. – tshiono Feb 09 '21 at 03:22
  • Oops! I forgot to drop the `echo` command from my test code. It is corrected now; would you please try the revised version? Sorry for the trouble. – tshiono Feb 09 '21 at 03:55
  • Thank you for the help! I updated the question with a real-world example. It is changing the filename, and therefore the `mv` command doesn't go through. – saleetzo Feb 09 '21 at 03:59
  • Thank you for the feedback. Are you referring to the trailing `\r` or the backslashes around the `'`? – tshiono Feb 09 '21 at 04:08
  • Both of them. It looks like this would work but it's handling the filename improperly. – saleetzo Feb 09 '21 at 04:10
  • As for the `\r`, it may come from the text editor you used to edit `files.txt`. Please run `dos2unix` on the file, or switch to an editor that handles line endings properly, to remove the trailing `\r`. As for the backslashes, they do not modify the filename; they are just how the shell represents the single quote when printing the trace. The filename actually passed to the `mv` command is unchanged and should be handled correctly. – tshiono Feb 09 '21 at 04:20
  • @tshiono thank you again for the help. I am leaving work now, but I will try again tomorrow. I did not get the error on the file when creating `files.txt` on the BSD system. However, it seemed to be removing the spaces in some of the directories, so `Something_something` would turn into `Somethingsomething` and the destination was not correct. – saleetzo Feb 09 '21 at 05:09
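
For anyone who wants to check the line-ending diagnosis before renaming anything, here is a minimal sketch. The helper names `has_cr` and `strip_cr` are hypothetical, and `tr` is used in place of `dos2unix` on the assumption that the latter may not be installed.

```shell
#!/bin/bash
# has_cr: hypothetical helper; succeeds if the file contains a carriage return.
has_cr() {
    grep -q $'\r' -- "$1"
}

# strip_cr: hypothetical helper; prints the file with every carriage return
# removed. It writes to stdout and leaves the original file unchanged.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For example, `has_cr files.txt && strip_cr files.txt > files.unix.txt` would write a cleaned copy (files.unix.txt is just an example name) only when carriage returns are present.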