
I have the following Bash script:

```bash
#!/bin/bash

if [ $# -eq 0 ]
then
    echo "Error: No arguments supplied. Please provide two files."
    exit 100
elif [ $# -lt 2 ]
then
    echo "Error: You provided only one file. Please provide exactly two files."
    exit 101
elif [ $# -gt 2 ]
then
    echo "Error: You provided more than two files. Please provide exactly two files."
    exit 102
else
    file1="$1"
    file2="$2"
fi

if [ $(wc -l "$file1" | awk -F' ' '{print $1}') -ne $(wc -l "$file2" | awk -F' ' '{print $1}') ]
then
    echo "Error: Files $file1 and $file2 should have had the same number of entries."
    exit 200
else
    entriesNum=$(wc -l "$file1" | awk -F' ' '{print $1}')
fi

for entry in $(seq $entriesNum)
do
    path1=$(head -n$entry "$file1" | tail -n1)
    path2=$(head -n$entry "$file2" | tail -n1)
    diff "$path1" "$path2"
    if [ $? -ne 0 ]
    then
        echo "Error: $path1 and $path2 do not much."
        exit 300
    fi
done

echo "All files in both file lists match 100%."
```

which I run with two file paths as arguments:

```bash
./compare2files.sh /path/to/my\ first\ file\ list.txt /path/to/my\ second\ file\ list.txt
```

As you can see, the names of both list files contain spaces, and each file contains a list of other file paths, which I want to compare line by line: the first line of one file with the first line of the other, the second with the second, and so on.

The paths listed in these two files contain spaces too, but I have escaped them with backslashes. For example, the file /Volumes/WD/backup photos temp/myPhoto.jpg is written as /Volumes/WD/backup\ photos\ temp/myPhoto.jpg.

The problem is that the script fails at the diff command:

```
diff: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg: No such file or directory
diff: /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg: No such file or directory
Error: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg and /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg do not much.
```

When I change the diff command to diff $path1 $path2 (without double quotes), I get a different error:

```
diff: extra operand `temp'
diff: Try `diff --help' for more information
Error: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg and /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg do not much.
```

The files do exist and the paths are valid, but the spaces in the paths are apparently not handled correctly. What is wrong with my code, and how can it be fixed (other than renaming the directories and files)?

avakas
  • Run your code through http://shellcheck.net/, and read the links associated with each warning and error. – Charles Duffy Mar 26 '22 at 23:06
  • Please update the question with samples from both of your `*list.txt` files. – markp-fuso Mar 26 '22 at 23:17
  • @CharlesDuffy You had posted *far* more comments here than was reasonable. When you have that much to say, it's time to write an answer. Comments are not for answering the question; they're for requesting clarifications and/or suggesting improvements to the question. – Cody Gray - on strike Mar 27 '22 at 08:02
  • @CodyGray, I do somewhat regret that those comments were deleted rather than migrated somewhere they could be available for reference in building such an answer. I've added a minimal one covering the most critical point, but I do indeed recall having considerably more to say. – Charles Duffy Mar 27 '22 at 13:43

1 Answer


The title is incorrect: spaces in your script's arguments are handled correctly. The backslash/space sequences in your input files (as written to stdout by head and tail), by contrast, are not processed as you expect.
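To illustrate the first point, here is a minimal sketch that uses `set --` to simulate the question's invocation inside a single script (the paths are the question's placeholders; nothing is read from disk). The calling shell removes the backslashes while parsing the command line, so each argument arrives as one intact string:

```bash
#!/usr/bin/env bash
# Simulate: ./compare2files.sh /path/to/my\ first\ file\ list.txt /path/to/my\ second\ file\ list.txt
# The backslashes are shell syntax here, consumed while this line is parsed.
set -- /path/to/my\ first\ file\ list.txt /path/to/my\ second\ file\ list.txt

echo "$#"              # 2: one argument per list file
printf '[%s]\n' "$@"
# [/path/to/my first file list.txt]
# [/path/to/my second file list.txt]
```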


Your input files should not contain literal backslashes. Backslashes are only meaningful when they're part of your script itself, parsed as syntax, not when they're part of your data. (That is to say, the string hello\ world in your script, or in the command-line arguments given to the shell that calls your script, becomes the single string hello world in memory; the backslash guides its parsing but is not part of the value itself.)
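Following that advice, here is a sketch of the comparison, assuming each list is rewritten to hold one plain, unescaped path per line (spaces are then just ordinary characters). It reads both lists in lockstep with `read -r` on two file descriptors instead of running `head | tail` once per entry. The function name `compare_lists` and the demo directory layout are hypothetical, and `cmp -s` is swapped in for `diff` only because it reports equality without printing a diff:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare files pairwise from two lists of unescaped paths.
compare_lists() {
  local path1 path2
  # fd 3 reads the first list, fd 4 the second, one line from each per iteration.
  while IFS= read -r path1 <&3 && IFS= read -r path2 <&4; do
    if ! cmp -s "$path1" "$path2"; then
      echo "Error: $path1 and $path2 do not match." >&2
      return 1
    fi
  done 3<"$1" 4<"$2"
  echo "All files in both file lists match 100%."
}

# Hypothetical demo data: directory, file, and list names all contain spaces.
dir=$(mktemp -d)
mkdir -p "$dir/backup photos temp" "$dir/backup photos 2022"
printf 'same\n' > "$dir/backup photos temp/myPhoto.jpg"
printf 'same\n' > "$dir/backup photos 2022/IMG_20220326_1000.jpg"
printf '%s\n' "$dir/backup photos temp/myPhoto.jpg" > "$dir/list one.txt"
printf '%s\n' "$dir/backup photos 2022/IMG_20220326_1000.jpg" > "$dir/list two.txt"

compare_lists "$dir/list one.txt" "$dir/list two.txt"
status=$?
rm -rf "$dir"
```

The loop stops as soon as either list runs out, so the question's equal-line-count check is still worth keeping in front of it. Also note that `read` returns failure on a final line that lacks a trailing newline, so such a line would be skipped.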

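Command substitution results are not parsed as shell syntax, so the output from head is stored in path1 and path2 exactly as it is (other than the removal of trailing newlines), backslashes included. Leaving the expansion unquoted, as in diff $path1 $path2, does not help: the value is then split on whitespace, but the backslashes still are not treated as escapes, so diff receives more than two operands, each piece still carrying its backslash (for example /Volumes/WD/backup\).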
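A minimal demonstration of the difference between a backslash as syntax and a backslash as data. Here `printf` stands in for reading one line of a list file, and the `/tmp/...` path is a hypothetical example. `set -f` disables globbing so the unquoted case shows word splitting alone:

```bash
#!/usr/bin/env bash
# As shell syntax, the backslash is consumed while the line is parsed:
typed=/tmp/backup\ photos\ temp/myPhoto.jpg
printf '[%s]\n' "$typed"      # [/tmp/backup photos temp/myPhoto.jpg]

# As data (one line of a list, here produced by printf), the backslashes
# survive command substitution untouched:
path1=$(printf '%s\n' '/tmp/backup\ photos\ temp/myPhoto.jpg' | head -n 1)
printf '[%s]\n' "$path1"      # [/tmp/backup\ photos\ temp/myPhoto.jpg]

# Unquoted, the value is split on spaces, yet the backslashes still do not
# act as escapes (globbing is disabled so only word splitting applies):
set -f
# shellcheck disable=SC2086
printf '[%s]\n' $path1
# [/tmp/backup\]
# [photos\]
# [temp/myPhoto.jpg]
set +f

# Command substitution strips all trailing newlines, not just the last one:
trimmed=$(printf 'abc\n\n\n')
printf '[%s]\n' "$trimmed"    # [abc]
```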

If you must process input that contains quote and escape characters, xargs or the Python shlex module can be used to split that input into an array, as demonstrated in Reading quoted/escaped arguments correctly from a string.
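A minimal sketch of the xargs approach, assuming a list file escaped the way the question's lists are (the temporary file and its two entries are hypothetical sample data). xargs applies its own quote and backslash parsing to its input, and `printf '%s\0'` re-emits each resulting item NUL-terminated so that `read -d ''` can collect the items into a Bash array:

```bash
#!/usr/bin/env bash
# Hypothetical list file using backslash escapes, as in the question.
list=$(mktemp)
printf '%s\n' '/Volumes/WD/backup\ photos\ temp/myPhoto.jpg' 'plain.txt' > "$list"

paths=()
# xargs removes the escapes; printf prints each item followed by a NUL byte.
while IFS= read -r -d '' item; do
  paths+=("$item")
done < <(xargs printf '%s\0' < "$list")
rm -f "$list"

printf '[%s]\n' "${paths[@]}"
# [/Volumes/WD/backup photos temp/myPhoto.jpg]
# [plain.txt]
```

xargs splits on unescaped blanks as well as on newlines, so the result is a flat list of items rather than one item per line. That is fine for fully escaped lists, but a single unescaped space would silently split a path in two.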

Charles Duffy