I need to verify that all images mentioned in a CSV are present inside a folder. I wrote a small shell script for that:

#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'

csvfile=$1
imgpath=$2

cat $csvfile | while IFS=, read -r filename rurl
do
    if [ -f "${imgpath}/${filename}" ]
    then
        echo -n
    else
        echo -e "$filename ${red}MISSING${color_Off}"
    fi
done

My CSV looks something like this:

Image1.jpg,detail-1
Image2.jpg,detail-1
Image3.jpg,detail-1

The CSV was created by Excel.

All 3 images are present in imgpath, but for some reason my output says

Image1.jpg MISSING

Upon using zsh -x to run the script, I found that my CSV file has a BOM at the very beginning, which turns the first image name into \ufeffImage1.jpg and causes the whole issue.

How can I ignore a BOM (byte-order mark) in a while read loop?

Belphegor21
  • You probably have DOS line endings. See https://stackoverflow.com/q/45772525/1745001. Could also just be trailing white space I suppose. Btw, your error message would be more useful if you also printed the csv name and line number, as well as the file name that it thinks is missing. – Ed Morton Sep 21 '22 at 19:19
  • What does `zsh -x yourscript` say in the logs? As Ed says, it's probably DOS line endings; but that should be visible in trace output. – Charles Duffy Sep 21 '22 at 19:19
  • (as an aside: if an answer that broke on zsh and only worked on bash wouldn't be acceptable, don't use the bash tag; they're very different shells, and not mutually compatible in either direction) – Charles Duffy Sep 21 '22 at 19:20
  • How do you run the script? What's `$imgpath` set to? – Arkadiusz Drabczyk Sep 21 '22 at 19:21
  • I run using the following command `./imageFinder.sh /Users/belphegor21/Documents/image.csv /Users/belphegor21/Documents/Images` – Belphegor21 Sep 21 '22 at 19:32
  • It works well for me. Try doing `dos2unix /Users/belphegor21/Documents/image.csv` as others suggested. – Arkadiusz Drabczyk Sep 21 '22 at 19:36
  • zsh -x gave me the answer. I was not aware of that command. Thanks @CharlesDuffy. My csv was created in excel so maybe something went wrong there. The file has `\ufeff` at the very beginning making the image name as `\ufeffImage1.jpg` which it can't find. Deleting that fixed it. – Belphegor21 Sep 21 '22 at 19:37
  • Ahh, great. That's what's called a byte-order marker. – Charles Duffy Sep 21 '22 at 19:39
  • ...related: [unrecognized character in header of csv](https://stackoverflow.com/questions/69844368/unrecognized-character-in-header-of-csv) – Charles Duffy Sep 21 '22 at 19:40
  • Anyhow -- please either delete the question or [edit] enough information in to let someone else answer it referring only to the question text itself (as far as I know, we don't have a "how do I ignore a byte order marker from a `while read` loop in zsh?" question on the site yet). – Charles Duffy Sep 21 '22 at 19:42
  • I'm guessing that in the zsh -x log it was printed inside `$''` quoting, as in, `$'\ufeffImage1.jpg'`, correct? – Charles Duffy Sep 21 '22 at 20:02

1 Answer

zsh provides a parameter expansion (also available in POSIX shells) to remove a prefix: ${var#prefix} expands to the value of $var with prefix removed from the front of the string, if it is there.

Like ksh93 and bash, zsh also supports ANSI C-like string syntax: $'\ufeff' expands to the character U+FEFF, which is the code point used as a BOM.

Combining these, ${filename#$'\ufeff'} expands to the content of $filename with a leading BOM removed, if one is present.
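
As a quick illustration of that prefix removal, here is a minimal sketch. The helper name strip_bom is hypothetical, and it assumes the CSV is UTF-8, where the BOM is the three bytes EF BB BF. Building those bytes with printf octal escapes is a choice made here so the sketch does not depend on the locale or on $'...' support; in zsh, $'\ufeff' would normally do the same job.

```shell
# strip_bom (hypothetical helper): print $1 without a leading UTF-8 BOM.
strip_bom() {
    local bom
    bom=$(printf '\357\273\277')    # bytes EF BB BF, i.e. U+FEFF encoded as UTF-8
    printf '%s\n' "${1#"$bom"}"     # '#' drops the prefix only when it is present
}

with_bom=$(printf '\357\273\277Image1.jpg')
strip_bom "$with_bom"     # prints: Image1.jpg
strip_bom 'Image2.jpg'    # prints: Image2.jpg (no BOM, so nothing changes)
```

Quoting "$bom" inside the expansion makes the shell treat it as a literal string rather than a pattern.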

The below also makes some changes for better performance, more reliable behavior with odd filenames, and compatibility with non-zsh shells.

#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'

csvfile=$1
imgpath=$2

while IFS=, read -r filename rurl; do
    filename=${filename#$'\ufeff'}
    if ! [ -f "${imgpath}/${filename}" ]; then
        printf '%s %bMISSING%b\n' "$filename" "$red" "$color_Off"
    fi
done <"$csvfile"
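
To try the loop without touching real data, it can be run against a throwaway CSV that starts with a BOM. This is only a sketch under stated assumptions: find_missing is a hypothetical wrapper around the loop above (without the coloring), the temporary directory and its file names are made up, and the BOM is written as UTF-8 bytes.

```shell
# find_missing CSV DIR (hypothetical wrapper): print every filename listed
# in the first CSV column that does not exist as a regular file in DIR.
find_missing() {
    local bom filename rest
    bom=$(printf '\357\273\277')          # UTF-8 encoding of U+FEFF
    while IFS=, read -r filename rest; do
        filename=${filename#"$bom"}       # only the first line can carry a BOM
        [ -f "$2/$filename" ] || printf '%s\n' "$filename"
    done <"$1"
}

# Throwaway fixture: two images exist, the CSV lists three and begins with a BOM.
tmp=$(mktemp -d)
touch "$tmp/Image1.jpg" "$tmp/Image2.jpg"
printf '\357\273\277Image1.jpg,detail-1\nImage2.jpg,detail-1\nImage3.jpg,detail-1\n' >"$tmp/image.csv"

find_missing "$tmp/image.csv" "$tmp"      # prints: Image3.jpg
rm -rf "$tmp"
```

Without the prefix removal, Image1.jpg would be reported as missing too, which matches the symptom in the question.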

Notes on changes unrelated to the specific fix:

  • Replacing echo -e with printf lets us pick which specific variables get escape sequences expanded: %s for filenames means backslashes and other escapes in them are unmodified, whereas %b for $red and $color_Off ensures that we do process highlighting for them.
  • Replacing cat $csvfile | with < "$csvfile" avoids the overhead of starting up a separate cat process, and ensures that your while read loop is run in the same shell as the rest of your script rather than a subshell (zsh already runs the last segment of a pipeline in the current shell, so this matters less there, but it is a problem with bash when run without the non-default lastpipe flag).
  • echo -n isn't reliable as a noop: some shells print -n as output, and the POSIX echo standard, by marking behavior when -n is present as undefined, permits this. If you need a noop, : or true is a better choice; but in this case we can just invert the test and move the else path into the truth path.
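
The difference between %s and %b described in the first note can be seen with a filename that happens to contain a backslash. A small sketch follows; report_missing is a hypothetical helper and the filename is made up.

```shell
red='\033[0;31m'
color_Off='\033[0m'

# report_missing (hypothetical helper): %s prints the name byte for byte,
# %b expands the backslash escapes stored in the color variables.
report_missing() {
    printf '%s %bMISSING%b\n' "$1" "$red" "$color_Off"
}

report_missing 'odd\tname.jpg'    # the \t stays a literal backslash followed by t
```

With echo -e, the same filename would have its \t turned into a tab, so the message would no longer show the name as it appears in the CSV.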
Charles Duffy