
I have the following bash script, which is supposed to download the current Wikipedia ZIM file if the file size differs:

#!/bin/bash

wikipedia_current_filesize=$(stat -c %s wikipedia.zim)
wikipedia_download_filesize=$(curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim | gawk -v IGNORECASE=1 '/^Content-Length/ { print $2 }')

echo "Wikipedia filesize [current / download]:"
echo "$wikipedia_current_filesize / $wikipedia_download_filesize"

if [ "$wikipedia_current_filesize" != "$wikipedia_download_filesize" ]
then
  echo "Downloading newer version of Wikipedia..."
else
  echo "No new version for Wikipedia available."
fi

The output is:

Wikipedia filesize [current / download]:
38095908569 / 38095908569
Downloading newer version of Wikipedia...

The numbers are exactly the same. Why do I still end up in the if branch and not in the else branch? Am I comparing the strings the wrong way? Is there perhaps a more meaningful way, e.g. comparing integers instead of strings?

tai
  • Most likely `wikipedia_download_filesize` contains "hidden" characters, probably a carriage return. Do `printf "%q\n" "$wikipedia_download_filesize"` and see what that variable contains – glenn jackman Jun 22 '22 at 21:32
  • Or run `bash -x yourscript`, or add the line `set -x` -- either of those will enable tracing, making the problem obvious. – Charles Duffy Jun 22 '22 at 21:41
  • For numerical comparisons using `test` (aka `[`), you should use `-ne` instead of `!=`. If the operands are not numbers, it will print an error saying `integer expression expected`. – Costi Ciudatu Jun 22 '22 at 21:42
  • There's lots more info & options for dealing with carriage returns at: ["Are shell scripts sensitive to encoding and line endings?"](https://stackoverflow.com/questions/39527571/are-shell-scripts-sensitive-to-encoding-and-line-endings) – Gordon Davisson Jun 22 '22 at 22:36
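
To make the first comment concrete, here is a minimal sketch that reproduces the symptom without touching the network. The variable names and the hard-coded values are illustrative only; the download size is simulated as the digits from the question followed by a carriage return, which is the assumption the comments are making.

```shell
#!/bin/bash
# Simulated value as it would be cut out of an HTTP header line:
# the digits from the question plus a trailing carriage return (assumption).
download_size=$'38095908569\r'
current_size='38095908569'

# %q quotes non-printable characters, so the hidden \r becomes visible.
quoted=$(printf '%q' "$download_size")
echo "$quoted"

# The strings look identical when echoed, but they differ by one character.
echo "length current: ${#current_size}, length download: ${#download_size}"

if [ "$current_size" != "$download_size" ]; then
  string_result="different"
else
  string_result="equal"
fi
echo "string comparison: $string_result"
```

The length check is a quick second opinion: one value is 11 characters long, the other 12.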

1 Answer

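HTTP response headers use \r\n line endings.

gawk's default record separator is a newline, so the carriage return is left as an ordinary character at the end of the last field. $2 is therefore 38095908569 followed by \r, which never equals the output of stat even though both print identically. gawk can strip the trailing carriage return: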

wikipedia_download_filesize=$(
    curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
    | gawk -v IGNORECASE=1 '/^Content-Length/ { print gensub(/\r$/, "", 1, $2) }'
)

Or, more awk-ishly, set the record separator to \r\n so the carriage return never ends up in a field:

wikipedia_download_filesize=$(
    curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
    | gawk -v IGNORECASE=1 -v RS='\r\n' '/^Content-Length/ { print $2 }'
)
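
As a further sketch that is not part of the gawk approach above, the carriage return can also be stripped in bash itself and the sizes compared as integers, which is what the question asks about. The function name content_length and the sample header block are hypothetical; in the real script the input would come from curl -s -L -I and the current size from stat -c %s. It assumes bash 4 or newer for the ${line,,} lowercase expansion.

```shell
#!/bin/bash
# Hypothetical helper: read raw HTTP headers on stdin and print the last
# Content-Length value seen (with -L, the last header block is the final response).
content_length() {
  local line value=""
  while IFS= read -r line; do
    line=${line%$'\r'}                  # drop the trailing carriage return, if any
    case "${line,,}" in                 # lowercase copy for a case-insensitive match
      content-length:*)
        value=${line#*:}                # keep everything after the first colon
        value=${value//[[:space:]]/}    # remove surrounding whitespace
        ;;
    esac
  done
  printf '%s\n' "$value"
}

# Simulated header block (assumption); stands in for the curl output.
sample_headers=$'HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\nContent-Length: 38095908569\r\n\r\n'
download_size=$(printf '%s' "$sample_headers" | content_length)
current_size=38095908569   # stand-in for $(stat -c %s wikipedia.zim)

# Validate that the value is purely numeric before comparing as integers.
if [[ "$download_size" =~ ^[0-9]+$ ]] && (( current_size == download_size )); then
  result="No new version for Wikipedia available."
else
  result="Downloading newer version of Wikipedia..."
fi
echo "$result"
```

The numeric check before the arithmetic comparison means a missing or malformed header falls into the download branch instead of producing a syntax error.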
glenn jackman
  • oh geez, thanks a lot for that. I would have never suspected Windows-like line endings to be a problem here. – tai Jun 23 '22 at 17:45