
The script below works fine on Ubuntu 20.04. It reads a .csv file that contains URLs in column A, one URL per row, and prints each URL that returns the HTTP response code given as the second argument.

To use it, run the script like this:

bash script.sh file_with_urls.csv response_code

for example: bash script.sh urls-to-check.csv 200
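
For reference, a hypothetical urls-to-check.csv could look like this, with one URL per row in column A (these example URLs are placeholders, not from the original question):

https://example.com
https://example.org/some/page
https://example.net/does-not-exist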

#!/usr/bin/env bash
# $1 = CSV file with one URL per line, $2 = HTTP status code to match
while read -r link; do
    # Fetch the URL, discard the body, and capture only the HTTP status code
    response=$(curl --output /dev/null --write-out %{http_code} "$link")
    if [[ "$response" == "$2" ]]; then
        echo "$link"
    fi
done < "$1"

When I run it on Windows 10 under the WSL Ubuntu 20.04 distribution, I get a "curl: (3) URL using bad/illegal format or missing URL" error.

I'm a bit stuck with this...

  • You need to figure out a way to find out which URL (from the file) is failing. Either `echo` them before you invoke the `curl` command or print them out to a file after a successful call... Once you have the URL/culprit, then you can see what's wrong with it (to see if it's missing something or it's illegal in some way). Without any additional information, there is no easy way for us to help you other than by guessing – blurfus Nov 14 '21 at 21:42
  • `read -r link` is reading the entire line (not just the first field) into `link`. See [BashFAQ #1: "How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?"](http://mywiki.wooledge.org/BashFAQ/001) The CSV file might also have [DOS/Windows line endings](https://stackoverflow.com/questions/39527571/are-shell-scripts-sensitive-to-encoding-and-line-endings), which adds another pile of potential confusion. Adding `set -x` as the second line of the script (just after the shebang) will print an execution trace that'll help show problems like this (see the sketch after these comments). – Gordon Davisson Nov 14 '21 at 21:45
  • I don't understand everything in the debug output, but the second line shows a `\r` at the end of the URL. I think this is the cause... `+ read link ++ curl --output /dev/null --silent --write-out '%{http_code}' {full_url_here}/\r' + response=000 [[ 000 == \4\0\4 ]]` When I use sed as dan shows, the script works normally. I appreciate that you pointed to the sources so I can understand what exactly happens and why. – luknij Nov 15 '21 at 17:49
  • blurfus, thank you for the answer. I understand what you said, but I don't know how to achieve it... my skills are too low to execute this... – luknij Nov 15 '21 at 18:33
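
For reference, the `set -x` trace Gordon Davisson suggests amounts to something like this (a minimal sketch of the original script with tracing enabled; the trace goes to stderr and prints each command with its arguments fully expanded, which is how the stray `\r` above was spotted):

#!/usr/bin/env bash
set -x  # print each command, with arguments expanded, before running it

while read -r link; do
    response=$(curl --output /dev/null --write-out %{http_code} "$link")
    if [[ "$response" == "$2" ]]; then
        echo "$link"
    fi
done < "$1"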

1 Answer


It's probably the line endings: the file likely has Windows-style CRLF endings, and the trailing carriage return (`\r`) makes curl reject the URL:

#!/usr/bin/env bash

# Split each line on commas into an array; ${link[0]} is column A
while IFS=, read -ra link; do
    response=$(curl --output /dev/null --write-out %{http_code} "${link[0]}")
    if [[ "$response" == "$2" ]]; then
        echo "${link[0]}"
    fi
done < <(sed 's/\r$//' "$1")  # strip trailing carriage returns (CRLF line endings)

You can also run dos2unix file_with_urls.csv to convert the file in place. Note that if you open and save it in Windows again, the carriage returns could come back.

Alternatively, invoke it like this:

bash script.sh <(sed 's/\r$//' file_with_urls.csv) response_code
– dan
  • Your answer is a perfect complement to what Gordon Davisson wrote, and both are excellent study material! The script works when I use sed at the end. Why do you use `"${link[0]}"` instead of just `"$link"`? – luknij Nov 15 '21 at 19:05
  • @luknij you said it was a csv file (comma-separated values). If a line is `column-A,column-B,column-C`, `IFS=, read -ra link` splits it into a bash array, so `${link[0]}` is column-A, `${link[1]}` is column-B, `${link[2]}` is column-C, etc. If there's only one column, you can just use `read -r link`. In fact, instead of `sed`, you could also use `curl ... ${link%$'\r'}` to remove the carriage returns (a sketch of this variant follows). You can also remove CRs permanently with `dos2unix file.csv`. But, if you open the file in Windows, they will probably come back. – dan Nov 16 '21 at 00:21
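
For completeness, the parameter-expansion variant dan mentions could look something like this (a minimal sketch, assuming one URL per line; `${link%$'\r'}` strips a trailing carriage return from each line, so no `sed` preprocessing is needed):

#!/usr/bin/env bash
while read -r link; do
    link=${link%$'\r'}  # drop a trailing carriage return left by CRLF line endings
    response=$(curl --output /dev/null --write-out %{http_code} "$link")
    if [[ "$response" == "$2" ]]; then
        echo "$link"
    fi
done < "$1"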