You can add a line that extracts the robots meta tag from the page source, then change the echo line so it prints that value:
#!/bin/bash
while IFS= read -r url
do
    dt=$(date '+%H:%M:%S')
    # HEAD request: record the status code and any redirect target.
    urlstatus=$(curl -kH 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code} %{redirect_url}' "$url")
    # GET request: keep the source line that contains the robots meta tag.
    metarobotsheader=$(curl -kH 'Cache-Control: no-cache' --silent "$url" | grep -P -i "<meta.+robots")
    echo "$url $urlstatus $dt $metarobotsheader" >> urlstatus.txt
done < "$1"
This example records the whole source line that contains the robots meta tag.
To record a "-" when the page has no robots meta tag, replace the metarobotsheader line with this one:
metarobotsheader=$(curl -kH 'Cache-Control: no-cache' --silent "$url" | grep -P -i "<meta.+robots" || echo "-")
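The fallback works because grep exits with a non-zero status when nothing matches, so the command after || runs and its output becomes the captured value. A minimal offline sketch of that behavior, assuming a hypothetical helper named robots_line and made-up HTML snippets in place of the curl output (grep -E is used here instead of -P only for portability; this pattern needs no Perl features):

```shell
#!/bin/bash
# robots_line is a hypothetical helper: it reads HTML on stdin and prints the
# line holding the robots meta tag, or "-" when grep finds no match.
robots_line() {
    grep -i -E "<meta.+robots" || echo "-"
}

# Made-up sample inputs standing in for the curl output.
with_tag=$(printf '%s\n' '<html><head>' '<meta name="robots" content="noindex">' '</head></html>' | robots_line)
without_tag=$(printf '%s\n' '<html><head>' '<title>Plain page</title>' '</head></html>' | robots_line)

echo "with tag:    $with_tag"
echo "without tag: $without_tag"
```

The first variable holds the matching source line and the second holds "-", which keeps the columns in urlstatus.txt aligned for pages without the tag.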
To extract only the value of the content attribute, replace that line with this one:
metarobotsheader="$(curl -kH 'Cache-Control: no-cache' --silent "$url" | grep -P -i "<meta.+robots" | perl -e '$line = <STDIN>; if ( $line =~ m#content=[\x27"]?(\w+)[\x27"]?#i) { print "$1"; } else {print "no_meta_robots";}')"
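When the page doesn't contain a robots meta tag, this records no_meta_robots. Because the pattern captures with \w+, a value with several directives such as "noindex, nofollow" is recorded as noindex only.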