I'm trying to create a more effective "check if URL exists" function and I'm almost done; the only roadblock is the regex.
So I'm looking for a regex that matches the first character of the output, prints it, and exits. For example, the code below fetches the source of a YouTube page and, as soon as the output reaches the title tags, matches them and kills the wget command.
Idea borrowed from here
https://unix.stackexchange.com/questions/103252/how-do-i-get-a-websites-title-using-command-line
Performance/Efficiency
Here, out of laziness, we have perl read the whole content in memory before starting to look for the tag. Given that the title is found in the <head> section that is in the first few bytes of the file, that's not optimal. A better approach, if GNU awk is available on your system, could be:
wget -qO- 'http://www.youtube.com/watch?v=Dd7dQh8u4Hc' |
  gawk -v IGNORECASE=1 -v RS='</title' 'RT{gsub(/.*<title[^>]*>/,"");print;exit}'
That way, awk stops reading after the first </title.
My logic is this: if the URL exists, wget will output its source, and I don't want to waste time downloading the entire page. So on the first character of the source output, it should print that character and exit.
Then I will store the output of wget and gawk and test it:
first_character_of_source_code=$(wget|awk magic)
if [[ -n $first_character_of_source_code ]]; then
    echo "URL exists!"
else
    echo "URL doesn't exist!"
fi
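
To make the idea concrete, here is a minimal sketch of the behaviour I'm after, written without a regex. It assumes bash and a `head` that supports `-c`; the names `has_output` and `url_exists` are made up for illustration. It counts the first byte instead of storing it, because command substitution would drop a leading newline.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if stdin delivers at least one byte.
# head -c 1 stops after the first byte, so the producer on the left
# side of the pipe is normally stopped by SIGPIPE on a later write.
has_output() {
    local n
    n=$(head -c 1 | wc -c)
    # Arithmetic context tolerates the padding some wc versions print.
    (( n > 0 ))
}

# Hypothetical wrapper around the wget call used above.
# Assumption: wget prints nothing to stdout when the request fails.
url_exists() {
    wget -qO- -- "$1" 2>/dev/null | has_output
}
```

Usage would be something like `if url_exists 'http://www.youtube.com/watch?v=Dd7dQh8u4Hc'; then echo "URL exists!"; fi`. I'd still like to know whether the same thing can be done with gawk alone.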
For my "check if URL exists" function I've also tried the answers to "How do I determine if a web page exists with shell scripting?". The curl solution suggested there is mostly OK, but websites like Quora return 403 Forbidden, and yes, I've added a user agent. The wget plus gawk solution returns the source code, which is better for determining whether the URL exists.
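
For reference, this is roughly the status-code approach I tried, as a sketch only. The function names are made up, and treating 401/403 as "exists" is my own assumption (the server answered for that URL but refused this client), not something the linked answers say.

```shell
#!/usr/bin/env bash

# Hypothetical helper: decide from an HTTP status code whether the URL exists.
# Assumption: 2xx and 3xx count as existing, and so do 401 and 403.
status_means_exists() {
    case "$1" in
        2??|3??|401|403) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: ask curl for the status code only.
# -s silences progress, -o /dev/null discards the body,
# -w '%{http_code}' prints the status code (000 if the request failed).
url_exists_by_status() {
    local code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 -- "$1")
    status_means_exists "$code"
}
```

With this, a 403 from Quora would be reported as "exists", which may or may not be what I want, hence my preference for checking the actual output.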