
I am trying to make a program that can convert a series of manga scans into one pdf file, and I don't want to have to attempt to download the picture to determine if I have the right url. Is there a shell scripting command that I can use to just check if a web page exists?

SidJ
Branden

6 Answers


Under a *NIX, you can use curl to issue a simple HEAD request (HEAD only asks for the headers, not the page body):

curl --head http://myurl/

Then you can take only the first line, which contains the HTTP status code (200 OK, 404 Not Found, etc.):

curl -s --head http://myurl/ | head -n 1

And then check if you got a decent response (status code is 200 or 3**):

curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23].."

This will output the first line if the status code is okay, or nothing if it isn't. You can also redirect that to /dev/null to suppress the output, and use $? to determine whether it worked:

curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23].." > /dev/null
# on success (page exists), $? will be 0; on failure (page does not exist or
# is unreachable), $? will be 1
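
For example, here is a minimal sketch that wraps the whole check in a shell function (the function name url_exists and the example messages are placeholders, not part of the original answer):

url_exists() {
    # Returns 0 (success) when the HEAD request comes back with a 2xx/3xx status
    curl -s --head "$1" | head -n 1 | grep "HTTP/1.[01] [23].." > /dev/null
}

if url_exists "http://myurl/"; then
    echo "page exists"
else
    echo "page missing or unreachable"
fi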

EDIT: -s simply tells curl not to show a progress bar.

zneak
  • To do this with less string parsing, and to check for redirects to non-existing pages, use `curl --silent --head --location --output /dev/null --write-out '%{http_code}' http://en.wikipedia.org/wiki/tla | grep '^2'` – bukzor Aug 16 '13 at 16:39
  • Script to automate the validation for a number of files: https://gist.github.com/igilham/12eb33ab8a86f1e815d2 – Ian Gilham Sep 18 '14 at 16:05
  • From my experience, it's worth adding the `--connect-timeout` option. – patryk.beza Jul 24 '15 at 23:34
  • For `curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23].." > /dev/null` and then `echo $?`, I am always getting 0, whether or not myurl exists. Am I doing something wrong? – thepiercingarrow Mar 25 '16 at 00:08
  • @MarkWright, let me check when I get home to a UNIX box – zneak Mar 25 '16 at 00:16
  • @MarkWright, I can't reproduce your problem on either OS X or Ubuntu 15.04. Do you have more context? – zneak Mar 26 '16 at 02:14
  • Ah okay, it is working now, thanks. Not sure what happened earlier. – thepiercingarrow Mar 26 '16 at 21:21

Use cURL to obtain the HTTP status code and check it against the values you require.

status=$(curl -s --head -w '%{http_code}' http://www.google.com/ -o /dev/null)
echo "$status"
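
For example, a sketch that tests the captured code (treating 200 as the only acceptable value is an assumption; adjust the test to whichever values you require):

status=$(curl -s --head -w '%{http_code}' http://www.google.com/ -o /dev/null)
if [ "$status" -eq 200 ]; then
    echo "Page exists (HTTP $status)"
else
    echo "Unexpected status: $status"
fi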
Sithsu

First, make sure there is no authorization issue. If authorization is required, provide the username and password; a sketch of this is shown after the main example below. Create a shell script file (checkURL.sh) and paste in the code below.

Hope this will help you.

checkURL.sh

yourURL="http://abc-repo.mycorp.com/data/yourdir"

if curl --output /dev/null --silent --head --fail "$yourURL"
then
    echo "This URL Exist"
else
    echo "This URL Not Exist"
fi

It works for me with Nexus and other repositories.
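
If authorization is required, as mentioned above, here is a sketch that passes credentials to the same check (the username, password, and URL are placeholders):

yourURL="http://abc-repo.mycorp.com/data/yourdir"

if curl --user "myUser:myPassword" --output /dev/null --silent --head --fail "$yourURL"
then
    echo "This URL exists"
else
    echo "This URL does not exist"
fi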

JDGuide

You can always just use wget; I do, as the code is simpler.

if [[ $(wget http://url/ -O-) ]] 2>/dev/null; then
    echo "This page exists."
else
    echo "This page does not exist."
fi

Using the -O- option with wget means that it will try to write the contents of the page to standard output, but only if it exists. So if there isn't any output, the page doesn't exist. The 2>/dev/null just sends wget's error and progress messages (if there are any) to the trash.
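
As a usage sketch tying this back to the original question (the URL pattern and page count are made up for illustration, and the stderr redirect is moved inside the command substitution), the same test can drive a loop over scan URLs:

for page in $(seq 1 20); do
    url="http://example.com/manga/page-$page.jpg"
    if [[ $(wget "$url" -O- 2>/dev/null) ]]; then
        echo "$url exists"
    else
        echo "$url does not exist"
    fi
done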

I know it's overdue, but I hope this helps.

Alek

Wget has a feature that is effective for this purpose: its --spider argument. If the web page is found, the return code is 0. For other errors, the return code is greater than 0.

For example:

URL="http://www.google.com"

if wget --spider "$URL" 2>/dev/null; then
    echo "$URL web-page exists !"
else
    echo "$URL web-page does NOT exists !"
fi
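
As a usage sketch for the original scan-downloading question (the URL is a placeholder), the spider check can gate the actual download:

URL="http://example.com/manga/page-01.jpg"

if wget --spider "$URL" 2>/dev/null; then
    wget -q "$URL"    # the page exists, so fetch it for real
else
    echo "$URL does not exist, skipping"
fi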
s3n0

wget or cURL will do the job; see the wget and cURL documentation for details and download locations. Supply the URL to either command-line tool and check the response.
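
For example, a minimal sketch of each (the URL is a placeholder; the exit status of either command tells you whether the page was reachable):

curl --silent --head --fail "http://example.com/page.jpg" > /dev/null && echo "exists (curl)"
wget --spider --quiet "http://example.com/page.jpg" && echo "exists (wget)"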

Jochem Schulenklopper
  • Pfff... why the downvotes (without explanation, mind you) if the answer is correct, brief and informative? – Jochem Schulenklopper Mar 02 '16 at 12:03
  • I didn't down vote, but if I had to guess, it's because URL-only answers are [discouraged](https://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers). – zneak Jul 20 '17 at 00:08
  • Thanks. TBH, the question was "Is there a shell scripting command that I can use to just check if a web page exists?" and my answer was "wget or cURL will do the job", plus links to each commands and an explanation that the response of those command invocations could be checked. That's not an URL-only answer, by any measure, and it was as informative as the other answer (revision) that was out at that time: https://stackoverflow.com/revisions/2924444/1. Alas, apparently a bad karma day :-) – Jochem Schulenklopper Jul 22 '17 at 13:39
  • Yes; with that said, it's common practice to post minimal answers very quickly and expand on them with edits within 5-10 minutes. In fact, revision 2 of that post, which is essentially complete (rev 3 adds a single explanation line about the -s switch), was submitted 6 minutes after the original. And of course, when people find questions on search engines later, they don't have that kind of revision sensitivity. :) – zneak Jul 22 '17 at 16:24