I used the curl command from the question "How to check if an URL exists with the shell and probably curl?", but it doesn't work if the website serves a generic error page like "Sorry, we are unable to find that page".

How can I detect such pages automatically?

Test URLs

http://www.nytimes.com/2013/09/18/us/washington-navy-yard-shootings.html

^ page exists

http://www.nytimes.com/2013/09/18/us/washington-navy-yard.html

^ page does not exist

user13107

1 Answer

To check if the page is valid:

curl -s --head http://your_url/ | head -n 1 | grep 200

Or you can grep for 404 instead to check whether the page doesn't exist.
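
Grepping the status line for a number can be brittle, so here is a minimal sketch of the same idea that asks curl for the numeric status code directly and then classifies it. The helper names `classify_status` and `url_status` are made up for illustration, and treating every 3xx response as "redirect" rather than "exists" is an assumption about what you want to report:

```shell
#!/bin/sh

# Map an HTTP status code to a coarse verdict.
classify_status() {
    case "$1" in
        2??)     echo "exists" ;;    # e.g. 200 OK
        3??)     echo "redirect" ;;  # e.g. 303 See Other
        404|410) echo "missing" ;;   # Not Found / Gone
        *)       echo "unknown" ;;   # anything else, including curl's 000 on failure
    esac
}

# Print only the status code of the first response (redirects are not followed).
# Requires network access when called.
url_status() {
    curl -s -o /dev/null --head -w '%{http_code}' "$1"
}

# Usage (requires network):
#   classify_status "$(url_status 'http://your_url/')"
```

A site that redirects missing pages to a generic error page would then show up as "redirect" instead of silently passing or failing a grep.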

Nir Alfasi
  • Thanks but it doesn't always work. Please see the test URL in edited question. – user13107 Sep 18 '13 at 07:57
  • @user13107 Not true: if you try to grep for 200 on the second URL you posted, it fails. You're actually getting HTTP 303 (See Other) for that URL! Stick to `grep 200` and you'll only get output for valid URLs. – Nir Alfasi Sep 18 '13 at 08:05
  • Hi, I'm getting no output for either page when I do `grep 200`. Did you get different output for the two? – user13107 Sep 18 '13 at 08:07