I used the curl command from the question "How to check if an URL exists with the shell and probably curl?", but it doesn't detect missing pages when the website serves a generic error page such as "Sorry, we are unable to find that page".
How can I detect such pages automatically?
Test URLs:
http://www.nytimes.com/2013/09/18/us/washington-navy-yard-shootings.html (page exists)
http://www.nytimes.com/2013/09/18/us/washington-navy-yard.html (page does not exist)
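A minimal heuristic sketch, assuming the site either returns a non-2xx status for missing pages or includes a recognizable phrase (like "Sorry, we are unable to find that page") in its error page. The function names `classify_response` and `url_exists` and the phrase list `NOT_FOUND_PATTERNS` are hypothetical. The phrases are site-specific and must be adjusted for each site. Only standard curl options (`-s`, `-L`, `-o`, `-w '%{http_code}'`) and `grep -F -i -e` are used.

```sh
# Sketch: decide whether a URL "really" exists by checking both the HTTP
# status code and the page body for a known error-page message.

# Assumed list of phrases (one per line) that mark a generic error page.
# These are examples only; adjust them for each site you check.
NOT_FOUND_PATTERNS='Sorry, we are unable to find that page
page not found'

# classify_response STATUS BODY_FILE
# Prints "exists" (exit status 0) or "missing" (exit status 1).
classify_response() {
  cr_status=$1
  cr_body=$2
  case $cr_status in
    2[0-9][0-9]) ;;                 # 2xx: still need to inspect the body
    *) echo missing; return 1 ;;    # 3xx/4xx/5xx, or 000 if curl failed
  esac
  # -F: fixed strings, -i: ignore case; newlines separate the patterns
  if grep -qiF -e "$NOT_FOUND_PATTERNS" "$cr_body"; then
    echo missing
    return 1
  fi
  echo exists
}

# url_exists URL
# Fetches URL (following redirects) and classifies the final response.
url_exists() {
  ue_body=$(mktemp) || return 2
  # -s: silent, -L: follow redirects, -o: save body, -w: print final status
  ue_status=$(curl -sL -o "$ue_body" -w '%{http_code}' "$1")
  classify_response "$ue_status" "$ue_body"
  ue_rc=$?
  rm -f "$ue_body"
  return "$ue_rc"
}
```

For example, `url_exists http://www.nytimes.com/2013/09/18/us/washington-navy-yard.html` prints `exists` or `missing` depending on what the server actually returns. Keeping the decision in `classify_response` separates it from the download, so the rules can be tested on saved pages. Another common soft-404 heuristic, not shown here, is to fetch a deliberately nonexistent URL on the same site and compare its response with the page under test.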