
Hopefully a quick one: I have a task to check whether URLs on our TFS server are still valid, as I'm doing a bit of housekeeping.

Currently I have a list of URLs that I need to check, and I've been using curl. The problem is that I have thousands of URLs to check, and every one automatically downloads a file.

Is there a way to "fake" the download? In other words, is there a way I can confirm a URL works without actually downloading the file? With thousands of URLs, downloading them all would take a long time to go through as well as take up a lot of HDD space.

Thanks in advance :)


Update

TFS is Team Foundation Server.

So here's my current code as a test:

curl -k -u $userPass $url --output test.zip

This code successfully downloads the file I'm after, but as soon as I add "-v" to get the headers, it corrupts the download and gives me a 405 response code.
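For illustration, a minimal sketch of the kind of check I'm after, using the same $userPass and $url variables (nothing is written to disk and only the status code is printed, although the body is still transferred over the network):

    # Discard the response body and print only the HTTP status code.
    curl -k -s -u "$userPass" -o /dev/null -w '%{http_code}\n' "$url"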

Matt Taylor
  • 1) What is TFS please? 2) Did you look at the man page for `curl` to see if there is a *"header only"* option? 3) Have you tried **GNU Parallel** to get stuff done faster? 4) What is your current code please? – Mark Setchell Dec 23 '19 at 11:10
  • TFS is Team Foundation Server. – Matt Taylor Dec 23 '19 at 11:12

1 Answer


Issue HTTP HEAD requests and only download the headers, so you can check whether the URL returns "HTTP 404 Not Found" or something else. You can do that with curl using the -I parameter. With large lists, though, you shouldn't be using the CLI program curl; you should be using the libcurl curl_multi API, which can check hundreds or even thousands of URLs concurrently over async connections. That will be much faster than anything you can do from the CLI program. This code uses the curl_multi API to check large lists of URLs from PHP: https://stackoverflow.com/a/54353191/1067003
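For the CLI side of this, a minimal sketch of the -I approach, assuming a urls.txt file with one URL per line and a $userPass variable for the -u credentials (both names are just for illustration, not from the question). A HEAD request transfers only the headers, and --write-out prints just the status code for each URL:

    # Send a HEAD request per URL and print "URL status_code".
    while IFS= read -r url; do
      code=$(curl -k -s -u "$userPass" -o /dev/null -w '%{http_code}' -I "$url")
      printf '%s %s\n' "$url" "$code"
    done < urls.txt

Note that some servers answer HEAD with 405 Method Not Allowed; in that case, drop -I and keep -o /dev/null so the body is discarded instead of being saved to disk.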

Putting that code in a .php script and running it from php-cli should be much faster than anything you can achieve with the CLI program curl.

And if that's still too slow for you, you can rewrite it in C/C++ using the curl_multi C API, which would run even faster than the PHP implementation above. (PHP uses significantly more CPU than a C implementation would, which is one of the downsides of interpreted languages; still, your bottleneck at that point is probably bandwidth, not CPU.)

hanshenrik