
I am using PHP curl to check whether a user-supplied URL is valid. See this question for the specific/gory details. That question got an answer that I accepted - that the error checking was far too complex and I should just use file_get_contents() instead of curl. That works for a simple good/bad determination, but it doesn't give the user any useful information, such as whether the domain doesn't exist, the resource was not found, authorization is required, etc. So I intend to post a different curl-based answer and remove the accepted status from the one suggesting file_get_contents().
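To make concrete the kind of per-reason feedback I mean, here is a minimal sketch (the helper name and the exact mapping of errors to messages are my own, not the code from the cited question):

```php
<?php
// Sketch: map curl failures to user-facing reasons instead of a bare yes/no.
function checkUrl(string $url): string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // HEAD request; only the status matters here
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    $errno = curl_errno($ch);
    $code  = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if ($errno === CURLE_COULDNT_RESOLVE_HOST) return 'domain does not exist';
    if ($errno !== 0)                          return 'connection failed: ' . curl_strerror($errno);
    if ($code === 401 || $code === 403)        return 'needs authorization';
    if ($code === 404)                         return 'resource not found';
    if ($code >= 200 && $code < 400)           return 'looks good';
    return "unexpected HTTP status $code";
}
```

file_get_contents() cannot distinguish these cases, which is the whole point of staying with curl.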

But then I came across the following problem with a URL for a site that I own: curl rejected it with the following error recorded in the verbose log:

* http2 error: Invalid HTTP header field was received: frame type: 1, stream: 1, name: [upgrade], value: [h2,h2c]

Tracking that down, it is apparently triggered because the HTTP/2 RFC forbids connection-specific header fields in responses. I can "fix" it by adding `proxy_hide_header Upgrade;` to the site's nginx configuration. But the site works fine in web browsers, and what I'm trying to do is determine whether or not a URL will work in a web browser.
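For reference, this is roughly where that directive goes (the location block and upstream name are placeholders, not my actual configuration):

```nginx
location / {
    proxy_pass http://backend;     # placeholder upstream
    proxy_hide_header Upgrade;     # strip the connection-specific header from the response
}
```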

I suppose I could write even more code to save the verbose messages from the curl request and scan them for HTTP/2 errors, but that seems insanely complicated. What can I do on the curl end to prevent the URL being rejected for this reason?
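The "save the verbose messages and scan them" idea I'm dismissing would look roughly like this sketch (the URL is a placeholder):

```php
<?php
// Sketch: route curl's verbose log to an in-memory stream and scan it
// for the http2 header error instead of relying on curl_exec() alone.
$ch  = curl_init('https://example.com/');  // placeholder URL
$log = fopen('php://temp', 'w+');
curl_setopt_array($ch, [
    CURLOPT_NOBODY         => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_VERBOSE        => true,
    CURLOPT_STDERR         => $log,   // verbose output goes here, not to stderr
]);
curl_exec($ch);
rewind($log);
$verbose = stream_get_contents($log);
fclose($log);
curl_close($ch);

$isHttp2HeaderError =
    strpos($verbose, 'http2 error: Invalid HTTP header field') !== false;
```

(A simpler workaround might be forcing HTTP/1.1 with `CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1`, though then I'm no longer testing the same protocol a browser would use.)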

  • Side note: `I should just use file_get_contents() instead of curl.` No, you should not, and complexity is not an excuse. Using file_get_contents() for networking is asking for trouble, not only because the http transport can be disabled in config, but also because you get very little control compared to cURL (or its wrappers). Also, I suspect you fetch the remote resource as a whole instead of doing a `HEAD` request, correct? – Marcin Orlowski May 16 '23 at 18:51
  • @Marcin Orlowski Thanks for your comment, I was getting worried that nobody cared... Looks like you didn't read the question I cited or the answer it got suggesting the use of file_get_contents(). In fact, the code I posted with the question using curl *did* use a HEAD request, because I didn't care about the body. But in testing, I found that the HEAD request gave me a 404 for a URL my users were likely to use, but which worked fine in a browser (or with file_get_contents()). Part of my question was: – sootsnoot May 17 '23 at 17:02
  • "How can this happen? It seems to me that amazon s3 would have to look for HEAD requests explicitly, and deliberately issue a 404 or 403 depending on whether it came via a redirect??? I suppose I could delete the CURLOPT_NOBODY to have it send a GET request, but that seems silly since I don't care about the body." That part of the question got no attention in the answer other than "You can't expect that every URL will be successfully reachable via a HEAD request, or that the results of HEAD request will always be the same as the results of a GET request." – sootsnoot May 17 '23 at 17:04
  • 1
    I will take another look at your question then. – Marcin Orlowski May 17 '23 at 17:49

0 Answers