4

This is a URL that points to an image:

https://www.somesite.com/some/dir/jsp/data.jsp?KEY=12155&TYPE=jpg&qi=R7SWtM5F5PL4cDDFfdfpIrqIWSY3gr2XGQg=

I get the image if I run this cURL command on the command line:

/usr/bin/curl -o 1234.jpg 'the_url_to_image'

I need to use cURL in PHP with arguments. I tried several parameters to get the image, but I always get a 403 error:

Access to the specified resource has been forbidden. Apache Tomcat

My parameters (only the cURL parameters; the code for writing the image to a file is not shown):

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, $img_url);
 curl_setopt($ch, CURLOPT_VERBOSE, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_AUTOREFERER, false);
 curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
 curl_setopt($ch, CURLOPT_HEADER, 0);
 curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36');
 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
 curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
 curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie_filename);
 curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie_filename);
 $page_content  = curl_exec($ch);
 curl_close($ch);

EDIT

If I feed the image URL to this page:

onlinecurl.com

I get the image binary back, and not the error message.

So the image can be saved with cURL, I only need to get the curl_setopt settings right.

EDIT

Running the command in the CLI saves the image to the local path:

/usr/bin/curl -o 1234.jpg 'the_url_to_image'

When running the same command with

shell_exec("/usr/bin/curl -o 1234.jpg 'the_url_to_image'")

The error message is saved in the 1234.jpg file.

What can be the difference in the command line and code execution of the same command?

Timotej Leginus
Szekelygobe

4 Answers

1

What can be the difference in the command line and code execution of the same command?

your user-agent isn't even close:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36');

try setting it to a real curl-cli useragent, like

curl_setopt($ch,CURLOPT_USERAGENT,'curl/7.63.0');

or

curl_setopt($ch,CURLOPT_USERAGENT,'curl/'.(curl_version()['version']));

it's rare, but it's possible (and even likely given the evidence) that they're using a user-agent whitelist, and Google Chrome (or is it Safari?) is not on their whitelist, but curl-cli is...

another possible explanation is that they're trying to detect and block people lying about their user-agent, and it's easy to detect that you're lying: you're (falsely) saying that you are Safari or Chrome, and both of those always send Accept-Encoding: gzip/deflate/whatever, but your curl request does not (because you didn't use CURLOPT_ENCODING). thus it's easy to detect that your user-agent is fake, and maybe that's what's causing the block. either way, try using a real curl user-agent.

hanshenrik
  • I tried setting `CURLOPT_USERAGENT` to curl or to Mozilla, and setting `CURLOPT_ENCODING`, but it's still not working. They probably have some protection against scraping, but the command line version works, so there must be some setting I'm missing. – Szekelygobe Dec 28 '19 at 18:23
  • @Szekelygobe don't set it to mozilla, set it to `curl/7.63.0` – hanshenrik Dec 28 '19 at 21:47
  • @Szekelygobe ... in that case, set up a netcat server like explained in https://stackoverflow.com/a/55829622/1067003 and compare the difference between the curl cli program and the php script, what are their differences? – hanshenrik Dec 29 '19 at 11:50
  • I tried to execute the `/usr/bin/curl -o 1234.jpg 'the_url_to_image'` command with `shell_exec()`, failed to save the image, it returned the same error. – Szekelygobe Dec 30 '19 at 00:35
  • I tested the browser request with `netcat` and restructured the `cURL` request to exactly mimic the browser's request, but still no success... – Szekelygobe Jan 06 '20 at 13:17
1

401 is Unauthorized

403 is Forbidden

These are badly described.

401 really means not Authenticated

403 really means not Authorized

If this is indeed a protected resource that requires being logged in to fetch it, then this means that yes, the server recognises you (you didn't get a 401), but you don't have the required permissions (403).

If, on the other hand, the image really is public, actually pasting the link could help us to help you.

delboy1978uk
1

As it turns out the problem was a simple one.

-The first clue was that the command in terminal was working but the same command with shell_exec() was returning an error.

-The second clue was that, as delboy1978uk mentioned, the error was not a 401 (not authenticated) but a 403 (not authorized).

So there had to be a problem with the URL or a parameter. I printed out the URL but found no error... Long story short, the problem was with the special characters in the URL. When I printed the URL, the browser displayed the & character correctly, not the way the function actually received it as a parameter: &amp;.

So if I pass the URL through htmlspecialchars_decode() before running the command, it works flawlessly.

So look out for special characters in the URL!
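
For anyone hitting the same thing, here is a minimal sketch of that fix, assuming the URL was scraped from an HTML page and so still carries entity-encoded ampersands. The helper name `normalize_scraped_url` and the shortened `qi` value are placeholders for illustration, not part of my actual code:

```php
<?php
// Hypothetical helper (name is an assumption): undo HTML entity
// encoding on a URL that was taken from an HTML page.
function normalize_scraped_url(string $url): string
{
    // Turns "&amp;" back into "&" so the server sees separate query parameters.
    return htmlspecialchars_decode($url);
}

// Placeholder URL modeled on the one in the question; the qi value is shortened.
$raw_url = 'https://www.somesite.com/some/dir/jsp/data.jsp?KEY=12155&amp;TYPE=jpg&amp;qi=abc';
$img_url = normalize_scraped_url($raw_url);

// Print the decoded URL; view it in a terminal or the page source,
// since a browser renders "&amp;" and "&" identically.
echo $img_url, PHP_EOL;
```

The decoded `$img_url` is what should go into `CURLOPT_URL`. If the URL is passed to `shell_exec()` instead, wrapping it in `escapeshellarg()` is probably a safer way to quote it than hand-written single quotes.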

Szekelygobe
0

You can try adding

curl_setopt($ch, CURLOPT_POST, 0);

If this doesn't work, you have to add the following to your Apache Tomcat web.xml:

<login-config>
  <auth-method>BASIC</auth-method>
</login-config>
HP371