
I am currently working with a REST API provided by a third-party tool that requires me to make thousands of requests per run of my script.

For the most part everything works well; the only issue is that it takes some time. Now that the logic is finished, I am looking to improve the performance of the script by tuning the cURL requests.

Some notes:

  • Using a third-party app (Postman) I get a faster response on average per request: ~600 ms (Postman) vs. ~1300 ms (PHP cURL). I was more or less able to reach that rate, so I think I am close to the best optimization I can get.

  • I am already using curl_multi in other parts of my script, but the part I am targeting here has each cURL request depending on the return value of the previous one (see the sketch just below this list).

  • These are all GET requests. I also have POST, PUT, DELETE, and PATCH, but those are used sparingly, so the area I am targeting is the sequential GET requests.
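
To make the chaining concrete, here is a minimal sketch of the pattern (the endpoint names and JSON fields are made up for illustration; executeGET() is the helper shown below):

/* Hypothetical example: the follow-up URLs are only known after the first response is parsed */
$project = json_decode($common->executeGET('/projects/123'), true);
foreach ($project['taskIds'] as $taskId) {
    // Each request here depends on data from the previous response,
    // so they cannot all be queued into a single curl_multi batch up front.
    $task = json_decode($common->executeGET('/tasks/' . $taskId), true);
}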

I have done some research and for the most part everyone recommends curl_multi as the default, but I can't really use it here because the requests are chained. So I went back to the PHP documentation and thought about going past my basic GET request and adding some more options.


Original Code

Below is my first, simplistic take: it creates a new cURL handle and sets the request type, return transfer, header, credentials, and SSL verification (needed to bypass the third-party tool moving to a cloud instance). For the specific query I was testing, this ran at about ~1460 ms.

(From the test code below: 73 s / 50 runs = 1.46 s per request)

function executeGET($getUrl)
{
    /* Curl Options */
    $ch = curl_init($this->URL . $getUrl);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    /* Result handling and processing */
    $result = curl_exec($ch);

    return $result;
}

First Attempt at Optimization

Next I went through existing Stack Overflow questions and the PHP documentation to see which cURL options would affect the request. The first thing I found was IP resolution: forcing CURLOPT_IPRESOLVE to IPv4 sped it up by ~100 ms. The big one, however, was CURLOPT_ENCODING, which sped it up by about ~300 ms; setting it to an empty string lets cURL advertise every compression method it supports, so the server can send compressed responses.

(From the test code below: 57 s / 50 runs = 1.14 s per request)

function executeGET($getUrl)
{
    /* Curl Options */
    $ch = curl_init($this->URL . $getUrl);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 ); //NEW
    curl_setopt($ch, CURLOPT_ENCODING, ''); //NEW

    /* Result handling and processing */
    $result = curl_exec($ch);

    return $result;
}

Re-using the Same cURL Handle

The last thing I tried was using a single cURL handle and just changing the URL before each request. My logic was that all the options were the same; I was only querying a different endpoint. Reusing the handle also lets libcurl keep the underlying connection alive, so repeated requests can skip the TCP/TLS handshake. So I merged the newly discovered IPRESOLVE and ENCODING options with the handle re-use, and this is what I got in its most stripped-down version.

(From the test code below: 32 s / 50 runs = 0.64 s per request)

private $ch = null;

function executeREUSEGET($getUrl)
{
    /* Curl Options */
    if ($this->ch == null) {
        $this->ch = curl_init();
        curl_setopt($this->ch, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($this->ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($this->ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
        curl_setopt($this->ch, CURLOPT_SSL_VERIFYPEER, false); //Needed to bypass SSL for multi calls
        curl_setopt($this->ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 ); //NEW
        curl_setopt($this->ch, CURLOPT_ENCODING, '');
    }
    curl_setopt($this->ch, CURLOPT_URL, $this->URL . $getUrl);

    /* Result handling and processing */
    $result = curl_exec($this->ch);
    return $result;
}
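
One small addition that is not part of the timed code above: since the handle is now shared, it can be released explicitly when the class is done with it, for example via a destructor (a minimal sketch, assuming this lives in the same Common class):

/* Release the shared handle when this object is destroyed */
public function __destruct()
{
    if ($this->ch !== null) {
        curl_close($this->ch);
        $this->ch = null;
    }
}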

Testing Code

Reuse GET taken in seconds is: 32
Normal GET (with extra options) taken in seconds is: 58
Normal GET taken in seconds is: 73

$common = new Common();

/* Time 50 runs of the reused-handle version */
$startTime = time();
for ($x = 0; $x < 50; $x++){
    $r = $common->executeREUSEGET('MY_COMPLEX_QUERY');
}
echo 'Reuse GET taken in seconds is: '.(time()-$startTime);

/* Time 50 runs of the per-request-handle version */
$startTime = time();
for ($x = 0; $x < 50; $x++){
    $r = $common->executeGET('MY_COMPLEX_QUERY');
}
echo 'Normal GET taken in seconds is: '.(time()-$startTime);
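
Since time() only has one-second resolution, a variant of the same loop using microtime(true) gives a finer per-request figure (same hypothetical 'MY_COMPLEX_QUERY' placeholder):

$startTime = microtime(true);
for ($x = 0; $x < 50; $x++) {
    $r = $common->executeREUSEGET('MY_COMPLEX_QUERY');
}
$elapsed = microtime(true) - $startTime;
// Prints the total elapsed time and the average time per request in milliseconds
printf("Reuse GET: %.2f s total, %.0f ms per request\n", $elapsed, ($elapsed / 50) * 1000);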

Conclusion

This was my thought process while going through my code and trying to speed up my requests to more closely match what I was seeing in Postman. I was hoping to get some feedback on what I can improve to speed up these specific GET requests, or whether there is anything else I can do.

EDIT: I didn't really do much optimization testing with curl_multi, but now that I have my default GET request down under a second, would it be better to convert those curl_multi calls to my executeREUSEGET? The difference is that when I use curl_multi I just need the data; none of it depends on previous input the way my executeGET calls do.
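
For reference, the direction I am considering (untested, just a sketch built from the standard curl_multi_* functions) is to keep a single curl_multi handle alive for the whole run, so its connection cache can be reused between batches much like the single handle in executeREUSEGET():

private $mh = null;

function executeMultiGET(array $getUrls)
{
    /* Keep one multi handle for the lifetime of the object so connections can be reused between batches */
    if ($this->mh === null) {
        $this->mh = curl_multi_init();
    }

    /* One easy handle per URL, with the same options as the single-request version */
    $handles = array();
    foreach ($getUrls as $key => $getUrl) {
        $ch = curl_init($this->URL . $getUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        curl_setopt($ch, CURLOPT_ENCODING, '');
        curl_multi_add_handle($this->mh, $ch);
        $handles[$key] = $ch;
    }

    /* Run all transfers in parallel */
    do {
        $status = curl_multi_exec($this->mh, $running);
        if ($running) {
            curl_multi_select($this->mh);
        }
    } while ($running && $status == CURLM_OK);

    /* Collect results and detach the easy handles; the multi handle stays open */
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($this->mh, $ch);
        curl_close($ch);
    }
    return $results;
}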

  • If you must make "thousands of requests per run" of your script, then this is a textbook case where you should be using multithreading/multiprocessing. The primary limitation is that curl requests involve a lot of latency -- i.e., your computer is waiting around for the remote machine to respond. The easiest way to accomplish this would be to look at [curl_multi](https://www.php.net/manual/en/function.curl-multi-init.php). – S. Imp Aug 07 '19 at 17:17
  • @S.Imp I understand that, and I do use curl_multi at times where I can. When I just need data to be extracted where none of the calls rely on one another. But in the cases where I am using my singular requests, it is because the requests are chained upon another. – 97WaterPolo Aug 07 '19 at 17:19
  • Would this not be better to run async? I am assuming this is part of a web based application, can it not be done in Ajax (node.js)? Load the minimal then load it async, it may take delays to load messages etc or whatever it is you stream data for, but it would be better to load the infrastructure then await data with a nice overlay on the specific div fields – Jaquarh Aug 07 '19 at 17:26
  • @Jaquarh Thank you for your feedback! So at the time I started this project my web-based knowledge was pretty much raw HTML. I picked up PHP as it seemed the simplest to start with REST calls as it had the CURL library built in. This isn't really meant for a front-end type use, it has a very simple UI to execute these scripts but what they are first and foremost, is just data generation and extraction. I have about 15 different projects which all utilize these calls, and about 13 of them write the contents to a log file. Technically the user doesn't even need/want to see it until it is done! – 97WaterPolo Aug 07 '19 at 17:30
  • Optimization with REST is a hit and miss, HTTP requests can vary in timescale for each individual request. If user interaction is not needed then perhaps optimization to the level you're expecting isn't necessary? If its huge, perhaps consider load balancing – Jaquarh Aug 07 '19 at 17:33
  • @Jaquarh I kinda figured that it was a hit or miss just because it has a lot of factors to depend on. User interaction is non existent after the report is started but I would like to try and get these reports out in a reasonable amount of time which is what I am trying to target. Could you elaborate a bit into what you meant by load-balancing. IIRC that is distributing the work across multiple machines? I don't believe it is a CPU/Memory throttle but rather a request/response throttle. – 97WaterPolo Aug 07 '19 at 17:41
  • @97WaterPolo if each request depends on the result of the prior request, then you are probably not going to be able to get this script to run quickly. An internet request generally involves several stages: DNS lookup, socket connection, request transmission, wait for response, request receipt, socket close, etc. If you have any control over the remote server to which you are sending requests, I'd create a script over on that machine that can receive a batch file or something. Otherwise, expect at least a few hundred milliseconds per request. – S. Imp Aug 07 '19 at 18:08
  • @S.Imp Thank you! I kinda figured I came to the best solution I could possibly do from a remote-access perspective. Unfortunately I don't have any control/access to the remote server; all I have to access the data is the REST API calls. Was hoping I overlooked something but that doesn't seem to be the case, thank you! – 97WaterPolo Aug 07 '19 at 18:14
  • @97WaterPolo reusing an existing curl connection sounds to me like the biggest improvement you can make. you might want to do whatever you can to make sure the connection, once made, is kept alive as long as possible so you can make multiple requests while the socket connection is open. there's some useful info [here](https://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library) but this will also depend on the remote server's KEEP-ALIVE setting. – S. Imp Aug 08 '19 at 16:41
