3

My current code (see below) uses 147MB of virtual memory! My provider allocates 100MB by default, and the process is killed once that limit is exceeded, causing an internal server error. The code utilises curl_multi and must be able to loop through more than 150 iterations whilst still minimising virtual memory. The code below is set to only 150 iterations and still causes the internal server error; at 90 iterations the issue does not occur.

How can I adjust my code to lower the resource use / virtual memory?

Thanks!

<?php

// Like date(), but replaces an unescaped 'u' in the format string with milliseconds.
function udate($format, $utimestamp = null) {
    if ($utimestamp === null)
        $utimestamp = microtime(true);
    $timestamp = floor($utimestamp);
    // Zero-pad so that e.g. 7ms renders as "007" rather than "7".
    $milliseconds = sprintf('%03d', ($utimestamp - $timestamp) * 1000);
    return date(preg_replace('`(?<!\\\\)u`', $milliseconds, $format), $timestamp);
}

$url = 'https://www.testdomain.com/';
$curl_arr = array();
$master = curl_multi_init();

for($i=0; $i<150; $i++)
{
    $curl_arr[$i] = curl_init();
    curl_setopt($curl_arr[$i], CURLOPT_URL, $url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl_arr[$i], CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($curl_arr[$i], CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

do {
    curl_multi_exec($master,$running);
} while($running > 0);

for($i=0; $i<150; $i++)
{
    $results = curl_multi_getcontent ($curl_arr[$i]);
    $results = explode("<br>", $results);
      echo $results[0];
      echo "<br>";
      echo $results[1];
      echo "<br>";
      echo udate('H:i:s:u');
      echo "<br><br>";
      usleep(100000);
}

?>
– iCeR
  • @dqhendricks: "My provider has allocated 100MB by default" == shared hosting. – thirtydot Dec 31 '10 at 01:32
  • @dqhendricks, @thirtydot: Using WHM on Linux. I have updated my question with the server info. I don't get what you mean by running it by command line rather than apache? Sorry.. – iCeR Dec 31 '10 at 01:42
  • @iCeR: How many bytes is the test page you're testing with? – thirtydot Dec 31 '10 at 01:43
  • Does your provider allow you to run binaries? Doing this with native code could save a lot on memory consumption... – Billy ONeal Dec 31 '10 at 01:43
  • @thirtydot: 814 bytes (4,096 on disk) – iCeR Dec 31 '10 at 01:47
  • @Billy ONeal: I have no idea. I can ask. How would I be able to run this with native code? Example would be great so if binary is able to be run I'll go ahead and implement. – iCeR Dec 31 '10 at 01:48
  • @iCeR: Well, you'd basically be implementing a PHP extension (that is, in C). You run the cURL calls in native land (you expose a new function into PHP) and then expose them in a way for the rest of your (PHP) code to talk with. Of course this requires that A. you can install extensions, and B. that you know C. – Billy ONeal Dec 31 '10 at 01:53
  • @Billy ONeal: A. Yes and B. No :) – iCeR Dec 31 '10 at 01:57
  • @iCeR: From your comment on @Steve-o answer, it seems like you want to check the same URL 150 times with a 0.1 second delay between fetches. Your current code is not doing this - it is fetching the same URL 150 times at the **exact same time**, then waiting 0.1 sec (per fetch) before outputting the results. Can you clarify what you're trying to do here? – thirtydot Dec 31 '10 at 01:58
  • Have you tried freeing CURL handles from `curl_multi_getcontent`? Read the comments in this article: http://www.rustyrazorblade.com/2008/02/curl_multi_exec/ – Steve-o Dec 31 '10 at 02:00
  • @thirtydot: Sorry, maybe I have coded this all wrong for the examples in past. I would like to fetch the same URL and output - both as quick as possible. If I was to fetch a URL and output, 150 times, it would slow down the process as it would need to run curl, wasting approx 500ms. I'm trying to achieve the quickest possible way to continuously check an API for a response by feeding it ONE url. Hope this makes sense. – iCeR Dec 31 '10 at 02:18
  • @Steve-o: So basically to add: `curl_multi_remove_handle($master, $curl_arr[$i]);` before usleep? Still the same issue.. – iCeR Dec 31 '10 at 02:20
  • @iCeR: Few questions, again. 1) Are you sure that fetching the same URL 150 times with 0.1 second delay in between is what you want to do? It seems too frequent. 2) Can you use the [`socket_create`](http://php.net/manual/en/function.socket-create.php) family of functions on your server? 3) The reason I'm asking question 1) is that it could completely change the code you would use, fetching every 0.1 sec vs, for example, every second. 4) How long does it take for your API to return a result? – thirtydot Dec 31 '10 at 02:31
  • @thirtydot: Not a problem, thanks for the help. 1) Yes, 100% :), I am checking domain availability for a client, for specific domains. 2) I can indeed. 3) Would love to fetch every 0.1s or less! 4) extremely quick! Unsure of exact times though. – iCeR Dec 31 '10 at 02:36
  • Now, I'm not 100% sure I agree with your intentions, but something like this should be programmed in C/C++ and not a scripting language. You should get a huge speed boost and a lower memory footprint. – Natalie Adams Jan 03 '11 at 01:14
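Regarding the handle-freeing suggestion in the comments above, here is a minimal sketch of how the output loop from the question could release each handle as soon as its content has been read (it assumes the $master and $curl_arr built earlier in the question's code). On its own this may not be enough to stay under the 100MB cap, as noted above, but it stops all 150 handles and their response buffers from staying alive until the end of the script:

// Read, then immediately release, each handle so its buffers can be freed
// instead of keeping all 150 handles alive until the script ends.
for ($i = 0; $i < 150; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);

    curl_multi_remove_handle($master, $curl_arr[$i]); // as suggested in the comments
    curl_close($curl_arr[$i]);
    unset($curl_arr[$i]);                             // drop PHP's reference as well

    $results = explode("<br>", $results);
    echo $results[0] . "<br>" . $results[1] . "<br>";
    echo udate('H:i:s:u') . "<br><br>";
    usleep(100000);
}
curl_multi_close($master);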

3 Answers

2

As per your last comment..

Download RollingCurl.php.

Hopefully this will sufficiently spam the living daylights out of your API.

<?php

$url = '________';
$fetch_count = 150;
$window_size = 5;


require("RollingCurl.php");

function request_callback($response, $info, $request) {
    list($result0, $result1) = explode("<br>", $response);
    echo "{$result0}<br>{$result1}<br>";
    //print_r($info);
    //print_r($request);
    echo "<hr>";
}


$urls = array_fill(0, $fetch_count, $url);

$rc = new RollingCurl("request_callback");
$rc->window_size = $window_size;
foreach ($urls as $url) {
    $request = new RollingCurlRequest($url);
    $rc->add($request);
}
$rc->execute();

?>

Looking through your questions, I saw this comment:

If the intention is domain snatching, then using one of the established services is a better option. Your script implementation is hardly as important as the actual connection and latency.

I agree with that comment.

Also, you seem to have posted the "same question" approximately seven hundred times:

https://stackoverflow.com/users/558865/icer
https://stackoverflow.com/users/516277/icer

How can I adjust the server to run my PHP script quicker?
How can I re-code my php script to run as quickly as possible?
How to run cURL once, checking domain availability in a loop? Help fixing code please
Help fixing php/api/curl code please
How to reduce virtual memory by optimising my PHP code?
Overlapping HTTPS requests?
Multiple https requests.. how to?

Doesn't the fact that you have to keep asking the same question over and over tell you that you're doing it wrong?

This comment of yours:

@mario: Cheers. I'm competing against 2 other companies for specific ccTLD's. They are new to the game and they are snapping up those domains in slow time (up to 10 seconds after purge time). I'm just a little slower at the moment.

I'm fairly sure that PHP on a shared hosting account is the wrong tool to use if you are seriously trying to beat two companies at snapping up expired domain names.

– thirtydot
0

The result of each of the 150 queries is being stored in PHP memory, and by your evidence this is insufficient. The only conclusion is that you cannot keep all 150 responses in memory at once. You need either a way of streaming to files instead of memory buffers, or simply to reduce the number of concurrent queries and process the list of URLs in batches.
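Taking the second option first, here is a minimal sketch of the batch approach; the $batch_size of 50 and the URL are arbitrary placeholders, not values from the question. Only one batch of responses is ever held in memory, and every handle is freed before the next batch starts:

<?php

$urls = array_fill(0, 150, 'https://www.testdomain.com/');
$batch_size = 50; // tune so that one batch of responses fits comfortably in memory

foreach (array_chunk($urls, $batch_size) as $batch) {
    $master  = curl_multi_init();
    $handles = array();

    foreach ($batch as $u) {
        $ch = curl_init($u);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($master, $ch);
        $handles[] = $ch;
    }

    do {
        curl_multi_exec($master, $running);
        curl_multi_select($master); // wait for activity rather than spinning
    } while ($running > 0);

    foreach ($handles as $ch) {
        echo curl_multi_getcontent($ch) . "<br>";
        curl_multi_remove_handle($master, $ch); // free this handle's buffers
        curl_close($ch);
    }

    curl_multi_close($master);
}

?>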

To use streams you must set CURLOPT_RETURNTRANSFER to 0 and implement a callback for CURLOPT_WRITEFUNCTION; there is an example in the PHP manual:

http://www.php.net/manual/en/function.curl-setopt.php#98491

function on_curl_write($ch, $data)
{
  global $fh;
  // Write the chunk to the open file and return the byte count;
  // returning fewer bytes than received would abort the transfer.
  $bytes = fwrite($fh, $data, strlen($data));
  return $bytes;
}

curl_setopt($curl_arr[$i], CURLOPT_WRITEFUNCTION, 'on_curl_write');

Getting the correct file handle in the callback is left as a problem for the reader to solve.
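One way to fill that gap, as a sketch: keep an array of file pointers keyed by each cURL handle's resource id and look the right one up inside the callback. The /tmp file names and the URL are placeholders (on modern PHP, where handles are objects rather than resources, spl_object_id($ch) would replace the (int) cast):

$fhs = array(); // file pointer for each cURL handle, keyed by resource id

function on_curl_write($ch, $data)
{
    global $fhs;
    // Look up the file that belongs to this particular handle.
    return fwrite($fhs[(int) $ch], $data);
}

$master   = curl_multi_init();
$curl_arr = array();

for ($i = 0; $i < 150; $i++) {
    $curl_arr[$i] = curl_init('https://www.testdomain.com/');
    $fhs[(int) $curl_arr[$i]] = fopen("/tmp/result_$i.txt", 'w');
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($curl_arr[$i], CURLOPT_WRITEFUNCTION, 'on_curl_write');
    curl_multi_add_handle($master, $curl_arr[$i]);
}

If writing straight to disk is all that is needed, setting CURLOPT_FILE to an open file pointer on each handle does the same job without a callback.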

– Steve-o
  • @Steve-o: The query to the API is running the same URL 150+ times. The reason for this is to do continuous checks and notify me of updates/changes. The API is preventing quite a bit at the moment. How could I stream my above code to files? Reducing the number of queries can't happen, if anything it will increase, unless you have a method of processing them one after the next whilst in batches. Much appreciated, thank you. – iCeR Dec 31 '10 at 01:52
  • 150 files of length 814 are less than a megabyte all together. I don't think streaming to a file is going to make any difference here. – Billy ONeal Dec 31 '10 at 01:54
  • @Billy that's the only controllable variable here aside from clearing up the CURL handles. – Steve-o Dec 31 '10 at 01:59
  • @Steve-o: Thanks for the update on streams. Having a little trouble modifying to suit my above code to test.. Sorry. – iCeR Dec 31 '10 at 02:01
  • @Steve-o: Never mind.. Saw it now :) Thank you.. testing in a minute. Read my mind! – iCeR Dec 31 '10 at 02:01
  • @Steve-o: Warning: fwrite(): supplied argument is not a valid stream resource in /home/server/public_html/sub/test.php on line 17 – iCeR Dec 31 '10 at 02:05
  • @iCeR: Please see my latest comment on your question – thirtydot Dec 31 '10 at 02:08
  • @thirtydot: answered, sorry, getting a little confusing with 2 'threads'. – iCeR Dec 31 '10 at 02:19
0
<?php

echo str_repeat(' ', 1024); //to make flush work

$url = 'http://__________/';
$fetch_count = 15;
$delay = 100000; //0.1 second
//$delay = 1000000; //1 second


$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);


for ($i=0; $i<$fetch_count; $i++) {

    $start = microtime(true);

    $result = curl_exec($ch);

    list($result0, $result1) = explode("<br>", $result);
    echo "{$result0}<br>{$result1}<br>";
    flush();

    $end = microtime(true);

    // $delay is in microseconds but microtime() returns seconds, so convert before subtracting.
    $sleeping = $delay - (($end - $start) * 1000000);
    echo 'sleeping: ' . ($sleeping / 1000000) . ' seconds<hr />';
    if ($sleeping > 0) {
        usleep((int) $sleeping);
    }

}

curl_close($ch);

?>
– thirtydot
  • @thirtydot: thanks for that - set a timestamp and it outputs every 900ms :/ any way to make this quicker? – iCeR Dec 31 '10 at 03:26
  • @iCeR: Can you profile where the 900ms is getting spent? For instance, change `$result = file_get_contents($url);` into `$start_get = microtime(true); $result = file_get_contents($url); $end_get = microtime(true); echo $end_get - $start_get;` – thirtydot Dec 31 '10 at 03:29
  • @thirtydot: Ok for getting contents of $url it ranges from 755ms to 932ms. After that it is only an additional 9ms. It's weird because using curl_multi it can load the url multiple times and spit them out within 50ms or less.. – iCeR Dec 31 '10 at 03:35
  • Ok then, that pretty much explains it. You can't make it faster with this current code. So, next tactic: Does the API server support HTTP keepalives (it probably will)? If it does, we can utilize them to make this faster (prevent having to open a new connection every time). `file_get_contents()` does not use keepalives. If you don't know how to find out if keepalives are supported, let me know. – thirtydot Dec 31 '10 at 03:39
  • The reason `curl_multi` can do it so fast is because it is just requesting the URL 150 times *at the start*, **with "no" delay**. – thirtydot Dec 31 '10 at 03:41
  • Can a socket do the same and not cause the memory issues? Or would I have to try and work out something using curl_multi? – iCeR Dec 31 '10 at 03:44
  • `curl_multi` will not work for the task you are trying to do, full stop. It is firing all 150 calls "instantly" (within the first ~0.4 seconds), which equates to a delay of ~0.0025 seconds between requests. So this means that the state of domain availability only has a window of ~0.4 seconds to change or your script won't catch it. If it takes "755ms to 932ms" to fetch a response from the server, it will take code based on `socket_create()` utilizing keepalive support to make it faster. So, does your API server support [keepalives](http://en.wikipedia.org/wiki/HTTP_persistent_connection)? – thirtydot Dec 31 '10 at 03:57
  • @Thirtydot: indeed it does ;) now how to implement a keepalive. Thanks again, your support has been brilliant! – iCeR Dec 31 '10 at 04:09
  • I edited my answer to some different code. This combines cURL with a delay. cURL will use keepalive if you reuse the same handle (as I'm doing in the code), so this should be as fast as you can get. I realised I can't use `socket_create` and friends because you're requesting a HTTPS url. (also, it's just plain easier with cURL, and this shouldn't run up memory issues because it's only doing 1 request at a time instead of 150 at once) – thirtydot Dec 31 '10 at 04:34
  • @thirtydot: Not a prob - the average delay between each is ~700ms. – iCeR Dec 31 '10 at 04:41
  • I think we're done here then. You should probably let this question hang about for a few more days to see if anyone else has any ideas on how to speed it up. If no one else can help more, you should accept my answer :) – thirtydot Dec 31 '10 at 04:44
  • @thirtydot: Tested as you showed me earlier and the delay is in: $result = curl_exec($ch); (which takes up 90%+ of the ~700ms) – iCeR Dec 31 '10 at 04:44
  • Well, that's just how long it takes to do a request. Can't do anything about that - as far as I know, in PHP there's no way to do overlapping HTTPS requests (which would be the way around it) while getting the results as they arrive. You would need a language which supports threads to do this. – thirtydot Dec 31 '10 at 04:47
  • Cheers. If you or anyone else is able to assist with that, that would be fantastic! Thanks again for ALL the help! – iCeR Dec 31 '10 at 04:58
  • Tbh, it'd be quite pointless. If it takes 700ms to get a response, what's the point in asking for a response every 100ms? Just stick with asking every 700ms. – thirtydot Dec 31 '10 at 05:02
  • Oh screw. I just noticed something. Are you asking for exact same url over and over, or are you asking 150 different variations of a URL once? (like http://x.com/check?www.domain1.com, http://x.com/check?www.domain2.com, http://x.com/check?www.domain3.com .. http://x.com/check?www.domain150.com) – thirtydot Dec 31 '10 at 05:06
  • Same url over and over ;) please tell me you worked something out? :D ie: x.com/check?domain=testdomain.com – iCeR Dec 31 '10 at 05:13
  • Phew. If you're requesting the exact same URL like `x.com/check?domain=testdomain.com` over and over, then my code is "correct". – thirtydot Dec 31 '10 at 05:17
  • Whereas if you were requesting *different* URLs, I would tell you to stick with your initial code, except change it to only request 90 URLs at once (due to the memory issue), and loop requesting 90 until you had checked all the different domains. But this is not the case, so never mind. – thirtydot Dec 31 '10 at 05:18
  • Haha damn! Thought you had something ;) maybe you know how to edit the above code using curl multi-request functionality (http://us2.php.net/manual/en/function.curl-multi-init.php) for overlapping HTTPS requests. The example is with regular http but should apparently work with https too. Processing data as they come, using curl's write handler functions set by CURLOPT_WRITEFUNCTION option. (http://us2.php.net/manual/en/function.curl-setopt.php). Let me know if you have any luck putting all that together.. only if you have time, otherwise leave it. Thanks again! – iCeR Dec 31 '10 at 05:19
  • Or is that what it is currently doing? *confused* now. – iCeR Dec 31 '10 at 05:20
  • No, it's not currently doing that. According to [this](http://comments.gmane.org/gmane.comp.web.curl.library/30199) "If you're using the multi interface, then yes each call to the CURLOPT_WRITEFUNCTION will block all transfers.". So I don't think CURLOPT_WRITEFUNCTION can help. – thirtydot Dec 31 '10 at 05:34
  • @thirtydot: how about this? could you possibly assist with implementing the above (original) code to work like this? http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/ – iCeR Jan 02 '11 at 19:01
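For reference, a rough sketch of the non-blocking "rolling" pattern that article describes, adapted to the single-URL case from this question; the window size of 5 and the substr() output are placeholders. At most $window responses are buffered at any moment, and each handle is closed as soon as its response has been handled, so memory stays flat however many requests are made:

<?php

$url      = 'https://www.testdomain.com/';
$window   = 5;
$total    = 150;
$started  = 0;
$finished = 0;

function new_handle($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    return $ch;
}

$master = curl_multi_init();
while ($started < $window) {
    curl_multi_add_handle($master, new_handle($url));
    $started++;
}

do {
    // Push the transfers along.
    while (curl_multi_exec($master, $running) === CURLM_CALL_MULTI_PERFORM);

    // Drain every transfer that has completed so far.
    while ($info = curl_multi_info_read($master)) {
        $ch   = $info['handle'];
        $body = curl_multi_getcontent($ch);
        echo substr($body, 0, 80) . "<br>";      // process the response here

        curl_multi_remove_handle($master, $ch);
        curl_close($ch);                         // free the handle before queuing more
        $finished++;

        if ($started < $total) {
            curl_multi_add_handle($master, new_handle($url));
            $started++;
        }
    }

    // Wait for network activity instead of spinning the CPU.
    if (curl_multi_select($master, 1.0) === -1) {
        usleep(100);
    }
} while ($finished < $started);

curl_multi_close($master);

?>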