
I have an issue where Gearman is slow to transfer tasks to workers when I send it large payloads via the Gearman PHP extension. In fact, we don't find the payload to be that big (it's 30MB). Everything (PHP, Gearman, Node.js) runs locally for now, so network access is not the bottleneck.

The PHP script

Here is the PHP client:

ini_set('memory_limit', '1G');

$client = new GearmanClient();
$client->addServer('127.0.0.1', '4730');

$schema = file_get_contents('schema.json');
$data = file_get_contents('data.json');

$gearmanData = [
    'schema' => $schema,
    'data' => $data
];

echo "Encoding in JSON the payload\n";

$gearmanDataString = json_encode($gearmanData, JSON_FORCE_OBJECT);

echo "Sending job to Gearman\n";

// This line takes long to execute...
$result = $client->doNormal("validateJsonSchema", $gearmanDataString);

echo "Job finished\n";

var_dump($result);
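
To show where the time goes, here is a minimal timing wrapper around each step (the microtime() instrumentation suggested in the comments below; same files and function name as above):

ini_set('memory_limit', '1G');

$t = microtime(true);

$schema = file_get_contents('schema.json');
$data = file_get_contents('data.json');
$payload = json_encode(['schema' => $schema, 'data' => $data], JSON_FORCE_OBJECT);

// Reading and encoding 30MB is fast.
printf("read + encode: %.2fs\n", microtime(true) - $t);

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$t = microtime(true);
$result = $client->doNormal("validateJsonSchema", $payload);

// This is where the ~30 seconds goes.
printf("doNormal: %.2fs\n", microtime(true) - $t);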

This is my Node.js worker, which will eventually do something useful but is left empty here to demonstrate that the worker code is not the issue:

var gearmanode = require('gearmanode');

var worker = gearmanode.worker({host: '127.0.0.1', port: 4730});

worker.addFunction('validateJsonSchema', function (job) {
    console.log('I will do something');

    job.workComplete('Toasty!');
});

I start my worker in the background and then run my client. It freezes for 30 seconds or so while executing $client->doNormal (just after outputting "Sending job to Gearman"), and finishes by outputting string(7) "Toasty!" via PHP's var_dump. So it works, but it's just slow to process.

Also, if I reduce the size of the payload (data.json), it takes less time, so the payload size seems to matter.

I tried to code the same worker in PHP, with the same result:

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', '4730');
$worker->addFunction("validateJsonSchema", "validateJsonSchema");

// Loop forever, handing each incoming job to validateJsonSchema().
while ($worker->work());

function validateJsonSchema($job)
{
  return 'ToastyPHP!';
}

UPDATE

Using the Node.js client, doing almost the same thing as in PHP, it executes much faster (~3.5 seconds). Am I doing something wrong with the PHP version, or am I missing some configuration to make it quicker?

My node.js client:

var gearmanode = require('gearmanode');
var fs = require('fs');

var start = Date.now(); 

var client = gearmanode.client();

var schema = fs.readFileSync('schema.json', 'utf8');
var data = fs.readFileSync('data.json', 'utf8');

var submitData = JSON.stringify({ "data": data, "schema": schema });

// Runs much faster than PHP
var job = client.submitJob('validateJsonSchema', submitData, {background: false});

job.on('complete', function() {
    console.log('RESULT >>> ' + job.response);
    client.close();

    var end = Date.now(); 

    console.log(end-start + ' milliseconds'); // Always shows around 3500 milliseconds
});

Any clue why this is happening? Is Gearman made to handle payloads of this size? 30MB is not that big in my book.

MaxiWheat
  • I would do some basic timing debugging to see where in your script it's taking a long time. – cmorrissey Aug 06 '15 at 19:34
  • @cmorrissey I already did, it takes ~30 seconds every time; if I decrease the payload to half the size, it takes ~15 seconds... so it's directly linked to the size of the payload, but I don't understand why a not-so-big payload takes that much time to be handled by gearmand. – MaxiWheat Aug 06 '15 at 19:39
  • but where in the script does it take 30 seconds, is it actually gearman or is it `json_encode`, etc ? – cmorrissey Aug 06 '15 at 19:47
  • I edited my question, it takes 30 seconds to run `$client->doNormal` – MaxiWheat Aug 06 '15 at 19:51
  • Your issue is with what you are doing with the payload after validateJsonSchema is called. Can you post your actual code (i.e. post what you currently have for validateJsonSchema)? – dannypaz Aug 06 '15 at 20:04
  • This IS the actual code, and still takes that much time to process. – MaxiWheat Aug 06 '15 at 20:05
  • Even with no operations, just adding the payload takes 30 seconds? – dannypaz Aug 06 '15 at 20:07
  • @livepo Yes, a 30MB payload with no actual work in the worker still takes 30 seconds to execute – MaxiWheat Aug 06 '15 at 21:28
  • @cmorrissey I updated my question using node.js client which is much faster, but I need to use it in PHP. – MaxiWheat Aug 07 '15 at 14:36
  • could you post gearman server and client versions that you are using along with php and os versions? – Goran Miskovic Aug 15 '15 at 15:04
  • You might want to reconsider your payload size since gearman maintains the entire payload in memory. – Paras Aug 15 '15 at 16:38
  • @Paras The memory thing does not hold up... the server still has enough memory, `json_encode` is able to run super-fast on that same data size, and with a node.js client it is much faster for the same gearman server. – MaxiWheat Aug 16 '15 at 15:09
  • @schkovich Gearmand server 1.1.12, Gearman php extension 1.1.2, PHP 5.6.10 and Slackware 14.1 – MaxiWheat Aug 16 '15 at 15:27
  • Have you tried encoding the $schema and $data into base64 or something else other than a JSON string? On my test server running PHP 5.3.3 on CentOS 6.6 it seems to have issues encoding JSON within JSON like you are attempting to. It's a wild guess, but maybe encapsulating JSON within JSON is giving Gearman an issue? My test ran fine (under 0.1s) using base64 for the $data and $schema. – xangxiong Aug 19 '15 at 06:34
  • @MaxiWheat, my comment was a general observation. Having a large payload may not be suitable for production loads. – Paras Aug 20 '15 at 02:16
  • @MaxiWheat have you tried the streaming method? – joewright Aug 20 '15 at 18:20
  • @joewright I don't know what you mean by "streaming method", any example code somewhere ? – MaxiWheat Aug 20 '15 at 19:54
  • @MaxiWheat I may have read the wrong package's documentation. [This package](https://github.com/andris9/node-gearman) supports stream objects, and may perform a little better than buffering the entire file before sending to gearman. [Explanation of streams](https://github.com/substack/stream-handbook) – joewright Aug 24 '15 at 13:38
  • @xangxiong I tried encoding the payload into base64 before sending the job to Gearman, same result, even longer. – MaxiWheat Aug 25 '15 at 14:13
  • I would consider a way to split the payload into smaller jobs (see the sketch just after these comments). Let's say I have a job of sending 1000 push notifications; I would split them into batches of 10 and queue those. – astroanu Sep 04 '15 at 06:19
  • What if you add some ``echo microtime(true)."\n";`` in your code to detect where the bottleneck is? I would add one at the beginning of the file, another after reading every JSON file, the third one after encoding to JSON and the last one after calling ``$client->doNormal(...)`` – rdgfuentes Sep 07 '15 at 03:59
  • @rdgfuentes I already did that, that's why I wrote this comment in the code: "// This line takes long to execute..." – MaxiWheat Sep 07 '15 at 14:36
  • You may try to profile your code using Blackfire. I think it can help you find the execution-time bottleneck. Here's a way to do this: http://stackoverflow.com/questions/30645598/how-to-profile-a-php-shell-script-app-or-worker-using-blackfire – Gustavo Straube Sep 08 '15 at 00:06
  • @MaxiWheat you should consider making your process asynchronous. You can call the doBackground method instead of doNormal, so that if your process takes time it won't block the foreground. – sandeep_kosta Oct 07 '15 at 05:37
  • Node is asynchronous and non-blocking, so I think that would be the reason why node executes faster, rather than the synchronous nature of php. –  Oct 12 '15 at 04:40
  • The PHP script would benefit from [references](http://php.net/manual/en/language.references.php). If either of those files are 30MB you are using 90MB memory just shipping them between variables. – NoChecksum Nov 10 '15 at 00:45
  • I agree with you @NoChecksum but it does not change the fact that I send ~30MB to Gearman. My machine has more than enough memory and PHP does not crash with a memory limit exceeded. – MaxiWheat Nov 16 '15 at 00:29
  • Have you tried to monitor the gearman client's progress like [this example shows](http://us.php.net/manual/en/gearmanclient.donormal.php#example-4831)? – Sean Bright Nov 18 '15 at 21:03
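
For reference, here is a minimal sketch of the batching idea from astroanu's comment above (the chunk size, the background-task submission, and the per-chunk unique keys are illustrative assumptions, not code from the question):

ini_set('memory_limit', '1G');

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$data = file_get_contents('data.json');

// Illustrative only: slice the raw payload into ~1MB pieces and queue one
// background task per piece. Note that slicing JSON at byte boundaries yields
// fragments, so a real worker would need a scheme to reassemble them.
foreach (str_split($data, 1024 * 1024) as $i => $chunk) {
    $client->addTaskBackground('validateJsonSchema', $chunk, null, "chunk-$i");
}

$client->runTasks();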

1 Answer


Check whether this code works for you; it took a really short time to complete the job.

worker.php:

echo "Starting\n";
$gmworker = new GearmanWorker();

# Add default server (localhost).
$gmworker->addServer('127.0.0.1', '4730');
$gmworker->addFunction("jsonValid", "jsonValid");



print "Waiting for job...\n";
while ($gmworker->work()) {
    if ($gmworker->returnCode() != GEARMAN_SUCCESS) {
        echo "return_code: " . $gmworker->returnCode() . "\n";
        break;
    }
}

function jsonValid($job)
{
    return 'ToastyPHP!';
}

client.php:

ini_set('memory_limit', '1G');

$client = new GearmanClient();
$client->addServer('127.0.0.1', '4730');
$client->setCompleteCallback("complete");
$time = time();

echo "<pre>Sending job..." . "\n";


$schema = file_get_contents('AllSets.json');
$data = file_get_contents('AllSets.json');


$gearmanData = Array(
    'schema' => $schema,
    'data' => $data
);

$gearmanDataString = json_encode($gearmanData, JSON_FORCE_OBJECT);

$client->addTask("jsonValid", $gearmanDataString, null, 'Json');
$client->runTasks();

echo "Job finished\n";

$endtime = time();
print "Completed in " . ($endtime - $time) . ' seconds' . "\n";

function complete($task)
{
    print "Unique : " . $task->unique() . "\n";
    print "Data : " . $task->data() . "\n";
}

I have used the addTask and runTasks methods instead of doNormal. For the JSON data to be sent I used the AllSets.json file from http://mtgjson.com/, around 30MB in size (total load); the job finished in 1 second. After trying a file of around 200MB, it took 4 seconds.
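
If the response is not needed synchronously, the same submission can also be made fire-and-forget, along the lines of sandeep_kosta's suggestion in the comments (a sketch, not part of the original answer):

ini_set('memory_limit', '1G');

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

// Same payload construction as in client.php above.
$payload = json_encode([
    'schema' => file_get_contents('AllSets.json'),
    'data'   => file_get_contents('AllSets.json'),
], JSON_FORCE_OBJECT);

// Queue the job and return immediately instead of waiting for the result.
$client->addTaskBackground("jsonValid", $payload);
$client->runTasks();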

nitish koundade