I have a project in which I am converting a large number of .tif images into PDF documents. The file count runs into the millions.
To speed up the process I am using Amphp. Since converting the images with ImageMagick is CPU-intensive, I want to limit the maximum number of converter processes running in parallel.
My first approach works, but it could be improved by queueing the files instead of handing a fixed number of workers an array of x files each.
This is my current code, in which I tried to replicate the example:
<?php
require dirname(__DIR__) . '/vendor/autoload.php';

// User-defined constants, passed on to every task.
$constants = get_defined_constants(true);
$constants = $constants['user'];

$maxFileCount = THREAD_CHUNKSIZE * THREAD_COUNT;
$i = 0;
$folder = opendir(LOOKUP_PATH);
$tasks = [];

// Note: $i counts every directory entry, not only the matching files.
while ($i < $maxFileCount && (false !== ($import_file = readdir($folder)))) {
    $fileParts = explode('.', $import_file);
    $ext = strtolower(end($fileParts));
    if ($ext === 'xml') {
        $filePath = LOOKUP_PATH . 'xml' . DIRECTORY_SEPARATOR . $import_file;
        $tasks[] = new ConvertPdfTask([$filePath], $constants);
    }
    $i++;
}

if (!empty($tasks)) {
    Amp\Loop::run(function () use ($tasks) {
        $coroutines = [];
        // Pool limited to THREAD_COUNT workers.
        $pool = new Amp\Parallel\Worker\DefaultPool(THREAD_COUNT);
        foreach ($tasks as $index => $task) {
            $coroutines[] = Amp\call(function () use ($pool, $task) {
                return yield $pool->enqueue($task);
            });
        }
        // Wait for all enqueued tasks, then shut the pool down.
        $results = yield Amp\Promise\all($coroutines);
        return yield $pool->shutdown();
    });
}
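For comparison, the queueing idea mentioned earlier (feeding file paths one at a time rather than collecting a fixed-size array first) could be sketched with a plain PHP generator. This is only an illustration under assumptions: `iterFilesByExtension` is a hypothetical helper name, it is not part of Amphp, and it is not the code currently in use.

```php
<?php
declare(strict_types=1);

/**
 * Lazily yield the full paths of regular files in $dir whose extension
 * matches $ext (case-insensitive).
 *
 * Hypothetical helper for illustration only; not part of Amphp.
 */
function iterFilesByExtension(string $dir, string $ext): Generator
{
    $handle = opendir($dir);
    if ($handle === false) {
        throw new RuntimeException("Cannot open directory: $dir");
    }

    try {
        while (false !== ($name = readdir($handle))) {
            $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;

            // Skip directories (including '.' and '..') and non-matching files.
            if (!is_file($path)) {
                continue;
            }
            if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== strtolower($ext)) {
                continue;
            }

            // Only one path is held in memory at a time.
            yield $path;
        }
    } finally {
        // Runs when the generator finishes or is destroyed early.
        closedir($handle);
    }
}
```

A consumer would loop over `iterFilesByExtension(LOOKUP_PATH, 'xml')` and hand each path to a worker as one becomes free, so the directory listing never has to be held in memory in full.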
My problem is that as soon as I enqueue more tasks than THREAD_COUNT, I get the following PHP warning, and no PDFs are created:
Warning: Worker in pool exited unexpectedly with code -1
As long as I stay within the maximum pool size, everything is fine.
I am using PHP 7.4.9 on Windows 10 and amphp/parallel 1.4.0.