This is a prime candidate for multi-threading, and here's some code to do it:
<?php
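class WebWorker extends Worker {
    // Nothing to set up per worker; it simply executes the tasks stacked on it.
    public function run() {}
}

class WebTask extends Stackable {
    public $input;
    public $output;
    public $copied;

    public function __construct($input, $output) {
        $this->input = $input;
        $this->output = $output;
        $this->copied = 0;
    }

    // Runs on a worker thread: copy $input to $output and record the byte count.
    public function run() {
        $data = file_get_contents($this->input);
        if ($data !== false) {
            // Only count the bytes if the write actually succeeded.
            if (file_put_contents($this->output, $data) !== false) {
                $this->copied = strlen($data);
            }
        }
    }
}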
class WebPool {
    protected $max;
    protected $workers;

    public function __construct($max) {
        $this->max = $max;
        $this->workers = [];
    }

    // Stack the task on a randomly chosen worker, starting that worker on first use.
    public function submit(WebTask $task) {
        // rand() is inclusive at both ends, so use $max - 1 to create at most $max workers.
        $random = rand(0, $this->max - 1);
        if (!isset($this->workers[$random])) {
            $this->workers[$random] = new WebWorker();
            $this->workers[$random]->start();
        }
        return $this->workers[$random]->stack($task);
    }

    // Each worker shuts down after executing all of the tasks stacked on it.
    public function shutdown() {
        foreach ($this->workers as $worker) {
            $worker->shutdown();
        }
    }
}
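$pool = new WebPool(8);
$work = [];
$start = microtime(true);

foreach (glob("csv/*.csv") as $path) {
    $handle = fopen($path, "r");
    if (!$handle) {
        continue;
    }
    // Each line has the form input;output
    while (($line = fgetcsv($handle, 0, ";")) !== false) {
        if (count($line) < 2) {
            continue; // skip blank or malformed lines
        }
        // Keep a reference to every task so its result can be read after shutdown.
        $wid = count($work);
        $work[$wid] = new WebTask($line[0], $line[1]);
        $pool->submit($work[$wid]);
    }
    fclose($handle);
}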
// Wait until every worker has worked through its stack.
$pool->shutdown();
$runtime = microtime(true) - $start;

$total = 0;
foreach ($work as $job) {
    printf(
        "[%s] %s -> %s %.3f kB\n",
        $job->copied ? "OK" : "FAIL",
        $job->input,
        $job->output,
        $job->copied / 1024);
    $total += $job->copied;
}
printf(
    "[TOTAL] %.3f kB in %.3f seconds\n",
    $total / 1024, $runtime);
?>
This creates a pool with a fixed maximum number of worker threads. It then reads through a directory of semicolon-separated CSV files, where each line has the form input;output, and submits one task per line to the pool. Each task reads its input and writes its output asynchronously, while the main thread carries on reading the CSV files.
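The CSV-reading step can be pulled out into a small helper. This is a sketch assuming the same input;output layout; the function name `read_jobs` is hypothetical. It skips lines with fewer than two fields because `fgetcsv()` returns an array holding a single null for a blank line.

```php
<?php
// Hypothetical helper: read one semicolon-separated file into [input, output] pairs.
function read_jobs(string $path): array
{
    $jobs = [];
    $handle = fopen($path, "r");
    if ($handle === false) {
        return $jobs;
    }
    // Enclosure and escape are passed explicitly; they match fgetcsv()'s defaults.
    while (($line = fgetcsv($handle, 0, ";", '"', "\\")) !== false) {
        if (count($line) < 2) {
            continue; // blank line ([null]) or a line without a ';'
        }
        $jobs[] = [$line[0], $line[1]];
    }
    fclose($handle);
    return $jobs;
}
```

The main loop could then create one `WebTask` per pair returned by `read_jobs()` instead of calling `fgetcsv()` directly.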
I have used the simplest I/O functions, file_get_contents and file_put_contents, so that you can see how it works without cURL.
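If the inputs are URLs rather than local files, cURL can stand in for `file_get_contents()`. A minimal sketch, assuming the curl extension is loaded; `fetch_with_curl()` is a hypothetical helper name, and the idea is that `WebTask::run()` would call it in place of `file_get_contents()`.

```php
<?php
// Hypothetical helper: fetch $url with cURL, returning the body or false on failure.
function fetch_with_curl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $data = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return $data;
}
```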
The worker that receives a submitted task is chosen at random, which may not be desirable. It is possible to detect whether a worker is busy, but that would complicate the example.
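One simple, deterministic alternative to `rand()` is round-robin selection: each new task goes to the next worker slot in turn, so tasks are spread evenly by count, though not by how long each task takes. This is a sketch; the `RoundRobin` class is hypothetical, and `WebPool::submit()` would call `nextIndex()` where the example calls `rand()`. Picking the least busy worker instead would need each worker's queue length; older pthreads releases document `Worker::getStacked()` for this, but check it against the version you use.

```php
<?php
// Hypothetical selector: hands out worker slots 0 .. $max - 1 in strict rotation.
class RoundRobin
{
    private $max;
    private $next = 0;

    public function __construct(int $max)
    {
        $this->max = $max;
    }

    // Returns the slot for the next task, then advances, wrapping back to 0.
    public function nextIndex(): int
    {
        $index = $this->next;
        $this->next = ($this->next + 1) % $this->max;
        return $index;
    }
}
```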
Further reading: