
I want to use Google Translate's v3 PHP library to translate some text into a bunch of different languages. There may be workarounds (though none ideal that I know of), but I'm also just trying to learn.

I wanted to use multiple calls to translateText, one call per target language. However, to make things faster, I would need to make these requests concurrently, so I was looking into concurrency options. I wanted to keep using calls to translateText instead of constructing a bunch of curl requests manually with curl_multi.

I tried the first code example I found from one of the big concurrency libraries I've seen recommended, amphp. I used its parallelMap function, but I'm getting timeout errors when the processes are created. My guess is that I'm forking too many processes at a time.
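
For reference, what I tried was shaped roughly like this (a sketch based on v1 of amphp/parallel-functions; translateOne() is just a stand-in for my actual call into the Translate client):

require __DIR__ . '/vendor/autoload.php';

use function Amp\ParallelFunctions\parallelMap;
use function Amp\Promise\wait;

$targets = ['fr', 'de', 'es', 'ja', 'it', 'pt'];

// Each callable runs in its own worker process; translateOne() is a
// placeholder for the real translateText call.
$results = wait(parallelMap($targets, function ($lang) {
    return translateOne('Hello, world!', 'en', $lang);
}));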

I'd love to learn whether there is an easy way to do concurrency in PHP without having to make a bunch of decisions: how many processes to have running at a time, whether to use threads or processes, how to profile memory usage, and which PHP thread extension is even any good / whether the one I've heard of, called "parallel", may be discontinued (as suggested in a comment here).

Every Stack Overflow post I've found so far just links to one giant concurrency library or another that I don't want to have to read a bunch of documentation for. I'd be interested to hear how concurrency like this is normally done and what the options are. Many people claim that processes aren't much slower than threads these days, but I can't find quick answers on whether they use a lot more memory than threads. I'm not even positive that a lack of memory is my problem, though it probably is. There has been more complexity involved than I expected.

I'm wondering how this is normally handled.

Do people normally just use a pool of workers (processes by default, or threads using, say, the "parallel" PHP extension) and set a maximum number to run at a time in order to make concurrent requests?

If in a hurry, do people just pick a number of worker processes that isn't particularly optimized?

It would be nice if the number of workers to use was set dynamically for you based on how much RAM was available or something, but I guess that's not realistic, since the amount of RAM available can quickly change.

So, would you just need to set up a profiler to see how much RAM one worker process/thread uses, or otherwise just make some sort of educated guess as to how many worker processes/threads to use?
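
For concreteness, the kind of thing I mean is sketched below, using pcntl (CLI only); the cap of 3 and the doWork() helper are placeholders I made up:

// Sketch of a fixed-size worker pool using pcntl (CLI only).
// $maxWorkers is a hand-picked cap; doWork() is a hypothetical job function.
$jobs = ['fr', 'de', 'es', 'ja', 'it', 'pt'];
$maxWorkers = 3;
$running = 0;

foreach ($jobs as $job) {
    if ($running >= $maxWorkers) {
        pcntl_wait($status); // block until any child exits
        $running--;
    }
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        doWork($job); // child: handle exactly one job...
        exit(0);      // ...then exit so it never continues the loop
    }
    $running++;
}

while ($running-- > 0) {
    pcntl_wait($status); // drain the remaining children
}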

donohoe
Kevin Wheeler

3 Answers


I think you are missing some basics about how the underlying platforms work: engines, operating system, processor, and so on.

In your initial example you mentioned JavaScript. But JavaScript itself is single-threaded too. It only provides "syntactic sugar" to start a job, put it on a queue (macro + micro), do other stuff, and, the next time the engine is free, execute the next job from its queues. No parallelism is involved.

An example to run in a new browser window:

await Promise.all([
    new Promise(resolve => {
        console.log("Promise 1 started");
        while (true); // blocks the single thread forever
        console.log("Promise 1 finished"); // never reached
        resolve();
    }),
    new Promise(resolve => { console.log("Promise 2 finished"); resolve(); }) // never starts
]);
console.log("All awaited"); // never reached

All you will get on the console is "Promise 1 started", plus one process wasting a lot of CPU time because of the while (true);. This process must be forced to close.

Normally the JS engine does a lot of this work for you: it switches context between jobs, tracks memory, and runs garbage collection.

If you want real parallelism, you need to handle this yourself, or a library needs to do it. E.g., the Apache web server supports fine-grained options for handling many parallel incoming requests. It keeps "warm" workers with reserved memory that a request can jump into directly to be processed. Memory always matters: it needs to be allocated before it is used. So if, say, 1,000 parallel requests must be supported and a request may contain 10 MB of POST data, then 10 GB+ of memory must be reserved, regardless of whether all of them turn out to be GET requests like "/short/url?a=1".

But even if you have a plan for your memory and inter-process communication, you want to make web requests, and those are mostly handled by the external cURL client: how many parallel transfers can the client handle? How many parallel connections does your network interface support? How many does your ISP allow?

The more "specific" own requirements are, the more things you must keep yourself eyes on

trueleader

Run functions from PHP library concurrently making HTTP requests, without using curl multi

Eh? I would expect that any PHP library you find will be using curl_multi.

Spawning PHP threads from a request handled by a web SAPI is virtually impossible, although it is relatively easy from the CLI SAPI as long as you are running on Linux/Unix.

But what's your aversion to using curl_multi from PHP? I would agree that the official documentation is not very clear, but there are other sources on the interwebs.

If it were me, I'd write a PHP script to handle translation of one piece of text, then invoke multiple instances of it, with appropriate arguments, from another PHP script using curl_multi_exec().
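
As an illustration, driving several requests to such a script with the curl_multi API might look something like this (the translate.php endpoint and its query parameters are made up for the example):

$targets = ['fr', 'de', 'es', 'ja'];
$mh = curl_multi_init();
$handles = [];

foreach ($targets as $lang) {
    $ch = curl_init('https://example.com/translate.php?to=' . urlencode($lang)
        . '&msg=' . urlencode('Hello, world!'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$lang] = $ch;
}

// Drive all transfers until none are still running.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // sleep until there is activity
    }
} while ($active && $status === CURLM_OK);

$results = [];
foreach ($handles as $lang => $ch) {
    $results[$lang] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);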

symcbean

PHP is simply not a language that allows parallel execution out of the box. There is no such thing as a global event loop or the like. But some PHP mechanisms for running code in parallel do exist:

  • the curl_multi family of functions, to execute HTTP requests in parallel (applies only to HTTP requests made via the curl extension) — something you kind of dislike;
  • the pcntl family of functions, to fork the current process as many times as needed and control child termination (applies to any PHP process, but is platform-specific) — a bit more complicated than curl_multi; see the sketch after this list;
  • the php-fpm architecture, which is itself a methodology for keeping a certain number of parallel processes in memory and controlling their spawning/killing.
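
For the pcntl route, a minimal sketch (CLI only) could look like the following; translateText() here is a placeholder for whatever does one translation:

// Fork one child per target language (sketch; no concurrency cap).
$languages = ['fr', 'de', 'es', 'ja'];
$pids = [];

foreach ($languages as $lang) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        translateText('Hello', 'en', $lang); // child: one translation...
        exit(0);                             // ...then exit immediately
    }
    $pids[] = $pid; // parent keeps each child's pid
}

// Parent: wait for every child to terminate.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}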

However, I'd propose a much simpler solution: launch several requests in parallel from a shell script, putting each in the background with the & suffix and then using the wait command:

php translate-string.php --msg 'MESSAGE1' --from 'en' --to 'fr' &
php translate-string.php --msg 'MESSAGE2' --from 'en' --to 'fr' &
php translate-string.php --msg 'MESSAGE3' --from 'en' --to 'fr' &
php translate-string.php --msg 'MESSAGE4' --from 'en' --to 'fr' &
wait

Launching background processes in shell

Stas Trefilov