
In CakePHP, there are various systems for managing the queue itself (RabbitMQ, beanstalk, Amazon SQS, dereuromark's cakephp-queue), but all of them seem to require a daemonized worker task. These always-on workers (which have the full power of CakePHP behind them) listen for jobs as they come into the queue, do their processing, then sit idle until the next job comes along.

Currently, I'm using a beanstalk-based queue (linked above), and it's worked okay, but in terms of server resources, it's not particularly efficient. We have memory leaks and have to kill and restart the processes sometimes.

However, now I'm trying to add more different kinds of "tubes" (in beanstalk's parlance), and I'm bumping up against RAM issues on our servers running so many different workers at once. When I spin up all of the different workers I want, I get fatal out-of-memory errors.

I'd rather have something like a "serverless"/Lambda-style setup where the worker is spun up on-demand, does its little job, then terminates itself. Kind of like a cron job calling a CakePHP shell, but with the job data dynamically being populated from the queue.

Does anyone have experience with this kind of setup for queuing? I'm on an AWS-based infrastructure, so anything that uses Amazon services would be especially helpful.

Curtis Gibby

1 Answer


As far as I know, there are only two ways to run PHP: either as a thread inside a web server (Apache, Nginx, CGI) or as a single-threaded shell process. When you run it on the shell, you're stuck with one thread per process.

I know that sucks, but PHP is not the best tool for server workers. A Lambda-style architecture isn't really going to solve this problem either; you're just offloading your multi-threading issues to another host.

At the end of the day, the easiest solution is to just run more PHP processes. If you're having crashes, run PHP inside a shell script that restarts it when it dies. It's just the nature of PHP on the command line.
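A minimal restart wrapper might look like the sketch below; the CakePHP invocation in the comment is a placeholder for your actual worker task, and the retry limit is arbitrary:

```shell
#!/bin/sh
# Keep restarting a command until it exits cleanly, giving up after a
# fixed number of crashes. Usage: run_with_restarts <max_restarts> <cmd...>
run_with_restarts() {
    max_restarts=$1; shift
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_restarts" ]; then
            echo "giving up after $tries crashes" >&2
            return 1
        fi
        sleep 1
    done
}

# Example (placeholder path and task name):
# run_with_restarts 10 php /var/www/app/Console/cake.php queue_worker
```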

But I will share, from my experience, what other options you have.

However, now I'm trying to add more different kinds of "tubes" (in beanstalk's parlance), and I'm bumping up against RAM issues on our servers running so many different workers at once. When I spin up all of the different workers I want, I get fatal out-of-memory errors.

Last time I checked, beanstalk was single-threaded, so I don't think it's possible for PHP to spawn multiple workers at once with beanstalk. You have to run one PHP instance that gets a message and works on it; if you want to scale, you have to run multiple PHP instances.

It sounds like your workers either have memory leaks or simply consume a lot of memory. I don't see how this has anything to do with beanstalk: you have to fix your leaks and change your source code to use less memory.

I've had to rewrite PHP code to use a forward-reading XML parser, because the other parser would load the entire document into memory. The forward-reading parser used less memory, but it was a pain to rewrite all my code. You have to decide which costs you more: spending more money on RAM or spending time rewriting the code. That's your call.
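For reference, a forward-reading parse in PHP looks roughly like this, using the built-in XMLReader extension (the element names here are made up):

```php
<?php
// Streaming parse with XMLReader: only the current node is held in memory,
// unlike DOM-style parsing, which loads the whole tree at once.
function countElements($xml, $name) {
    $reader = new XMLReader();
    $reader->XML($xml);          // for a file on disk, use $reader->open($path)
    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $count++;
        }
    }
    $reader->close();
    return $count;
}
```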

Memory

PHP comes with a soft limit on memory usage. Even if the host machine has lots of memory, the PHP process will throw an out-of-memory error when it hits the soft limit. It's something you have to manually change in the php.ini file. Forgive me if you've already done this, but I thought it was worth mentioning.

Increase PHP memory limit in php.ini:

memory_limit = 128M
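For CLI workers you can also override the limit per invocation without touching php.ini (the path and task name below are placeholders):

```shell
# Raise the limit for a single CLI run:
php -d memory_limit=512M /var/www/app/Console/cake.php queue_worker

# Or remove the limit entirely for a long-running worker (use with care):
php -d memory_limit=-1 /var/www/app/Console/cake.php queue_worker
```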

Disposable Pattern

I solved a lot of my memory leaks using a disposable pattern. It's just a simple interface you implement on objects, and then you wrap code in a using() function. I was able to reduce my memory leaks by 99% with this library (full disclosure: this is my GitHub library).

https://github.com/cgTag/php-disposable
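The core idea (a rough sketch, not necessarily the library's exact API) is an interface for deterministic cleanup plus a helper that guarantees dispose() runs even if the callback throws:

```php
<?php
// Sketch of the disposable pattern: objects free their resources explicitly,
// and using() ensures dispose() is always called, even on exceptions.
interface Disposable {
    public function dispose();
}

function using(Disposable $obj, callable $fn) {
    try {
        return $fn($obj);
    } finally {
        $obj->dispose();
    }
}

// Toy example resource for illustration:
class TempBuffer implements Disposable {
    public $data = '';
    public $disposed = false;
    public function write($s) { $this->data .= $s; }
    public function dispose() { $this->data = ''; $this->disposed = true; }
}
```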

Multi-threaded PHP

There is an open-source project that adds multi-threading support to PHP, and it looks to me like a solid library.

https://github.com/krakjoe/pthreads

The project adds multi-threading support to PHP (via a native extension) by creating a new global scope for each thread. This allows you to run a CakePHP shell in each thread, and I think there is an API for thread-to-thread sharing of data (mutexes and things like that).

Dockerize

I've had some success in running docker just to handle a single CakePHP shell task. This allowed me to quickly scale up by running multiple containers on the same host machine. The overhead of extra memory for containers really wasn't that bad. I don't remember the exact number, but it's less than what you might think.
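One way to get job-level data into such a container is an environment variable holding the job payload; a hypothetical invocation (the image name, task name, and payload shape are all made up) might look like this, shown here as an echoed command so the sketch runs even without Docker installed:

```shell
# One-shot worker: a poller (cron script, Lambda, etc.) reserves a job, then
# launches a container with the payload in an env var; the container runs one
# CakePHP shell task and exits. "myapp/worker" and "queue_worker" are
# hypothetical placeholders.
JOB_PAYLOAD='{"id":42,"tube":"emails"}'
docker_cmd="docker run --rm -e JOB_PAYLOAD=$JOB_PAYLOAD myapp/worker app/Console/cake queue_worker"
echo "$docker_cmd"
```

Inside the container, the shell task would read the payload back out of the environment (e.g. with getenv('JOB_PAYLOAD') in PHP) and decode it.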

Daemons

They are the tried and tested way of running services on Linux. The only problem here is that it's one PHP thread per daemon, so you have to register multiple daemons to scale up. With that said, this option works well with the multi-threading library above.
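On a systemd-based distro, a template unit makes registering multiple identical daemons fairly painless; a sketch (file path, unit name, and task name are all placeholders) could look like:

```ini
# /etc/systemd/system/queue-worker@.service
# Scale up with: systemctl start queue-worker@1 queue-worker@2 ...
[Unit]
Description=CakePHP queue worker %i

[Service]
ExecStart=/usr/bin/php /var/www/app/Console/cake.php queue_worker
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

The Restart=always line also covers the crash/restart problem mentioned earlier, since systemd re-spawns the process for you.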

Reactgular
  • Thanks for answering. I've already looked at memory limits on the server itself and in php.ini. This particular project is stuck on PHP 5, so the "disposable" and "pthreads" suggestions are non-starters for me personally. That leaves "Dockerize". Could you elaborate on how you'd pass job-level data into a shell task using Docker? – Curtis Gibby Jan 26 '18 at 19:16