Starting a process thousands of times per minute on a production server

Question

To convert some HTML files to PDF, I implemented a quick solution based on this SO answer.

Essentially, it is a Web API service which, upon receiving an HTML file, puts an entry into a message queue. A background worker picks up the entry, renders the PDF using phantomjs.exe, and emails it later on.

It all works, but my worry is that on the production servers we could receive thousands of HTML files per minute and would run phantomjs once for each of them. Will the background worker starve the server by starting a phantomjs process for each file?

Thank you

Well, before you put it into production, you can run load and performance tests: a performance test for the HTML conversion, and a load test for its usage. You have already designed the worst-case pattern, so try to reproduce it. — eocron, Oct 06 '16 at 06:20
Why would you ever want to do that? Never mind that processes are orders of magnitude heavier than threads in *any* OS, each request already runs on a separate thread. Just find a PDF rendering library and use it — Panagiotis Kanavos, Oct 06 '16 at 07:25
BTW the linked answer is terrible. It uses a separate thread even though each request already runs on its own thread, then performs a busy wait on the request thread. This will destroy performance. Also note that using JavaScript to generate PDF is the hardest and slowest way possible, used when you *can't* use a library for the job. — Panagiotis Kanavos, Oct 06 '16 at 07:29

score 0 · Answer 1 · answered Oct 06 '16 at 06:04

There are many factors that you have to consider.

You have to know how many background workers are running to do the task.
You need to know the server specs, since more processing means more CPU usage.
You need to stress test in a test environment before you go to the production server. Your test environment should have the same specs as your production environment.

We cannot determine your exact needs; you have to test and confirm this yourself.
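
To make the first point concrete, here is a minimal sketch of capping how many renderer processes run at once. It is written in Python for brevity rather than in the asker's .NET stack. The names `render`, `render_all` and `MAX_WORKERS`, the 60-second timeout, and the value 4 are all assumptions for illustration, not recommendations; the right cap is whatever your stress test shows the server can sustain.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: a placeholder cap. Derive the real value from load testing.
MAX_WORKERS = 4


def render(command, timeout_seconds=60):
    """Run one renderer process and return its exit code.

    `command` is an argument list such as the one used to launch phantomjs.
    subprocess.run kills the child and raises TimeoutExpired on timeout.
    """
    completed = subprocess.run(command, timeout=timeout_seconds)
    return completed.returncode


def render_all(commands, max_workers=MAX_WORKERS):
    """Run every command with at most `max_workers` processes alive at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each pool thread blocks on one child process, so the pool size
        # is also the upper bound on concurrent renderer processes.
        return list(pool.map(render, commands))
```

With a cap like this, a burst of files lengthens the queue instead of multiplying the number of live processes, which is the behaviour you would then measure under load.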

score 0 · Answer 2 · answered Oct 06 '16 at 07:22

Although your solution should work, it does not appear to scale well. How long it takes for your machine to start running short of resources will depend on the frequency of the requests and on the amount of work the background worker needs to do.

The fact that you have decoupled the front end from the PDF generation is a step in the right direction, but having them run on the same machine would still have an impact on the entire site (again, the impact will depend on the frequency and size of the HTML files).

What you could do is have another machine (or a cluster of them) handle the conversion and the sending of emails. This would leave your front-end machine free to deal with incoming requests.

You could use something like RabbitMQ to handle your queuing, or else roll your own. The former option will probably make it easier to scale things up should you need to.
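
As a rough sketch of the decoupling described above, the following Python snippet uses the standard library's in-process `queue.Queue` as a stand-in for a real broker such as RabbitMQ. The names `worker`, `run_consumers` and `handle` are hypothetical, and in a real deployment the consumers would be separate processes on other machines rather than threads.

```python
import queue
import threading


def worker(jobs, handle, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this consumer
            jobs.task_done()
            break
        try:
            results.append((job, handle(job)))
        except Exception as error:
            # Keep the consumer alive if a single job fails.
            results.append((job, error))
        finally:
            jobs.task_done()


def run_consumers(items, handle, consumer_count=2):
    """Feed `items` through a queue to `consumer_count` consumers."""
    jobs = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(jobs, handle, results))
        for _ in range(consumer_count)
    ]
    for thread in threads:
        thread.start()
    for item in items:  # the front end only enqueues and returns
        jobs.put(item)
    for _ in threads:  # one sentinel per consumer
        jobs.put(None)
    for thread in threads:
        thread.join()
    return results
```

The producer never waits on a conversion, and capacity is added by raising the number of consumers, which is the property a broker gives you across machines.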
