How do I prevent containers that use selenium from 'freezing' after getting OOM errors and seeing Failed to start thread - pthread_create failed (EAGAIN)? What is the root cause and how do I fix it? Further, how can I test the solution locally and how can I implement the solution on AWS?

jsotola
HeronAlgoSearch

1 Answer

The following is a guide to diagnosing and fixing OOM issues caused by long-lived Selenium instances. Specifically, on AWS you will likely see Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached. followed by the dreaded OOM error if your Selenium process runs long enough. This guide assumes that you are using Docker and can run your image via docker run <some args if you wish> your_image.

Diagnosis

Run your app using docker run .... In a separate tab, run docker ps, find your Container ID, and run docker container stats CONTAINER_ID. The key is to observe the PIDS column. Now trigger the Selenium process to run, ideally many times (you may want to wrap the call in a for loop just for this test). You will notice that the PIDS count grows without bound. This is because Selenium leaves zombie processes behind (reference: Selenium leaves behind running processes?). Once enough of them pile up, the container can no longer create new threads, which is what the pthread_create failed (EAGAIN) message indicates.
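To confirm that the extra PIDs really are zombies, you can count them from inside the container. The following is a minimal sketch, assuming a Linux container with Python available in the image; the function names parse_state and count_zombies are hypothetical helpers, not part of Docker or Selenium. It reads the process state from /proc/<pid>/stat, where Z marks a zombie.

```python
import os


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' rather than on spaces.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes whose state is 'Z' (zombie) under proc_root."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if parse_state(fh.read()) == "Z":
                    zombies += 1
        except OSError:
            continue  # the process went away while we were scanning
    return zombies


if __name__ == "__main__":
    print(f"zombie processes: {count_zombies()}")
```

Run it a few times (for example via docker exec) while triggering Selenium. If the zombie count climbs along with the PIDS column, the diagnosis above applies to you.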

Solution

The solution is to reap the zombie processes. Per Selenium leaves behind running processes?, the key is the --init flag: run docker run --init ... instead of docker run .... Per https://docs.docker.com/engine/reference/run/: "You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container." To confirm that the solution works for you, run your image with docker run --init ... and re-trigger the calls to Selenium. This time the PIDS count may still grow, but not without bound (for me it never passed 200).
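To illustrate what "reaping zombie processes" means, here is a minimal sketch of the wait call an init process performs on behalf of exited children. It is not how Docker's init is implemented, only an illustration of the idea; reap_exited_children is a hypothetical name, and the code assumes a POSIX system.

```python
import os
from typing import List


def reap_exited_children() -> List[int]:
    """Collect every child that has already exited, without blocking.

    Returns the PIDs that were reaped. An exited child stays in the process
    table as a zombie until its parent makes a wait call like this one.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns (0, 0) instead of blocking
            # when children exist but none has exited yet.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children are still running; nothing to reap right now
        reaped.append(pid)
    return reaped
```

Without --init, your application runs as PID 1 and inherits orphaned processes, but it typically never makes this call for them, so they linger. With --init, a small init process sits at PID 1 and does the reaping for you.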

Solution - AWS

References: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-linuxparameters.html and https://www.ernestchiang.com/en/posts/2021/using-amazon-ecs-exec/

In the AWS console, search for 'ECS' and click on Task Definitions, then select your task definition. Scroll to the bottom and click on Configure via JSON. Next, find linuxParameters and, if it is null, replace null with the value:

{
"initProcessEnabled": true
}

If you already have a JSON value for linuxParameters, just add "initProcessEnabled": true as a key inside it. Next, create your task definition and deploy!
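If you manage the task definition JSON in a script rather than in the console, the same edit can be sketched as below. This is a minimal sketch under assumptions: enable_init_process is a hypothetical helper, and it assumes the task definition has already been loaded into a Python dict with linuxParameters sitting inside each entry of containerDefinitions. It handles both cases described above (null and an existing value).

```python
import copy


def enable_init_process(task_definition: dict) -> dict:
    """Return a copy of the task definition with initProcessEnabled set to
    true in the linuxParameters of every container definition."""
    updated = copy.deepcopy(task_definition)
    for container in updated.get("containerDefinitions", []):
        # linuxParameters may be missing or null; keep existing keys otherwise.
        params = container.get("linuxParameters") or {}
        params["initProcessEnabled"] = True
        container["linuxParameters"] = params
    return updated
```

The input is left untouched, so you can diff the original against the result before registering the new task definition.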

Solution - Other

I have not used Google Cloud or Microsoft offerings, so I do not know how to add the --init flag. If someone with such experience could tell me how to do that, I would be happy to update the guide.

HeronAlgoSearch