
I'm confused about the purpose of having both hard and soft memory limits for ECS task definitions.

IIRC the soft limit is how much memory the scheduler reserves on an instance for the task to run, and the hard limit is how much memory a container can use before it is killed.

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

Is this correct?

Thanks

maambmb

4 Answers


If you expect to run a compute workload that is primarily memory bound instead of CPU bound then you should use only the hard limit, not the soft limit. From the docs:

You must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation. If you specify memoryReservation, then that value is subtracted from the available memory resources for the container instance on which the container is placed; otherwise, the value of memory is used.

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html

By specifying only a hard memory limit for your tasks you avoid running out of memory: ECS stops placing tasks on the instance once its memory is fully reserved, and Docker kills any container that tries to go over the hard limit.
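As a minimal sketch, a container definition for a memory bound task would set only the hard `memory` limit. The name, image, and value here are illustrative, not from the answer:

```json
{
  "name": "worker",
  "image": "my-memory-bound-app:latest",
  "memory": 512
}
```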

The soft memory limit feature is designed for CPU bound applications where you want to reserve a small minimum of memory (the soft limit) but allow occasional bursts up to the hard limit. In this type of CPU heavy workload you don't really care about the specific value of memory usage for the containers that much because the containers will run out of CPU long before they exhaust the memory of the instance, so you can place tasks based on CPU reservation and the soft memory limit. In this setup the hard limit is just a failsafe in case something goes out of control or there is a memory leak.

So in summary you should evaluate your workload using load tests and see whether it tends to run out of CPU first or out of memory first. If you are CPU bound then you can use the soft memory limit with an optional hard limit just as a failsafe. If you are memory bound then you will need to use just the hard limit with no soft limit.
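For the CPU bound case the answer describes, a container definition might pair a small memory reservation with a larger failsafe hard limit and place tasks mainly on CPU. All names and values here are illustrative:

```json
{
  "name": "api",
  "image": "my-cpu-bound-app:latest",
  "cpu": 512,
  "memoryReservation": 128,
  "memory": 1024
}
```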

nathanpeck
    I disagree that memory bound tasks don't need a soft limit. A reasonable soft limit helps ECS avoid placing tasks on instances with insufficient memory in the first place. – iGEL Jul 27 '18 at 12:47
    @iGEL Task placement takes both hard and soft limits into account – nathanpeck Aug 22 '18 at 17:24

@nathanpeck is the authority here, but I just wanted to address a specific scenario that you brought up:

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

This post from AWS explains what occurs in such a scenario:

If containers try to consume memory between these two values (or between the soft limit and the host capacity if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel’s OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it’s the Linux kernel reacting to memory pressure. If something is above its soft limit, it’s more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again the new memory feature we announced can come to rescue here. While the OOM behavior isn’t changing, now containers can be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).

pavelv

These are all the options you have available and what happens when you pick one of them:

1) If you only set a soft limit, it represents the reservation, and the ceiling is the container instance's total memory.

2) If you set both a soft limit and a hard limit, the soft limit represents the reservation, and the ceiling is the hard limit you set.

3) If you only set a hard limit, it represents both the reservation and the ceiling.
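The three shapes above might look like this in container definitions (the names, image, and values are illustrative only):

```json
[
  { "name": "option-1-soft-only", "image": "app:latest", "memoryReservation": 256 },
  { "name": "option-2-soft-and-hard", "image": "app:latest", "memoryReservation": 256, "memory": 512 },
  { "name": "option-3-hard-only", "image": "app:latest", "memory": 512 }
]
```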

View the memory allocations of a container instance

  1. Firstly, open the Amazon ECS console.
  2. Then, in the navigation pane, choose Clusters.
  3. Next, choose the cluster that you created.
  4. Choose the ECS Instances view, then choose the container instance included with the cluster you created from the Container Instance column.
Please note that the Details pane shows that the memory in the Available column is equal to that in the Registered column.
  5. For statistics on the resource usage of the instance, connect to the instance using SSH, and then run the docker stats command.
D.maurya

Amazon Elastic Container Service (ECS) provides developers with the means to orchestrate Docker containers running in Amazon Web Services (AWS). Using ECS, developers can deploy and manage a fleet of Docker containers, scale services, and so on, in a very flexible manner within an ECS cluster.

We use a task definition to describe how we want a Docker container to be deployed in an ECS cluster. memoryReservation is one of the container definition parameters that can be specified when writing the task definition; see Task definition parameters by AWS:

If a task-level memory value is not specified, you must specify a non-zero integer for one or both of memory or memoryReservation in a container definition.

Specifying X amount of memoryReservation tells ECS that this particular Docker container might need X amount of memory in MiB. It is a soft limit, which means that the container is allowed to use more than we reserved. Also from the same reference:

When system memory is under contention, Docker attempts to keep the container memory to this soft limit; however, your container can consume more memory when needed, up to either the hard limit specified with the memory parameter (if applicable), or all of the available memory on the container instance, whichever comes first.

The Docker daemon reserves a minimum of 4 MiB of memory for a container, so you should not specify fewer than 4 MiB of memory for your containers.

Reading the above, some might wonder: why not be flexible and just set memoryReservation for all containers to 4 MiB, then let ECS figure out how much memory each container needs? In this article, we want to help the reader understand why this is not a good idea, and then help them choose a reasonable value for memoryReservation.

Experimental Setup

For demonstration purposes, we are going to set up a very simple playground in ECS: an EC2 launch type ECS cluster with a single ECS instance of type t2.micro, which has 1 GiB of memory. In the cluster, we will create an ECS service that uses the following task definition:

[
  {
    "name": "worker",
    "image": "alexeiled/stress-ng:latest",
    "command": [
      "--vm-bytes",
      "300m",
      "--vm-keep",
      "--vm",
      "1",
      "-t",
      "1d",
      "-l",
      "0"
    ],
    "memoryReservation": 400
  }
]

Essentially, this task definition launches a container that constantly consumes 300 MB (≈ 286.102 MiB) of memory (see stress-ng). We can think of this as a memory-hungry worker.
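As a quick check of that unit conversion, here is a small sketch; `mb_to_mib` is a hypothetical helper written for this article, not part of stress-ng or ECS:

```python
# stress-ng's --vm-bytes value above is in decimal megabytes (MB),
# while ECS memory parameters are in MiB, so convert 300 MB to MiB.
def mb_to_mib(mb: float) -> float:
    return mb * 1000**2 / 1024**2

print(round(mb_to_mib(300), 3))  # 286.102
```

Note that the 286.102 MiB consumed by the worker is below the 400 MiB memoryReservation, so the container stays under its soft limit.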

D.maurya