36

When pushing an image to Amazon ECR, if the tag already exists in the repository, the old image remains in the registry but becomes untagged.

So if I run docker push image/haha:1.0.0 a second time (provided that something has changed), the first image is left untagged in AWS ECR.

Is there a way to safely remove untagged images from all repositories?

Andrea Baccega

7 Answers

52

You can delete all untagged images in a single request, without loops:

IMAGES_TO_DELETE=$( aws ecr list-images --region "$ECR_REGION" --repository-name "$ECR_REPO" --filter "tagStatus=UNTAGGED" --query 'imageIds[*]' --output json )

aws ecr batch-delete-image --region "$ECR_REGION" --repository-name "$ECR_REPO" --image-ids "$IMAGES_TO_DELETE" || true

First it gets a list of images that are untagged, in json format:

[ {"imageDigest": "sha256:..."}, {"imageDigest": "sha256:..."}, ... ]

Then it sends that list to batch-delete-image.

The trailing || true is required to avoid a non-zero exit code when there are no untagged images.
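For readers scripting this in Python rather than the shell, the same two steps can be sketched with an ECR client. This is a minimal sketch, not the answer author's code: `delete_untagged` is a hypothetical helper name, `client` is assumed to behave like `boto3.client("ecr")`, and the function assumes the repository holds no more than 100 untagged images (the per-request limit of batch-delete-image raised in the comments). Instead of `|| true`, it skips the delete call when nothing is untagged.

```python
def delete_untagged(client, repository_name):
    """Delete all untagged images in one repository; return the deleted image IDs.

    `client` is expected to behave like boto3.client("ecr").
    Assumption: at most 100 untagged images, so one delete request is enough.
    """
    # Step 1: list untagged images, following pagination.
    image_ids = []
    paginator = client.get_paginator("list_images")
    pages = paginator.paginate(
        repositoryName=repository_name,
        filter={"tagStatus": "UNTAGGED"},
    )
    for page in pages:
        image_ids.extend(page["imageIds"])

    # Nothing to delete: skip the API call, which would reject an empty list.
    if not image_ids:
        return []

    # Step 2: delete them in a single request.
    response = client.batch_delete_image(
        repositoryName=repository_name,
        imageIds=image_ids,
    )
    return response.get("imageIds", [])
```

With boto3 installed and credentials configured, this could be called as `delete_untagged(boto3.client("ecr", region_name="us-east-1"), "my-repo")`; the region and repository name here are placeholders.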

nfvs
  • I don't have the '--filter' option, what version do you use? (I have aws-cli/1.10.39) – Dimitris Feb 06 '17 at 12:02
  • Update: v1.11.44 supports filters – Dimitris Feb 06 '17 at 12:07
  • 2
    batch-delete-image doesn't accept more than 100 images per request. This works around that:
    ```
    IMAGES_TO_DELETE=$( aws ecr list-images --region $ECR_REGION --repository-name $ECR_REPO --filter "tagStatus=UNTAGGED" --query 'imageIds[*]' --max-items 100 --output json )
    aws ecr batch-delete-image --region $ECR_REGION --repository-name $ECR_REPO --image-ids "$IMAGES_TO_DELETE" || true
    ```
    – Scott Gigante Mar 22 '21 at 16:14
43

Now that ECR supports lifecycle policies (https://docs.aws.amazon.com/AmazonECR/latest/userguide/LifecyclePolicies.html), you can use them to delete untagged images automatically.

Setting up a lifecycle policy preview using the console

1. Open the Amazon ECS console at https://console.aws.amazon.com/ecs/.

2. From the navigation bar, choose the region that contains the repository on which to perform a lifecycle policy preview.

3. In the navigation pane, choose Repositories and select a repository.

4. On the All repositories: repository_name page, choose Dry-Run Lifecycle Rules, Add.

5. Enter the following details for your lifecycle policy rule:

   a. For Rule Priority, type a number for the rule priority.

   b. For Rule Description, type a description for the lifecycle policy rule.

   c. For Image Status, choose either Tagged or Untagged.

   d. If you specified Tagged for Image Status, then for Tag Prefix List, you can optionally specify a list of image tags on which to take action with your lifecycle policy. If you specified Untagged, this field must be empty.

   e. For Match criteria, choose values for Count Type, Count Number, and Count Unit (if applicable).

6. Choose Save.

7. Create additional lifecycle policy rules by repeating steps 5–7.

8. To run the lifecycle policy preview, choose Save and preview results.

9. Under Preview Image Results, review the impact of your lifecycle policy preview.

10. If you are satisfied with the preview results, choose Apply as lifecycle policy to create a lifecycle policy with the specified rules.

From here: https://docs.aws.amazon.com/AmazonECR/latest/userguide/lpp_creation.html
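A rule like the one built in the console steps above can also be written as a lifecycle policy document and previewed through the API. The sketch below is an illustration, not part of the quoted AWS guide: `untagged_expiry_policy` and `start_preview` are hypothetical helper names, the 7-day default is an arbitrary example value, and `client` is assumed to behave like `boto3.client("ecr")`, whose `start_lifecycle_policy_preview` call starts a dry run.

```python
import json


def untagged_expiry_policy(days, priority=1):
    """Return lifecycle policy text that expires untagged images older than `days` days."""
    policy = {
        "rules": [
            {
                "rulePriority": priority,
                "description": f"Expire untagged images after {days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy)


def start_preview(client, repository_name, days=7):
    """Start a dry run of the policy; a preview does not delete anything."""
    return client.start_lifecycle_policy_preview(
        repositoryName=repository_name,
        lifecyclePolicyText=untagged_expiry_policy(days),
    )
```

The preview results can then be read with `get_lifecycle_policy_preview`, and the same policy text applied with `put_lifecycle_policy` once the listed images look right.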

Lorant Fecske
  • I tried the steps listed in your answer, but it doesn't seem to delete old images. When dry-running the policy it correctly lists all images I intend to delete, but they don't actually get deleted when applying the policy. Anything I'm missing? – Broadwell Jan 12 '18 at 08:31
  • 1
    @Broadwell ... it doesn't run immediately after applying the policy. Give it some time (an hour to two) and you'll find that it runs. – Kevin Dec 14 '18 at 04:30
26

I put together a one-line solution using the AWS CLI:

aws ecr describe-repositories --output text | awk '{print $5}' | egrep -v '^$' | while read line; do  repo=$(echo $line | sed -e  "s/arn:aws:ecr.*\///g") ; aws ecr list-images --repository-name $repo --filter tagStatus=UNTAGGED --query 'imageIds[*]' --output text | while read imageId; do aws ecr batch-delete-image --repository-name $repo --image-ids imageDigest=$imageId; done; done

What it's doing is:

  • get all repositories
  • for each repository give me all images with tagStatus=UNTAGGED
  • for each image+repo issue a batch-delete-image

If you have jq, you can use this version, which is more robust because it does not rely on the changing text output format, and more efficient because it issues one batch delete per repository:

aws ecr describe-repositories \
| jq --raw-output '.repositories[].repositoryName' \
| while read -r repo; do
    imageIds=$(aws ecr list-images --repository-name "$repo" --filter tagStatus=UNTAGGED --query 'imageIds[*]' --output json | jq -r '[.[].imageDigest] | map("imageDigest="+.) | join (" ")')
    # Skip repositories that have no untagged images.
    if [[ "$imageIds" == "" ]]; then continue; fi
    # $imageIds is intentionally unquoted so that each imageDigest=... becomes its own argument.
    aws ecr batch-delete-image --repository-name "$repo" --image-ids $imageIds
done

This has been split across several lines for readability, so it is best placed in a function in your .bashrc, but you could of course squeeze it onto a single line:

aws ecr describe-repositories | jq --raw-output '.repositories[].repositoryName' | while read -r repo; do imageIds=$(aws ecr list-images --repository-name "$repo" --filter tagStatus=UNTAGGED --query 'imageIds[*]' --output json | jq -r '[.[].imageDigest] | map("imageDigest="+.) | join (" ")'); if [[ "$imageIds" == "" ]]; then continue; fi; aws ecr batch-delete-image --repository-name "$repo" --image-ids $imageIds; done
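Before deleting across every repository, it can help to see how many untagged images each one holds. The following is a minimal Python sketch of the same repository loop in report-only form. `count_untagged_by_repo` is a hypothetical name, and `client` is assumed to behave like `boto3.client("ecr")`, whose `describe_repositories` and `list_images` paginators are used.

```python
def count_untagged_by_repo(client):
    """Return {repository name: number of untagged images}; deletes nothing."""
    counts = {}
    repo_pages = client.get_paginator("describe_repositories").paginate()
    for repo_page in repo_pages:
        for repo in repo_page["repositories"]:
            name = repo["repositoryName"]
            # Ask ECR for the untagged images of this repository only.
            image_pages = client.get_paginator("list_images").paginate(
                repositoryName=name,
                filter={"tagStatus": "UNTAGGED"},
            )
            counts[name] = sum(len(page["imageIds"]) for page in image_pages)
    return counts
```

Because it works on repository names returned by the API, this form is also unaffected by slashes in namespaced repository names.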
oligofren
Andrea Baccega
  • 4
    Could be that the AWS CLI output changed, but `awk '{print $5}'` gives the ARN. It should be `awk '{print $6}'` to get the repository name. – ivica Apr 28 '20 at 14:10
  • The original answer no longer works, as the output does not match. I'll update. – oligofren Jan 18 '22 at 08:11
7

Setting a lifecycle policy is definitely the best way of managing this. That being said, if you do have a bunch of images that you want to delete, keep in mind that the maximum for batch-delete-image is 100 images per request. So if the number of untagged images is greater than 100, you need to delete them 100 at a time and repeat until none are left:

IMAGES_TO_DELETE=$( aws ecr list-images --repository-name "$ECR_REPO" --filter "tagStatus=UNTAGGED" --query 'imageIds[0:100]' --output json )
echo "$IMAGES_TO_DELETE" | jq length # Gets the number of results
aws ecr batch-delete-image --repository-name "$ECR_REPO" --image-ids "$IMAGES_TO_DELETE" --profile qa || true
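When there may be more than 100 untagged images, the list has to be split so that each request stays within the limit. A minimal sketch of that batching in Python follows. `delete_in_batches` is a hypothetical helper, `client` is assumed to behave like `boto3.client("ecr")`, and the default batch size of 100 is taken from the limit stated above.

```python
def delete_in_batches(client, repository_name, image_ids, batch_size=100):
    """Delete image_ids in chunks of at most batch_size; return (deleted, failures)."""
    deleted, failures = [], []
    for start in range(0, len(image_ids), batch_size):
        # Slice out the next chunk; the last one may be shorter.
        batch = image_ids[start:start + batch_size]
        response = client.batch_delete_image(
            repositoryName=repository_name,
            imageIds=batch,
        )
        deleted.extend(response.get("imageIds", []))
        failures.extend(response.get("failures", []))
    return deleted, failures
```

An empty `image_ids` list results in no requests at all, so no special case is needed for repositories without untagged images.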
Ken J
1

If you want to remove untagged images from a repository, you can create a JSON lifecycle policy and then use Python to apply that policy to the repository.

In my case, I apply the policy to all the repositories in ECR. I created a "lifecyclepolicy.json" file in my current directory containing the ECR lifecycle policy.

Here is my Python code:

    import os
    import json
    import boto3


    def ecr_lifecycle(lifecycle_policy):
        """Apply the given lifecycle policy to every repository in the registry."""
        ecr_client = boto3.client('ecr')
        policy_text = json.dumps(lifecycle_policy)

        # Collect all repository names (the listing is paginated).
        repositories = []
        describe_repo_paginator = ecr_client.get_paginator('describe_repositories')
        for page in describe_repo_paginator.paginate():
            for repo in page['repositories']:
                repositories.append(repo['repositoryName'])

        # Apply the policy to each repository and keep every response.
        responses = []
        for repository in repositories:
            responses.append(ecr_client.put_lifecycle_policy(
                repositoryName=repository,
                lifecyclePolicyText=policy_text))
        return responses


    if __name__ == '__main__':
        path = os.path.dirname(os.path.abspath(__file__))
        with open(os.path.join(path, 'lifecyclepolicy.json')) as json_file:
            data = json.load(json_file)
        ecr_lifecycle(data)

Here is the JSON file:

{
  "rules": [
    {
      "rulePriority": 10,
      "description": "Only keep untagged images for 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
Bharat
0

Based on @Ken J's answer,

Here is a Python script that will clean up untagged images in ALL your ECR repositories:

#!/usr/bin/python3
import json
import subprocess
# Based on: https://stackoverflow.com/questions/40949342/how-to-delete-untagged-images-from-aws-ecr-container-registry
region = "us-east-1"

debug = False


def _runCommand(command):
    """Run a command and return [stdout, stderr] as decoded strings."""
    if debug:
        print(" ".join(command))
    p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return [p.stdout.decode("utf-8"), p.stderr.decode("utf-8")]


# Build the argument list explicitly rather than splitting a concatenated string.
command = ["aws", "ecr", "describe-repositories", "--region", region, "--output", "json"]
data = _runCommand(command)[0]

for i in json.loads(data)["repositories"]:
    name = i["repositoryName"]
    print(name)
    command = ["aws", "ecr", "list-images", "--region", region, "--repository-name", name, "--filter", "tagStatus=UNTAGGED", "--query", "imageIds[*]", "--output", "json"]
    data = _runCommand(command)[0]

    # Skip repositories with no untagged images (or where listing failed).
    if not data.strip() or not json.loads(data):
        continue

    command = ["aws", "ecr", "batch-delete-image", "--region", region, "--repository-name", name, "--image-ids", data]
    data = _runCommand(command)[0]
    print(data)
GuySoft
0

First step:

untaggedImages=$(aws ecr list-images --repository-name <your_repo_name> --filter "tagStatus=UNTAGGED" --query 'to_string(imageIds[*])' --output json)

Second step:

aws ecr batch-delete-image --repository-name <your_repo_name> --image-ids "$untaggedImages" || true

The to_string function is required because the query would otherwise return the image list as a JSON object rather than as a string.

Karthik