33

I have a private repository at gitlab.com that uses the CI feature. Some of the CI jobs create artifact files that are stored. I just set the artifacts to be deleted automatically after one day by adding this to the CI configuration:

expire_in: 1 day

That works great - however, old artifacts won't be deleted (as expected). So my question is:

How can I delete old artifacts or artifacts that do not expire? (on gitlab.com, no direct access to the server)

user1251007

7 Answers

24

This should be easier to script since GitLab 14.7 (January 2022), which now offers:

Bulk delete artifacts with the API

While a good strategy for managing storage consumption is to set regular expiration policies for artifacts, sometimes you need to reduce items in storage right away.

Previously, you might have used a script to automate the tedious task of deleting artifacts one by one with API calls, but now you can use a new API endpoint to bulk delete job artifacts quickly and easily.

See Documentation, Issue 223793 and Merge Request 75488.

 curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
      "https://gitlab.example.com/api/v4/projects/1/artifacts"

As noted by Lubo in the comments:

The response of the given API is 202 Accepted. To me, that means the deletion happens in the background.

The admin area is also updated a bit later than the deletion happens.
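
As a minimal sketch, the same call can be made from Python with only the standard library. The function names `build_bulk_delete_request` and `bulk_delete_artifacts` are hypothetical, and the base URL, project ID, and token are placeholders. Since the endpoint answers 202 Accepted, a successful status only means the deletion was queued, not that it has finished.

```python
import urllib.request


def build_bulk_delete_request(base_url, project_id, token):
    """Build (but do not send) the DELETE request for a project's artifacts."""
    url = f"{base_url}/api/v4/projects/{project_id}/artifacts"
    return urllib.request.Request(
        url, method="DELETE", headers={"PRIVATE-TOKEN": token}
    )


def bulk_delete_artifacts(base_url, project_id, token):
    """Send the request and return the HTTP status code (202 when queued)."""
    req = build_bulk_delete_request(base_url, project_id, token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `bulk_delete_artifacts("https://gitlab.example.com", 1, "<your_access_token>")` would send the same request as the curl command above.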

VonC
  • Out of curiosity will this delete the raw job logs associated with each job as well? Do those count towards the artifact storage limit? – Brian Jordan Aug 10 '22 at 14:41
  • @BrianJordan Apparently, it does *not* delete logs: https://gitlab.com/gitlab-org/gitlab/-/issues/223793#note_443460706. https://gitlab.com/gitlab-org/gitlab/-/merge_requests/75488 mentions "erasable job artifacts - all job artifacts except trace". – VonC Aug 10 '22 at 16:53
  • I get '202: Accepted' back but the Artifacts size of my repository remains the same. – domfz Sep 19 '22 at 13:18
  • @GrogPirate With what version of GitLab? Or is it gitlab.com? In any case, it is best to ask a separate question. – VonC Sep 19 '22 at 14:58
  • @GrogPirate We have the same problem on gitlab.com. This bulk delete API does not seem to work, or does not do what is expected. – nadar Oct 05 '22 at 11:30
  • @nadar I unfortunately ended up moving the repository since I could not get rid of the remaining artifacts. – domfz Oct 06 '22 at 12:33
  • The response of the given API is 202 Accepted. To me, that means the deletion happens in the background. The admin area is also updated a bit later than the deletion happens. – Lubo Jul 24 '23 at 10:04
  • @Lubo Good point. I have included your comment in the answer for more visibility. – VonC Jul 24 '23 at 19:04
23

You can use the GitLab REST API to delete the artifacts from the jobs if you don't have direct access to the server. Here's a sample curl script that uses the API:

#!/bin/bash
    
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="3034900"
    
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="Lifg_azxDyRp8eyNFRfg"
server="gitlab.com"
    
# go to https://gitlab.com/[organization name]/[repository name]/-/jobs
# then open JavaScript console
# copy/paste => copy(_.uniq($('.ci-status').map((x, e) => /([0-9]+)/.exec(e.href)).toArray()).join(' '))
# press enter, and then copy the result here :
# repeat for every page you want
job_ids=(48875658 48874137 48873496 48872419)
    
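# quote the array expansion so each ID is passed as a single word
for job_id in "${job_ids[@]}"
do
     URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
     echo "$URL"
     curl --request POST --header "PRIVATE-TOKEN: ${token}" "$URL"
     # plain echo prints a newline; echo "\n" would print the two characters literally in bash
     echo
done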
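
As the comments below note, the `erase` endpoint also removes the job log, while a separate endpoint deletes only the artifacts. A small sketch of choosing between the two in Python; the helper name `job_cleanup_call` and its `keep_log` parameter are made up for illustration:

```python
def job_cleanup_call(server, project_id, job_id, keep_log=True):
    """Return the (HTTP method, URL) pair for cleaning up one job.

    keep_log=True  -> delete only the artifacts, the job log stays
    keep_log=False -> erase the job, which removes artifacts and log
    """
    base = f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}"
    if keep_log:
        return ("DELETE", f"{base}/artifacts")
    return ("POST", f"{base}/erase")
```

The returned pair can be passed to curl (`--request <method> <url>`) or to any HTTP client, together with the `PRIVATE-TOKEN` header used in the script above.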
Guss
David Archer
  • 1
    This console command worked for me: `copy([...$('a[title="Download artifacts"],a[download]').map((x, e) => /\/([0-9]+)\//.exec(e.href)[1])].join(' '))`. – joelnet Sep 05 '18 at 23:28
  • 3
    This also deletes the job log according to the [documentation](https://docs.gitlab.com/ee/api/jobs.html#erase-a-job). Since GitLab 11.9 there is a separate [API endpoint for deleting only the artifacts](https://docs.gitlab.com/ee/api/jobs.html#delete-artifacts), maybe you would like to update your answer? – Philipp Wendler Jul 19 '19 at 17:29
  • 2
    Note that this will delete the jobs along with the artifacts. See [my answer](https://stackoverflow.com/a/61551589/7370354) for a way to keep the jobs and delete only the artifacts. – Kartik Soneji May 01 '20 at 22:13
  • The provided js commandline script does not work. – domfz Sep 19 '22 at 12:27
16

Building on @David's answer: as @Philipp pointed out, there is now an API endpoint that deletes only the job artifacts instead of the entire job.

You can run this script directly in the browser's Dev Tools console, or use node-fetch to run in node.js.

//Go to: https://gitlab.com/profile/personal_access_tokens
const API_KEY = "API_KEY";

//You can find project id inside the "General project settings" tab
const PROJECT_ID = 12345678;
const PROJECT_URL = "https://gitlab.com/api/v4/projects/" + PROJECT_ID + "/"

let jobs = [];
for(let i = 0, currentJobs = []; i == 0 || currentJobs.length > 0; i++){
    currentJobs = await sendApiRequest(
        PROJECT_URL + "jobs/?per_page=100&page=" + (i + 1)
    ).then(e => e.json());
    jobs = jobs.concat(currentJobs);
}

//skip jobs without artifacts (an empty array is truthy, so check the length)
jobs = jobs.filter(e => e.artifacts && e.artifacts.length > 0);

//keep the latest build.
jobs.shift();

for(let job of jobs)
    await sendApiRequest(
        PROJECT_URL + "jobs/" + job.id + "/artifacts",
        {method: "DELETE"}
    );

async function sendApiRequest(url, options = {}){
    if(!options.headers)
        options.headers = {};
    options.headers["PRIVATE-TOKEN"] = API_KEY;

    return fetch(url, options);
}
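
The selection step (skip jobs without artifacts, keep the latest) can also be sketched as a pure function in Python. It assumes each job object carries an `artifacts` list whose entries have a `file_type` field, and that the job log shows up there as `trace`; that shape is an assumption about the API response, so check it against your instance. The name `select_jobs_to_clean` is hypothetical.

```python
def select_jobs_to_clean(jobs, keep_latest=1):
    """Return the jobs whose artifacts should be deleted.

    Jobs with no artifacts, or with only a "trace" (job log) entry,
    are skipped. The newest `keep_latest` remaining jobs (highest ID)
    are left untouched.
    """
    def has_erasable_artifacts(job):
        entries = job.get("artifacts") or []
        return any(entry.get("file_type") != "trace" for entry in entries)

    candidates = [job for job in jobs if has_erasable_artifacts(job)]
    # Sort newest first so the slice below drops the latest builds.
    candidates.sort(key=lambda job: job["id"], reverse=True)
    return candidates[keep_latest:]
```

Sorting by ID makes the "keep the latest" step independent of the order in which the API returned the jobs.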
Per Lundberg
Kartik Soneji
  • I get the following error when trying to use yours above in Chrome console Uncaught SyntaxError: Unexpected end of JSON input at :28:44 at async :21 (anonymous) @ VM62:28 – Stanford Sep 01 '20 at 19:23
  • 1
    @Stanford I fixed the script, can you try again? – Kartik Soneji Sep 01 '20 at 22:34
10

According to the documentation, deleting the entire job log (click on the trash can) will also delete the artifacts.

Cephalopod
5

I am on GitLab 8.17 and am able to remove the artifacts for a particular job by navigating to the storage directory on the server itself. The default path is:

/var/opt/gitlab/gitlab-rails/shared/artifacts/<year_month>/<project_id?>/<jobid>

Removing either the whole folder for the job or just its contents makes the artifact view disappear from the GitLab pipeline page.

The storage path can be changed as described in docs:
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/administration/job_artifacts.md#storing-job-artifacts

Kartik Soneji
PapaSmurf
  • 1
    Unfortunately, I have no direct access to the server as the repository is hosted at gitlab.com. I updated my question to point that out in more detail. – user1251007 Mar 01 '17 at 13:20
  • 6
    According to GitLab, you'll want to delete through the gitlab-rails console if you have access to the GitLab server itself. Otherwise, you may see discrepancies when looking at project size in the admin UI because the underlying database isn't updated. Ref: https://gitlab.com/gitlab-org/gitlab-ce/issues/5572#note_3359570 – David Archer Feb 26 '18 at 09:54
2

If you have deleted all the jobs by accident (thinking the artifacts would be gone, but they weren't), what would be the alternative to brute-forcing a loop over a range?

I have this code, which brute-forces a range of numbers. But since I use the gitlab.com public runners, it's a long range.

# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="xxxxxx"

# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="yyyyy"
server="gitlab.com"


# Take the range from the oldest known job ID to the latest known one, then brute-force it.
# Useful when you deleted the pipelines and can no longer retrieve the job IDs.

# https://stackoverflow.com/questions/52609966/for-loop-over-sequence-of-large-numbers-in-bash
for (( job_id = 59216999; job_id <= 190239535; job_id++ )); do
    echo "Job ID being deleted is $job_id"

    curl --request POST --header "PRIVATE-TOKEN: ${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
    printf '\n\n'
done
Daniel Vianna
1

This Python solution worked for me with GitLab 13.11.3.

#!/usr/bin/env python3
# delete_artifacts.py

import json
import requests

# adapt accordingly
base_url='https://gitlab.example.com'
project_id='1234'
access_token='123412341234'

#
# Get the GitLab version (tested with version 13.11.3)
# cf. https://docs.gitlab.com/ee/api/version.html#version-api
#
print('GET /version')
x = requests.get(f"{base_url}/api/v4/version", headers={"PRIVATE-TOKEN": access_token})
print(x)
data = json.loads(x.text)
print(f'Using GitLab version {data["version"]}. Tested with 13.11.3')

#
# List project jobs
# cf. https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs
# Note: this single request returns only the first page of jobs (20 by default).
#
request_str = f'projects/{project_id}/jobs'
url = f'{base_url}/api/v4/{request_str}'
print(f'GET /{request_str}')
x = requests.get(url, headers={"PRIVATE-TOKEN": access_token})
print(x)
data = json.loads(x.text)

input('WARNING: This will delete all artifacts. Job logs will remain available. Press Enter to continue...')

#
# Delete job artifacts
# cf. https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
#
for entry in data:
    request_str=f'projects/{project_id}/jobs/{entry["id"]}/artifacts'
    url=f'{base_url}/api/v4/{request_str}'
    print(f'DELETE /{request_str}')
    x = requests.delete(url, headers = {"PRIVATE-TOKEN": access_token })
    print(x)
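
Note that the list call returns one page at a time, so on a project with many jobs a single request will not see them all. A hedged sketch of paging until an empty page comes back, mirroring the loop in the JavaScript answer above. It uses `urllib` rather than `requests` to stay self-contained, and the names `fetch_jobs_page` and `collect_all_jobs` are hypothetical:

```python
import json
import urllib.request


def fetch_jobs_page(base_url, project_id, token, page, per_page=100):
    """Fetch one page of project jobs and return it as a list of dicts."""
    url = (
        f"{base_url}/api/v4/projects/{project_id}/jobs"
        f"?per_page={per_page}&page={page}"
    )
    req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_all_jobs(get_page):
    """Call get_page(1), get_page(2), ... until a page comes back empty."""
    jobs = []
    page = 1
    while True:
        batch = get_page(page)
        if not batch:
            break
        jobs.extend(batch)
        page += 1
    return jobs
```

With the variables from the script above, `data = collect_all_jobs(lambda p: fetch_jobs_page(base_url, project_id, access_token, p))` would replace the single list request. Passing the page fetcher in as a function keeps the loop easy to try out without touching a real server.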

I'll keep an updated version here. Feel free to reach out and improve the code.

el_tenedor