
I am following the docs at https://docs.github.com/en/rest/actions/artifacts#download-an-artifact to download artifacts with the GitHub Actions REST API. Given an ARTIFACT_ID, plus an access token if the repo is private, one can call the API via cURL or the GitHub CLI. The response contains a Location: header with a temporary URL, valid for 1 minute, from which the artifact can be downloaded with a second call to cURL.

I would like to know the reason for this design decision on GitHub's part. In particular, why not just return the artifact in response to the first call to cURL? Additionally, given that the first call is intended to return a temporary URL from which the artifact can be retrieved, why not return this URL in the response body rather than only in a header? Other information, such as bad credentials or the object having been moved, is returned as JSON when this cURL command is run, so why can't the temporary URL be included there too?

To help clarify my question, here is some relevant code:

# The initial cURL command looks something like this:
curl -v \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: token <TOKEN>" \
  https://api.github.com/repos/OWNER/REPO/actions/artifacts/ARTIFACT_ID/ARCHIVE_FORMAT

# The temporary URL, which can be curled to retrieve the artifact, looks something like this
# (quoted so the shell does not interpret the & characters):
curl "https://pipelines.actions.githubusercontent.com/serviceHosts/<HEXSTRING>/_apis/pipelines/1/runs/16/signedartifactscontent?artifactName=<artName>&urlExpires=<date>&urlSigningMethod=HMACV2&urlSignature=<SIGNATURE>"

Additionally, I am currently capturing the standard error of the cURL command and running a regex on it to extract the temporary URL. Is there a better way to do this? For example, is there a flag I could pass to cURL that would give me the value of Location directly?

Finally, the docs state that "The archive_format must be zip." Given that, what is the benefit of having this parameter? Is it not redundant? If so, what is the benefit of this redundancy?

Mathew

2 Answers


This is a consequence of a 2011 design decision, described in the GitHub blog post "Nodeload2: Downloads Reloaded" (https://github.blog/2011-08-02-nodeload2-downloads-reloaded/):

When implementing a proxy of any kind, you have to deal with clients that can’t read content as fast as you can send it.

When an HTTP server response stream can’t send any more data to you, write() returns false.
Then, you can pause the proxied HTTP request stream, until the server response emits a drain event.
The drain event means it’s ready to send more data, and that you can now resume the proxied HTTP request stream.

To avoid DDoS attacks, it is better to serve that stream from a temporary URL rather than a fixed one.

You can use -D (--dump-header; for example, -D - writes the headers to stdout) to display the response headers, but you would still need to post-process the output to get the redirection URL.
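As a minimal sketch of that post-processing, assuming a POSIX shell with awk available: the headers dumped by -D - can be piped through a small filter that prints the value of the Location header. The name extract_location is a hypothetical helper chosen for illustration, not part of curl.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first Location header (header names are case-insensitive).
extract_location() {
  awk '
    tolower($0) ~ /^location:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the "Location:" prefix
      sub(/\r$/, "")             # drop the trailing CR of the HTTP line ending
      print
      exit
    }
  '
}

# Usage sketch (placeholders as in the question); -o /dev/null discards the body:
# curl -s -D - -o /dev/null \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: token <TOKEN>" \
#   https://api.github.com/repos/OWNER/REPO/actions/artifacts/ARTIFACT_ID/ARCHIVE_FORMAT \
#   | extract_location
```

curl's --write-out option also offers a %{redirect_url} variable, which reports the URL a redirect would lead to when -L is not used, so something like -w '%{redirect_url}' may avoid the parsing step altogether.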

VonC

VonC's answer covers why GitHub has implemented this using a temporary URL, but here are answers to your other subquestions:

why not have this temporary URL returned directly by call to cURL rather than having it only contained in the header

The GitHub API is following how HTTP redirections are expected to work. From the MDN web docs:

In HTTP, redirection is triggered by a server sending a special redirect response to a request. Redirect responses have status codes that start with 3, and a Location header holding the URL to redirect to.

The benefit of this is that clients, such as web browsers or even curl, already understand how to handle this redirection.

So to answer your other subquestion:

is there a flag I could pass to cURL that would give me the value of Location directly?

Yes, there is. The --location (or -L) flag tells curl to read the Location value from the response header and make the second request for you. From the man page:

If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place.
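A minimal sketch of that one-step download, using the endpoint from the question. The wrapper name download_artifact, its argument order, and the output file name are hypothetical, chosen only for illustration.

```shell
# Hypothetical wrapper: download an artifact archive in a single curl call.
#   $1 = token, $2 = OWNER/REPO, $3 = artifact ID, $4 = output file
download_artifact() {
  # -L follows the 3xx redirect to the temporary URL, -o saves the body,
  # and --fail makes curl exit non-zero on an HTTP error response.
  curl --fail -sS -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: token $1" \
    -o "$4" \
    "https://api.github.com/repos/$2/actions/artifacts/$3/zip"
}

# Example call with placeholder values:
# download_artifact "<TOKEN>" "OWNER/REPO" "ARTIFACT_ID" artifact.zip
```

With -L there is no need to capture or parse the temporary URL at all, since curl requests it immediately, well within its one-minute lifetime.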

xlm