Why does actions/checkout fetch the latest commit right after cloning the repository?

Question

While trying to make sense of the action actions/checkout@version, I came across this answer in which the following was stated to explain what happens when the same action is executed:

The default steps being executed are:

The current repo in which the workflow is being triggered gets cloned.

Depending on the defined events such as a push or pull request:
For a push event, it runs the command below, where $GITHUB_REF points to the latest commit on the specified branch for the push event in the workflow:
git fetch --depth 1 $GITHUB_REF

What I don't understand is why the latest commit is fetched when the repository has just been cloned and is therefore already up to date.

score 2 · Answer 1 · answered Jun 23 '23 at 03:57

Two sample reasons to run this command:

$GITHUB_REF may be something other than a branch or tag; for example, GitHub manages references named refs/pull/<id>/head that point at the latest commit of a pull request, and these references would not be downloaded by a default git clone.
Depending on how the repo is cloned (e.g. if it is a shallow clone), it may not have all branches to start with.

Running git fetch ... $GITHUB_REF first is a way to make sure the action works in all possible conditions.
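The first reason can be reproduced locally. What follows is a minimal sketch, not what the action itself runs: it assumes git is installed, builds a throwaway "server" repository in a temporary directory, and creates a ref named refs/pull/1/head by hand to imitate GitHub's naming. All paths and the demo identity are made up for the illustration.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "server" repository with one commit on its default branch.
git init -q "$work/server"
git -C "$work/server" commit -q --allow-empty -m "base"

# Imitate a pull request: a commit reachable only from refs/pull/1/head.
git -C "$work/server" commit -q --allow-empty -m "pr change"
pr_sha=$(git -C "$work/server" rev-parse HEAD)
git -C "$work/server" update-ref refs/pull/1/head "$pr_sha"
git -C "$work/server" reset -q --hard HEAD~1

# A default clone copies branches and tags, not refs/pull/*.
# file:// forces the normal transport instead of a local object copy.
git clone -q "file://$work/server" "$work/client"
if git -C "$work/client" cat-file -e "$pr_sha^{commit}" 2>/dev/null; then
  before=present
else
  before=missing
fi

# Fetching the ref explicitly brings the commit in.
git -C "$work/client" fetch -q origin refs/pull/1/head
after=$(git -C "$work/client" rev-parse FETCH_HEAD)

echo "before fetch: $before"
echo "after fetch:  $after"
```

After the clone the pull-request commit is absent; after the explicit fetch, FETCH_HEAD points at it.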

That ref may also come from a fork in case of a pull request workflow. Thus the clone would be the target repo and the ref in question would come from the source. — jessehouwing, Jun 23 '23 at 10:10

score 1 · Answer 2 · answered Jun 23 '23 at 00:42

While it may appear to suffice to "only" clone the git repository, you have to keep in mind that git is distributed version control.

So git-clone(1) will take care to clone the actual repository (or fail with that operation), but if you think twice, the contents of the repository may have already changed while cloning it (just to give one example).

Really think about distributed computer systems here. Networks involved (yes, transports do fail). Time involved (yes, time runs backwards).

To go on, let's assume the clone operation did not fail and succeeded, we're on the happy path here.

Now the next job is to fetch the concrete revision that is actionable. Well, let's fetch it, because git-fetch(1) will tell us whether it worked or not.

So...

... while git-clone(1) suffices to verify the remote repository is accessible and could be cloned ...

... git-fetch(1) is the correct operation to obtain ("fetch") the specific revision that is of interest (in context of the action).

Both operations need to be done. The latter, logically more important one (work on that revision), can only be done if the repository is already cloned (bootstrapped, initialized).

I hope this answers your question and does not sound like I am making fun of it, because this is easy to miss. I often just clone a repository and don't need to take care of fetching the correct revision because it is already there. Git with git-clone(1) is clever enough to understand what I'm looking for (e.g. a branch name), and then HEAD is what I meant.

However, when we automate a build, we want to specifically ensure (assert) we can and do build a very distinct revision. GITHUB_REF, CI_COMMIT, BUILD_REVISION, REVISION -- there are many names depending on which (proprietary) platform you run it.

At the end of the day this is a reference to a commit in git identified by the SHA1 hash, the revision. Any of those parameters just references it, and git-fetch(1) ensures it is fetched (and with it all necessary files/trees/objects). This is what we use a version control system for. And it certainly requires fetching this revision first.

All those branch or tag names evaporate over time; they are symbolic, giving the child a name to ship it, but they can be overwritten at any time. Fine for clone, but you don't want to build on that. Fetch the real revision first.

Which in turn requires setting up the remote and cloning it (git-clone(1) does all of this bootstrapping in one step).
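The point about names moving while the revision stays fixed can be seen in a small local experiment. This is only a sketch under the assumption that git is installed; the repository, commit messages and identity are invented for the illustration.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
git -C "$work/repo" commit -q --allow-empty -m "first"

# The revision a build would have been triggered for.
built_sha=$(git -C "$work/repo" rev-parse HEAD)
branch=$(git -C "$work/repo" symbolic-ref --short HEAD)

# Someone pushes again: the branch name now means a different commit.
git -C "$work/repo" commit -q --allow-empty -m "second"
tip_sha=$(git -C "$work/repo" rev-parse "$branch")

# The recorded hash still identifies the original commit.
subject=$(git -C "$work/repo" log -1 --format=%s "$built_sha")

echo "branch $branch now at $tip_sha"
echo "recorded revision $built_sha is still \"$subject\""
```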

Happy coding and let the build run along!

Exercise:

Instead of actions/checkout[@<revision>] you can just use run: with git clone, git fetch, git checkout ... . The former is a JavaScript wrapper for these same shell commands you were asking about (but evidently their documentation does not share the rationale for the implementation, otherwise you would not have asked, right?).

So if you would like to experiment, clone the git repository via run: in the bash shell and explore how git is actually cloned within the Microsoft infrastructure GitHub runs on (and therefore your workflow).

For that, learn how to use the gh command-line interface and create one (temporary) repository after the other to trigger action runs, then remove the temporary repositories after you have reviewed the outcome.

score 1 · Accepted Answer · answered Jun 23 '23 at 10:01

The first step mentioned in your quote is incorrect (Emphasis mine):

The current repo in which the workflow is being triggered gets cloned.

Having looked at the code for the action on GitHub, here are the steps it performs to check out the reference:

If git is not present, download the code using the API and return.
Otherwise, if git is present, initialize an empty git repository.
Add a remote to that empty repository pointing to the repository to be used.
Fetch and check out the specific ref needed.

Note that the repository isn't actually cloned; instead, a remote is configured on the newly initialized empty repository. Hence running fetch really is needed, since the repository doesn't yet have the commit.

Note: I've skipped some of the other steps that the code performs that aren't relevant to the question.
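The git-based steps listed above can be approximated by hand. The following is a rough sketch, not the action's actual code: it assumes git is installed, uses a throwaway local repository in place of GitHub, and leaves out everything else the action does. The directory names and identity are hypothetical.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the repository hosted on GitHub.
git init -q "$work/origin"
git -C "$work/origin" commit -q --allow-empty -m "only commit"
ref=$(git -C "$work/origin" symbolic-ref HEAD)   # full name of the default branch
want=$(git -C "$work/origin" rev-parse HEAD)

# Initialize an empty repository (no clone).
git init -q "$work/checkout"

# Add a remote pointing at the repository to be used.
git -C "$work/checkout" remote add origin "file://$work/origin"

# At this point the wanted commit is not there yet.
if git -C "$work/checkout" cat-file -e "$want^{commit}" 2>/dev/null; then
  before=present
else
  before=missing
fi

# Fetch only the needed ref, shallowly, then check it out.
git -C "$work/checkout" fetch -q --depth 1 origin "$ref"
git -C "$work/checkout" checkout -q --detach FETCH_HEAD
got=$(git -C "$work/checkout" rev-parse HEAD)

echo "before fetch: $before"
echo "checked out:  $got"
```

Because nothing is downloaded by init or remote add, the fetch is the only step that transfers any commits.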
