Despite having used git for years, I find git lfs (git Large File Storage) to be pretty confusing to use, even at a very basic level. Can someone explain the difference between these 3 commands?:

  1. git lfs fetch
  2. git lfs fetch --all
  3. git lfs pull

Related:

  1. Pull ALL files from git LFS
Gabriel Staples

1 Answer

Update 5 May 2023: to anyone thinking of using git lfs, don't! See my explanation in my question here, in the section "Update: don't use git lfs. I now recommend against using git lfs", and in my answer here.

For personal, free GitHub accounts, it is way too limiting, and for paid, corporate accounts, it makes git checkout go from taking a few seconds to up to 3+ hours, especially for remote workers, which is a total waste of their time. I dealt with that for three years and it was horrible. I wrote a script to do a git lfs fetch once per night to mitigate this, but my employer refused to buy me a bigger SSD to give me enough space to do git lfs fetch --all once per night, so I still ran into the multi-hour-checkout problem frequently. It's also impossible to undo the integration of git lfs into your repo unless you delete your whole GitHub repo and recreate it from scratch.

In both cases, corporate and free, and with over 3 years of daily experience using it, I have found git lfs to be a massive time-waster.

If you are forced to use git lfs by your employer, however, here's what you need to know:


Now on to the answer:

After a bunch of study and figuring out where the help pages are, here is what I have concluded:

How to use git lfs as a basic user

This covers the question: "What is the difference between git lfs fetch, git lfs fetch --all, git lfs pull, and git lfs checkout?"

Summary

# Fetch git lfs files for just the currently-checked-out branch or commit (Ex: 20
# GB of data). This downloads the files into your `.git/lfs` dir but does NOT
# update them in your working file system for the branch or commit you have 
# currently checked-out.
git lfs fetch

# Fetch git lfs files for ALL remote branches (Ex: 1000 GB of data), downloading
# all files into your `.git/lfs` directory.
git lfs fetch --all

# Fetch git lfs files for just these 3 branches (Ex: 60 GB of data)
# See `man git-lfs-fetch` for details. The example they give is:
# `git lfs fetch origin main mybranch e445b45c1c9c6282614f201b62778e4c0688b5c8`
git lfs fetch origin main mybranch1 mybranch2

# Check out, or "activate" the git lfs files for your currently-checked-out
# branch or commit, by updating all file placeholders or pointers in your
# active filesystem for the current branch with the actual files these git lfs
# placeholders point to.
git lfs checkout

# Fetch and check out in one step. This one command is the equivalent of these 2
# commands:
#       git lfs fetch
#       git lfs checkout
git lfs pull
#
# Note that `git lfs pull` is similar to how `git pull` is the equivalent
# of these 2 commands:
#       git fetch
#       git merge

So, a general, recommended workflow to check out your git files and your git lfs files might look like this:

git checkout main   # check out your `main` branch
git pull            # pull latest git files from the remote, for this branch
git lfs pull        # pull latest git lfs files from the remote, for this branch

# OR (exact same thing)
git checkout main   # check out your `main` branch
# (The next 2 commands replace `git pull`)
git fetch           # fetch the latest files from the remote for branch `main`
                    # into your locally-stored hidden remote-tracking branch
                    # named `origin/main`, for example
git merge           # merge the latest content (which you just fetched
                    # into your local hidden branch `origin/main`)
                    # into non-hidden branch `main`
# (The next 2 commands replace `git lfs pull`)
git lfs fetch       # fetch latest git lfs files from the remote, for this
                    # branch
git lfs checkout    # check out all git lfs files for this branch, replacing
                    # git lfs file placeholders with the actual files
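
If you run this workflow often, it can be wrapped in a small shell function. This is only a sketch: the name `sync_branch` is made up for illustration, and it assumes the branch already exists locally and has an upstream configured.

```shell
# Hypothetical helper (not part of git or git lfs): check out a branch, then
# pull both the regular git files and the git lfs files for it.
# Stops at the first command that fails.
sync_branch() {
    local branch="${1:?usage: sync_branch <branch>}"
    git checkout "$branch" &&
        git pull &&
        git lfs pull
}

# Example usage:
#   sync_branch main
```

Chaining with `&&` means `git lfs pull` is only attempted if the checkout and the regular pull both succeeded.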

Details

1. git lfs fetch

See man git-lfs-fetch, and git lfs fetch --help.

From git lfs fetch --help (emphasis added):

Download Git LFS objects at the given refs from the specified remote. See "Default remote" and "Default refs" for what happens if you don't specify.

This does not update the working copy.

So, this is just like doing git fetch (where it fetches remote contents to your locally-stored, remote-tracking hidden branches), except it is for git lfs-controlled files.

It fetches the git lfs file content into your .git/lfs directory, I believe, but does NOT update your working tree (the files on disk for the currently checked-out branch) with those files.
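
One way to see this for yourself is to check whether a file in your working tree is still a placeholder after running `git lfs fetch`. A git lfs pointer file is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below tests for that line; the function name `is_lfs_pointer` is an assumption made up for this example, not a git lfs command.

```shell
# Hypothetical helper: succeed (exit status 0) if the given file looks like a
# git lfs pointer, i.e. its first line is the git lfs spec version line.
is_lfs_pointer() {
    local file="$1" first_line=""
    [ -f "$file" ] || return 1
    # `read` fails on a file with no trailing newline but still fills the
    # variable, so its exit status is ignored here.
    IFS= read -r first_line < "$file" || true
    [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Example usage (the path is a placeholder):
#   git lfs fetch
#   is_lfs_pointer path/to/file1.png && echo "still a pointer"
```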

From farther down in the help menu (emphasis added):

Default remote

Without arguments, fetch downloads from the default remote. The default remote is the same as for git fetch, i.e. based on the remote branch you're tracking first, or origin otherwise.

Default refs

If no refs are given as arguments, the currently checked out ref is used. In addition, if enabled, recently changed refs and commits are also included. See "Recent changes" for details.

Note that the "currently checked-out ref" refers to your currently-checked out branch or commit.
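
To make the "default remote" rule concrete, here is a rough approximation of it in shell. It is a sketch of the rule as quoted above, not how git lfs implements it, and it ignores any git lfs-specific configuration. The function name `default_remote` is made up for this example.

```shell
# Hypothetical helper: print the remote that the current branch tracks,
# falling back to `origin` when there is no tracked remote or HEAD is detached.
default_remote() {
    local branch remote
    # Fails when HEAD is detached (a commit is checked out, not a branch).
    branch=$(git symbolic-ref --quiet --short HEAD) || { echo origin; return; }
    # `branch.<name>.remote` is set when the branch has an upstream.
    remote=$(git config --get "branch.${branch}.remote") || remote=origin
    echo "$remote"
}

# Example usage:
#   git lfs fetch "$(default_remote)"
```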

2. git lfs fetch --all

Whereas git lfs fetch fetches only the content for your currently-checked-out branch or commit, by default, git lfs fetch --all fetches ALL content for ALL remote branches. On a gigantic corporate mono-repo, that means that git lfs fetch might fetch 20 GB of data, whereas git lfs fetch --all might fetch 1000 GB of data. In such a case, do NOT include --all unless:

  1. you absolutely have to, OR
  2. the amount of data being fetched is still reasonably small
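
To get a feel for how much data is involved before fetching, you can add up the `size` fields stored in the git lfs pointer files. The sketch below does that for pointer text on stdin. The function name `sum_pointer_sizes` is made up for this example, and the usage shown in the comments only covers the commit at HEAD (roughly what a plain `git lfs fetch` would need), not everything `--all` would download.

```shell
# Hypothetical helper: read one or more git lfs pointer files on stdin and
# print the sum of their `size` fields, in bytes.
sum_pointer_sizes() {
    awk '$1 == "size" { total += $2 } END { printf "%.0f\n", total }'
}

# Example usage, assuming you run it from the repository root: print the
# stored pointer for every git lfs file at HEAD and total the sizes.
#   git lfs ls-files --name-only |
#       while IFS= read -r path; do git cat-file -p "HEAD:$path"; done |
#       sum_pointer_sizes
```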

From git lfs fetch --help (emphasis added):

* --all:

Download all objects that are referenced by any commit reachable from the refs provided as arguments. If no refs are provided, then all refs are fetched. This is primarily for backup and migration purposes. Cannot be combined with --recent or --include/--exclude. Ignores any globally configured include and exclude paths to ensure that all objects are downloaded.

3. git lfs pull

Just like git pull is the combination of git fetch and git merge, git lfs pull is the combination of git lfs fetch and git lfs checkout.

From git lfs pull --help (emphasis added):

git lfs pull [options] [<remote>]

Download Git LFS objects for the currently checked out ref, and update the working copy with the downloaded content if required.

This is equivalent to running the following 2 commands:

git lfs fetch [options] [<remote>]
git lfs checkout

So, that raises the question: "what does git lfs checkout do?":

4. git lfs checkout

This command copies the git lfs files from your .git/lfs directory into your working tree for the reference (branch or commit) you currently have checked out.

From git lfs checkout --help:

Try to ensure that the working copy contains file content for Git LFS objects for the current ref, if the object data is available. Does not download any content; see git lfs fetch for that.

Checkout scans the current ref for all LFS objects that would be required, then where a file is either missing in the working copy, or contains placeholder pointer content with the same SHA, the real file content is written, provided we have it in the local store. Modified files are never overwritten.

One or more <glob-pattern>s may be provided as arguments to restrict the set of files that are updated. Glob patterns are matched as per the format described in gitignore(5).

And it provides some examples. Ex:

Examples

  • Checkout all files that are missing or placeholders:

    $ git lfs checkout
    
  • Checkout a specific couple of files:

    $ git lfs checkout path/to/file1.png path/to.file2.png
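
To find out which files `git lfs checkout` still needs to fill in, you can look at the output of `git lfs ls-files`, which marks each file with `*` when the full content is in the working tree and `-` when only the pointer is. The sketch below filters that output down to the pointer-only paths. The function name `pointer_only_paths` is made up for this example.

```shell
# Hypothetical helper: read `git lfs ls-files` output on stdin and print only
# the paths marked with `-`, i.e. files that are still pointers.
pointer_only_paths() {
    # Each line looks like "<short oid> <marker> <path>"; strip the first two
    # fields so that paths containing spaces are printed intact.
    awk '$2 == "-" { sub(/^[^ ]+ - /, ""); print }'
}

# Example usage:
#   git lfs ls-files | pointer_only_paths
```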
    

Related

  1. My explanation in my question here, in this section: Update: don't use git lfs. I now recommend against using git lfs in our free GitHub repos
  2. My answer: Unix & Linux: All about finding, filtering, and sorting with find, based on file size - see the example near the end, titled "(Figure out which file extensions to add to git lfs next)".
  3. Other really useful git lfs info:
    1. Great article!: my developer planet: Git LFS: Why and how to use
    2. https://git-lfs.github.com/
    3. My repo and notes: https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles#how-to-clone-this-repo-and-all-git-submodules
    4. Very useful video!: What is Git LFS?: https://www.youtube.com/watch?v=9gaTargV5BY. I discovered this video from here: https://stackoverflow.com/a/49173061/4561887
  4. https://www.git-tower.com/learn/git/faq/difference-between-git-fetch-git-pull
  5. My answer to: Can I "undo" `git lfs checkout`?
Gabriel Staples
  • Thanks for the research and write-up. A couple of clarification questions: 1) Does `git lfs fetch` pull down LFS-tracked files just for the current commit, or for all the commits in history? (I hope it is the former.) 2) Once you run `git lfs checkout`, how do you "uncheckout" the files, i.e. go back to using placeholder files rather than the actual files in the working tree? – Garret Wilson Oct 19 '22 at 14:54
  • @GarretWilson, as shown in the code comments in my summary section, `git lfs fetch` fetches only "files for just the currently-checked-out branch or commit", whereas `git lfs fetch --all` fetches "git lfs files for ALL remote branches". As for how to replace files with placeholder links again, I don't know. – Gabriel Staples Oct 25 '22 at 06:01
  • @GarretWilson, you can't uncheckout yet. See [my answer here](https://stackoverflow.com/a/74189717/4561887) – Gabriel Staples Oct 25 '22 at 06:11
  • @GabrielStaples your note at the top about not using git lfs is very misleading. You are actually advising against using git lfs _on free GitHub accounts_ . There is nothing wrong with git lfs itself. – Andy Madge May 12 '23 at 12:54
  • @AndyMadge, you are right. It is misleading, but not in the way you think. I mean to say to not use it for work _or_ home, paid _or_ free. It can make `git checkout` take _hours_ instead of seconds. I've updated my explanation to make this clear. Perhaps this could have also been mitigated (but not eliminated) by having a bigger ssd, [as I now suggest at the bottom of my question here](https://stackoverflow.com/q/75946411/4561887), and configuring a script which does `git lfs fetch --all` nightly. – Gabriel Staples May 12 '23 at 17:58
  • @AndyMadge, in both cases: corporate and free, with over 3 years of daily experience using it, I have found `git lfs` to be a massive time-waster. – Gabriel Staples May 12 '23 at 18:02