78

git-annex has been around for quite some time, but never really gained momentum.
Git LFS is rather young and is already supported by GitHub, Bitbucket and GitLab.

Both tools handle binary files in Git repositories. GitLab, however, seems to have replaced git-annex with Git LFS within one year.

  • What are the technical differences?
  • Do they solve the same problem?
Stefanus
  • 1
    Here's quite a nice article about the two: [Large files with Git: LFS and git-annex](https://lwn.net/Articles/774125/) (LWN.net) – Niko Föhr Feb 09 '23 at 16:33

2 Answers

78

They do solve the same problem.

Let me start off with pro/con, then I'll move into technical differences.

git-annex

Pros:

  • Supports multiple remotes on which you can store the binaries.
  • Can be used without support from hosting provider (for more details see here).

Cons:

  • Windows support is in beta, and has been for a long time.
  • Users need to learn separate commands for day-to-day work.
  • Not supported by GitHub and Bitbucket.

git-lfs

Pros:

  • Supported by GitHub, Bitbucket and GitLab.
  • Well supported on all operating systems.
  • Easy to use.
  • Automated, based on filters.

Cons:

Technical

git-annex

git-annex works by creating a symlink in your repo that gets committed. The actual data is stored in a separate backend (S3, rsync, and MANY others). It is written in Haskell. Because it relies on symlinks, Windows users are forced to use git-annex in a quite different manner, which makes the learning curve steeper.
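To make the symlink idea concrete, here is a minimal Python sketch that checks whether a path looks like an annexed file. It assumes the usual symlink-based layout, where the link target lies somewhere below .git/annex/objects and its last path component is the git-annex key; the function name annex_key is made up for this illustration and is not part of git-annex.

```python
import os
from pathlib import Path
from typing import Optional


def annex_key(path: str) -> Optional[str]:
    """Return the git-annex key a symlink points at, or None if it is not annexed.

    Assumption: annexed files are symlinks whose target contains the
    directory sequence .git/annex/objects, followed by further components.
    """
    if not os.path.islink(path):
        return None  # regular files and directories are not annexed symlinks
    target = Path(os.readlink(path))  # read the link itself, do not follow it
    parts = target.parts
    marker = (".git", "annex", "objects")
    for i in range(len(parts) - 3):
        if parts[i:i + 3] == marker:
            # Assumed layout: the final component is named after the key.
            return target.name
    return None
```

The check works even when the content has not been downloaded, because only the link target is inspected, not the file it points to.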

git-lfs

git-lfs commits small pointer files in place of the large files. The actual BLOBs are uploaded through the Git LFS API, which is why a dedicated LFS server is required. Git LFS uses filters, so you only have to set it up once, and then again whenever you want to change which types of files are pushed to LFS.
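As a rough illustration of what such a pointer file contains, the following Python sketch builds and parses one. It follows the three-line layout of the Git LFS pointer format (version, oid, size); the helper names make_pointer and parse_pointer are invented for this example, and the real client does considerably more (filters, uploads, locking).

```python
import hashlib


def make_pointer(content: bytes) -> str:
    """Build the text of an LFS-style pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()  # content is addressed by SHA-256
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = make_pointer(b"pretend this is a large binary")
print(pointer)
print(parse_pointer(pointer)["size"])
```

Only this small text file ends up in the Git history. Which paths get this treatment is decided by filter entries in .gitattributes, for example a line such as `*.iso filter=lfs diff=lfs merge=lfs -text`.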

Community
grepsedawk
  • Great summary! I have two more questions. Do Windows users of git-annex lose some of the functionality of git-annex? Can there be several LFS servers (comparable to multiple backends in git-annex)? – Stefanus Sep 06 '16 at 17:47
  • 1
    There could be, LFS works a LOT like the actual git servers work. You would simply add another remote and push the branch to both remotes. – grepsedawk Sep 07 '16 at 18:28
  • “automated based on filters” It appears git annex can do the same too: https://git-annex.branchable.com/tips/largefiles/ – Rufflewind Jan 20 '17 at 04:59
  • I welcome any modification to this list if anybody would like to add pros/cons. – grepsedawk Oct 20 '17 at 22:05
  • 1
    Was all set to use git-lfs (as I use github to host my repos currently) and then found out the pricing structures for those providers is different for LFS repos. Would probably need to pay at least $5-10 p/month for a repository with any file you could deem large in it (although might be able to do something with GitLab's free 10GB). Not a deal breaker for industry users but typically not suitable for research software that is meant to be published indefinitely. – Tom Close Sep 23 '19 at 05:26
  • @TomClose: if it is research software, you better get your library to help you for long term storage. No other institution can give you a 10 years guarantee you need. Zenodo is another option if your library cannot do much. – Julien Colomb Jan 23 '20 at 17:17
  • @JulienColomb Yes, either a university run Gitlab instance is probably the best option. However, I wasn't keen to move away from GitHub, and since the large files are only required for running tests I have ended up using git-annex with a special remote back to my uni's infrastructure – Tom Close Jan 28 '20 at 00:49
  • @TomClose sounds great, you can still use github-zenodo integration, although I do not know how it would work with git-annex. would love to have a look, do you have a link to the repo? – Julien Colomb Jan 29 '20 at 09:12
  • @JulienColomb The repo is http://github.com/MonashBI/banana, the git-annex integration is still a WIP though. The plan is to hook it up to CircleCI and get a container there to pull from my special remote (at this stage just my uni GDrive but once I get it working I plan to move it to dedicated research infrastructure) before running the tests – Tom Close Jan 29 '20 at 23:50
51

A major advantage of git annex is that you can choose which file you want to download.

You still know which files are available thanks to the symlinks.

For example, suppose that you have a directory full of ISO files. You can list the files, then decide which one you want to download by typing: git annex get my_file.
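A small Python sketch of that workflow: it lists the entries of a directory that are symlinks whose target is not present locally, which in a symlink-based git-annex checkout are the files you have not fetched yet. The function name missing_content is hypothetical, and the sketch assumes the default symlink mode rather than unlocked files.

```python
import os


def missing_content(directory: str) -> list:
    """Names of symlinks in `directory` whose target does not exist locally."""
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() looks at the link itself; exists() follows it, so a
        # dangling link means the content has not been downloaded.
        if os.path.islink(path) and not os.path.exists(path):
            missing.append(name)
    return missing
```

Any name it returns could then be passed to git annex get to download just that file.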

Another advantage is that the files are not duplicated in your checkout. With LFS, each file is present both in .git/lfs/objects and in your working tree. So if you have 20 GB of LFS files, you need 40 GB on your disk. With git annex, the files are symlinked, so in this case only 20 GB is required.
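The arithmetic can be checked on any checkout with a sketch like the one below, which sums the sizes of regular files under a directory and skips symlinks. It is a simplification: it reports apparent file sizes rather than allocated blocks, and it does not account for hard links or copy-on-write file systems. The name regular_file_bytes is made up for this example.

```python
import os


def regular_file_bytes(root: str) -> int:
    """Total apparent size of regular files under `root`, ignoring symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # a symlink costs almost nothing
                total += os.lstat(path).st_size
    return total
```

Run on a repository root, a file that exists as a second copy in the working tree is counted twice, while a symlinked file is counted once.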

Karl Forner
  • 1
    Thanks for this answer! I'm still trying to understand things, but wouldn't GVFS (https://github.com/Microsoft/gvfs) paired with git/git-lfs address the issue of downloading individual files. From their readme... "GVFS virtualizes the file system beneath your git repo so that git and all tools see what appears to be a normal repo, but GVFS only downloads objects as they are needed" – Vivek Gani Jun 11 '17 at 19:58
  • 2
    GVFS seems to be windows only for the moment. – Karl Forner Jun 12 '17 at 08:18
  • 16
    Thanks for mentioning the data duplication of LFS, I didn't see that mentioned anywhere else. I'd rather not duplicate my media directory disk usage for no good reason. – Ragnar Jul 29 '17 at 16:58
  • I found this article (https://writequit.org/articles/getting-started-with-git-annex.html) by Lee Hinman useful in understanding Karl Forner's answer because it clearly separates two workflows that you can use with git-annex: 1) tracking file _metadata_ without moving large files, and 2) moving and copying large files. – John Dec 06 '20 at 20:58