
I have a git repository for my code, but also want to include some Word files and PDFs in a subdirectory, and may possibly want to add some binaries. I don't want to store the deltas, just the latest version of these files. Is there a way to do that in git?

user994165
  • *I don't want to store the deltas.* Good, because Git doesn't store the "deltas". You probably want to use `git update-index --assume-unchanged <file>`; see http://stackoverflow.com/questions/653454/how-do-you-make-git-ignore-files-without-using-gitignore?lq=1 – jub0bs Jan 31 '15 at 15:44
  • My advice, just put the binaries in the source tree and do as usual, let git handle their "binary aspect" –  Jan 31 '15 at 15:47
  • @Jubobs, I don't want to ignore local changes, I just want git to only store the latest copy, to save space. – user994165 Jan 31 '15 at 16:21
  • @user994165 You cannot do that with Git. Commits, once created, are set in stone. If you record a version of your Word file in a commit, that version will be in your repo (and therefore, take some space) for as long as the commit in question remains in your repo. – jub0bs Jan 31 '15 at 16:49
  • @Jubobs, what about after having pushed all your changes, is there a way to delete the history for just one file or files in a directory? – user994165 Jan 31 '15 at 17:03
  • @user994165 The short answer is "no". Anything else would go against Git's spirit. – jub0bs Jan 31 '15 at 17:24
  • possible duplicate of [Managing large binary files with git](http://stackoverflow.com/questions/540535/managing-large-binary-files-with-git) – Schwern Jan 31 '15 at 19:42
  • @Jubobs *Conceptually*, Git does not store deltas. In reality, [Git often stores deltas](https://stackoverflow.com/questions/5176225/are-gits-pack-files-deltas-rather-than-snapshots). The history of one file can be deleted (more accurately, you can create a new history where the file does not exist) with git-filter-branch, git-rebase and similar tools. – Schwern Jan 31 '15 at 19:50
  • @Schwern Sure, if you want to delve into packs. I was referring to the datastore architecture (DAGs, etc.), not to the compression side of things. – jub0bs Jan 31 '15 at 19:53

1 Answer


There are a few ways you can do what you want. Here they are, from most to least pleasant.

If they're small or don't change often, don't worry about it. As long as they aren't compressed (remember that PDFs often are), Git can still delta-compress binary files when it packs the repository; you don't need to trick it into treating them as text.

Update: If they're large, use Git Large File Storage (git-lfs), which stores their history but keeps the bulky historical content on a remote server rather than in every clone. You only need to download the version you've checked out. This lets you store large files while keeping the repository slim.
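As a rough sketch of what that looks like in practice: after running `git lfs install` once, `git lfs track "*.pdf"` adds a line like the first one below to `.gitattributes`. The `*.docx` pattern is only an assumption about which Word files you have; adjust the patterns to your own subdirectory.

```
# .gitattributes: route matching files through the git-lfs filter
*.pdf  filter=lfs diff=lfs merge=lfs -text
*.docx filter=lfs diff=lfs merge=lfs -text
```

Commit `.gitattributes` along with the files. From then on Git itself only stores small pointer files for anything matching those patterns.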

If they start small and get large, use the BFG Repo Cleaner to retroactively move their history into git-lfs.

If they're large or will change frequently, probably the best option is to not store the files in Git at all. Instead, download them as part of your build process. You don't want their history, you just want their latest version.
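A minimal sketch of such a build step, assuming `curl` is available and the documents live at some URL you control. The function name `fetch_asset` and the paths are made up for illustration.

```shell
# fetch_asset URL DEST  (hypothetical helper, not part of Git)
# Downloads URL to DEST unless DEST already exists.
fetch_asset() {
    fa_url=$1
    fa_dest=$2
    if [ -f "$fa_dest" ]; then
        echo "already present: $fa_dest"
        return 0
    fi
    mkdir -p "$(dirname "$fa_dest")"
    # Download under a temporary name so a failed transfer never
    # leaves a half-written file at the final path.
    if curl --fail --silent --show-error --location \
            --output "$fa_dest.part" "$fa_url"; then
        mv "$fa_dest.part" "$fa_dest"
    else
        rm -f "$fa_dest.part"
        return 1
    fi
}
```

You would call it from your build script as something like `fetch_asset "https://example.com/docs/spec.pdf" docs/spec.pdf` (a placeholder URL) and list `docs/` in `.gitignore` so the downloaded copies never get committed.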

Another is to use a tool like git-annex, as recommended in this answer.

git-annex allows managing files with git, without checking the file contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, time, or disk space.

Another is to store them in another Git repository and link it to yours using submodules or subtrees. Submodules can be told to make a shallow copy of the sub-repository using the --depth flag. This lets you keep the history of those big files and keep your development repository's history small. Unfortunately, both techniques have their caveats.
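Here is a sketch of the shallow submodule variant. The helper name `add_shallow_submodule` is hypothetical; the `--depth` flag and the `shallow` key in `.gitmodules` are standard Git.

```shell
# add_shallow_submodule URL PATH  (hypothetical helper name)
# Run from the top level of the code repository.
add_shallow_submodule() {
    # Clone only the newest commit of the documents repository.
    git submodule add --depth 1 "$1" "$2" || return 1
    # Record the preference in .gitmodules so that later clones of
    # this submodule are shallow too, unless someone asks otherwise.
    git config -f .gitmodules "submodule.$2.shallow" true || return 1
    git add .gitmodules
}
```

After that, commit as usual. Anyone who clones the code repository and runs `git submodule update --init` should get just the latest snapshot of the documents rather than their whole history.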

Finally, you can periodically cull the file from history with git-filter-branch or BFG. Not only is this a manual process, but because Git cannot change existing history, it creates new history instead: every commit after the first rewritten one gets a new ID, which causes general chaos when pushing and pulling.
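For the git-filter-branch route, a rough sketch might look like this. The helper name `keep_only_latest` and the variable names are made up, and it handles a single file in a clean working tree. Try it on a throwaway clone first, since it rewrites every branch.

```shell
# keep_only_latest PATH  (hypothetical helper; run at the repo top level)
# Drops PATH from all history, then re-commits the current version.
keep_only_latest() {
    KOL_PATH=$1
    export KOL_PATH          # the index filter below reads this variable
    kol_backup=$(mktemp) || return 1
    cp "$KOL_PATH" "$kol_backup" || return 1

    # Rewrite all refs so that PATH never existed; commits that only
    # touched PATH become empty and are pruned.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
        --index-filter 'git rm -r --cached --quiet --ignore-unmatch -- "$KOL_PATH"' \
        -- --all || return 1

    # Put the latest version back as a single fresh commit.
    mkdir -p "$(dirname "$KOL_PATH")"
    cp "$kol_backup" "$KOL_PATH" && rm -f "$kol_backup"
    git add -- "$KOL_PATH" && git commit --quiet -m "Re-add latest $KOL_PATH" || return 1

    # The old blobs only disappear once nothing references them:
    # drop filter-branch's backup refs, expire reflogs, then prune.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r kol_ref; do git update-ref -d "$kol_ref"; done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Afterwards you would have to force-push, and everyone else would need to re-clone or reset onto the rewritten branches, which is the chaos mentioned above. Old versions also linger on the server until it runs its own garbage collection.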

Much of this is covered in this question.

Schwern
  • Thanks. I was probably just going to keep these files in a separate repository and when I finish making my changes, delete the remote and create a new one. I'm pretty sure that will trash all the history. This is in gitorious. – user994165 Feb 13 '15 at 18:49