
I've been learning about the Git version control system recently, and it seems to work very well for plain-text documents: you can add a single line, go back and fork it, revise the document and remove the line you just added.

I program mostly in Excel and write documentation in Word. Can Git be used to manage versions of these files (obviously not the content inside the files, but the files themselves)?

yoshiserry

1 Answer


Git is agnostic in the sense that it doesn't matter which files you put under version control.

Git doesn't really recognise file types. When it decides that a file is binary, it still versions it like any other file, but it won't produce a line-by-line diff: `git diff` will simply report that the binary files differ, and `git diff --stat` shows the change in size in bytes.
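By default Git does not look at file formats at all; to decide whether content is "binary" it uses a simple heuristic: if a NUL byte appears within the first 8000 bytes, the content is treated as binary. The Python sketch below mimics that check for illustration only (it is modelled on Git's `buffer_is_binary()` and is not Git's actual code). The decision can be overridden per path in `.gitattributes`, e.g. with the `binary` or `-diff` attributes.

```python
# Sketch of the heuristic Git uses to decide whether content is "binary".
# Modelled on buffer_is_binary() in Git's C sources; not Git's actual code.
FIRST_FEW_BYTES = 8000  # Git only looks at the start of the content


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte occurs within the first 8000 bytes."""
    return b"\0" in data[:FIRST_FEW_BYTES]


if __name__ == "__main__":
    print(looks_binary(b"int a = 3;\n"))  # False: a plain .cpp line
    # Typical first bytes of a ZIP local file header, as at the start of .docx/.xlsx
    print(looks_binary(b"PK\x03\x04\x14\x00\x06\x00"))  # True
```

Since .docx and .xlsx files are ZIP archives, they almost always contain NUL bytes right at the start, which is why Git shows them as binary.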

So, to answer your question: yes, Git can be used to manage versions of Word documents and similar files. Furthermore, using gitattributes you can even diff Word documents, although I'm not sure whether this is possible for xls files.

To learn more about the possibilities gitattributes provide, and to see some examples of diffing Word files, read the Customizing Git - Git Attributes chapter of the Pro Git book.
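The Pro Git chapter wires a converter into `git diff` through a `textconv` diff driver: `*.docx diff=word` in `.gitattributes`, plus `git config diff.word.textconv docx2txt`. Git runs the configured command with the path of a copy of the file and diffs whatever text it prints. If `docx2txt` isn't available, a rough Python stand-in might look like the sketch below. The script name `docx_text.py` is a hypothetical choice; the script relies only on a .docx file being a ZIP archive whose body text lives in `word/document.xml`, and it ignores formatting, headers, footnotes and comments. Only the diff output becomes textual; Git still stores the whole .docx file.

```python
#!/usr/bin/env python3
"""Hypothetical textconv helper (docx_text.py): print the text of a .docx file.

Assumed wiring (illustrative names and paths):
  .gitattributes:  *.docx diff=word
  git config diff.word.textconv "python3 /path/to/docx_text.py"
"""
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> list[str]:
    """Return one string per paragraph (<w:p>) found in the document XML."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join all <w:t> pieces
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


if __name__ == "__main__":
    # Git passes the path of the file to convert as the single argument
    for path in sys.argv[1:]:
        for line in docx_paragraphs(path):
            print(line)
```

Printing one paragraph per line means that adding a sentence to a paragraph shows up in `git diff` as a single changed line rather than "Binary files differ".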

Sascha Wolf
  • But the question I think everyone has is: when a new line is added to the Word document, does Git add just that line (i.e. track the line) or does it copy the entire new document? Obviously Git couldn't understand every file format, right? – juztcode Aug 06 '20 at 18:09
  • @juztcode Git will "copy" the entire new document (create a snapshot) __but__ that is true for __every kind of file__, even simple text files ([see this question for details](https://stackoverflow.com/questions/8198105/how-does-git-store-files)). Deltas (i.e. the differences between two or more versions of a file) are only an optimization on top: Git computes them when packing objects, to save disk space and network bandwidth. – Sascha Wolf Aug 07 '20 at 11:18
  • I thought that for code files like `.cpp`, Git only stored the lines that were added or deleted. E.g. if I added `int a = 3` on line 3, it didn't copy the entire file but only stored this as a new change. That could be possible with code files but not with other types. – juztcode Aug 08 '20 at 08:14
  • I understand why you assumed that, but it's not how Git operates. Git __always__ stores full snapshots, not diffs, at least on a conceptual level. Git might then do some smart optimizations on top to avoid having to store thousands of full versions, but at the end of the day snapshots are the __source of truth__. [This answer from the question I linked above goes into great depth on how exactly Git does that](https://stackoverflow.com/a/8198276/2274224). – Sascha Wolf Aug 14 '20 at 12:02
  • Because Git sees these files as binary, it can't merge conflicting changes to them line by line: if a merge, cherry-pick or revert touches a file that was also changed on the other side, you get a conflict and have to pick one whole version. It can totally be used to store versions though, and it simplifies backing up an important folder: you can create a remote repo on a different drive, or even a private repo in the cloud, and push changes periodically. Not only do you get a backup, you can even revert to older versions! – Mathieu Turcotte Dec 09 '22 at 13:07
  • @SaschaWolf actually I followed your link, but VonC's reply there said the contrary: "Git does use diff for storage." I guess most people understand conceptually that version control gives you snapshots, but the way Git delivers that seems to use diffs. – ACCL Dec 11 '22 at 10:20
  • @ACCL I'm sorry but that's simply not correct. The very first paragraph of the linked answer says this. "Git does include for each commit a full copy of all the files, except that, for the content already present in the Git repo, the snapshot will simply point to said content rather than duplicate it." Take a look at the [Git Internals - Git Objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) section of the Git Book for further details. – Sascha Wolf Jan 11 '23 at 10:29
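The snapshot-versus-diff point in these comments can be checked with a few lines of Python. The sketch below reproduces how Git names a blob (the object holding one file's full content) in a default SHA-1 repository: the complete content, prefixed with a `blob <size>\0` header, is hashed with SHA-1 and, as a loose object, stored zlib-compressed under `.git/objects/`. It models loose objects only; the delta compression inside packfiles that ACCL refers to happens later (e.g. during `git gc` or when pushing) and is not shown. Repositories using the newer SHA-256 object format produce different IDs.

```python
import hashlib
import zlib


def blob_bytes(content: bytes) -> bytes:
    """Serialise file content the way Git does for a blob: header + full content."""
    return b"blob " + str(len(content)).encode() + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID Git assigns to this content (default SHA-1 repositories)."""
    return hashlib.sha1(blob_bytes(content)).hexdigest()


def loose_object_path(content: bytes) -> str:
    """Where the zlib-compressed blob would be written as a loose object."""
    oid = git_blob_id(content)
    return f".git/objects/{oid[:2]}/{oid[2:]}"


if __name__ == "__main__":
    v1 = b"int main() {\n    return 0;\n}\n"
    v2 = b"int main() {\n    int a = 3;\n    return 0;\n}\n"
    # Each version becomes its own complete object, not a line diff...
    print(loose_object_path(v1))
    print(loose_object_path(v2))
    # ...but identical content always gets the same ID, so a file that did not
    # change between commits is referenced again rather than stored twice.
    print(git_blob_id(v1) == git_blob_id(bytes(v1)))
    print(len(zlib.compress(blob_bytes(v2))), "compressed bytes for v2")
```

For example, `git_blob_id(b"hello world\n")` returns `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same ID that `echo 'hello world' | git hash-object --stdin` prints.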