I put several .docx, .txt and .pdf files into a Git repository. I can open, edit, and save the local .docx file; however, when I push it to GitHub and download it back to my computer, Word complains that it cannot open it.

In order to store .docx files on GitHub, are there any essential steps I should take in the Git settings?

Nick Volynkin
Nick
  • I don't think that this has anything to do with git, but rather with GitHub. If you clone the repository in another folder, can you open your docx files? – edi9999 Jun 09 '15 at 10:48
  • Don't store binary data (such as `.docx`, `.pdf`) in git. Git is for plain text such as source code, not for binary data! – musicmatze Jun 09 '15 at 11:06
  • @edi9999 Ah, I downloaded the `.docx` file using the `Chrome` browser. After reading your comment, I suspect that maybe it's the browser's problem. I then `git clone`d it to another folder: Word can open it. So it's `Chrome` which cannot play well with GitHub. Thank you very much! – Nick Jun 09 '15 at 11:18
  • @musicmatze No, it's not. Git can store any binary data if it's set up properly. – Nick Volynkin Jun 09 '15 at 12:06
  • @NickVolynkin okay, I should have said "git was _meant_ for text"... Of course it can store binary data as well... it just does not make that much sense. – musicmatze Jun 09 '15 at 12:07
  • @Nick Chrome should not have problems with Git or GitHub. Maybe it was the endline issue too. – Nick Volynkin Jun 09 '15 at 13:09
  • you could also try https://github.com/Nicola17/ODT-Git-helper - also there's some tips over at https://web.archive.org/web/20180107081412/https://git.wiki.kernel.org/index.php/GitTips#How_to_use_git_to_track_OpenDocument_.28OpenOffice.2C_Koffice.29_files.3F – Ben Creasy Mar 12 '18 at 00:01

2 Answers

Solution

Make a .gitattributes file in your working directory and add the following line to it:

*.docx    binary
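
If you would rather script that step, here is a minimal Python sketch. The helper name `ensure_binary_rule` is made up for illustration; it assumes the repository root is the directory you pass in and only appends the rule when an equivalent line is not already there.

```python
import tempfile
from pathlib import Path


def ensure_binary_rule(repo_dir, pattern="*.docx"):
    """Append '<pattern> binary' to .gitattributes unless the rule already exists.

    Hypothetical helper, not part of Git. Returns True if the file was changed.
    """
    attributes_path = Path(repo_dir) / ".gitattributes"
    rule = f"{pattern} binary"
    text = attributes_path.read_text() if attributes_path.exists() else ""
    # Collapse runs of whitespace so "*.docx    binary" counts as the same rule.
    if any(" ".join(line.split()) == rule for line in text.splitlines()):
        return False
    # Make sure the new rule starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    attributes_path.write_text(text + separator + rule + "\n")
    return True


# Demo in a throwaway directory standing in for a working tree.
with tempfile.TemporaryDirectory() as demo_repo:
    print(ensure_binary_rule(demo_repo))  # rule is added
    print(ensure_binary_rule(demo_repo))  # rule is already present
    print((Path(demo_repo) / ".gitattributes").read_text())
```

The file still has to be committed like any other change for the rule to travel with the repository.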

Why not just set core.autocrlf=false?

This is useful too. But configuring .docx as a binary format solves not only this problem, but also potential merge issues.

What is the origin of this problem?

From http://git-scm.com/docs/gitattributes, section "Marking files as binary". Note the italicized section.

Git usually guesses correctly whether a blob contains text or binary data by examining the beginning of the contents. However, sometimes you may want to override its decision, either because a blob contains binary data later in the file, or because the content, while technically composed of text characters, is opaque to a human reader.

The .docx format is a zip archive containing XML and binary data, such as images.
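
To see that structure for yourself, Python's standard `zipfile` module can open such a package. The sketch below builds a tiny stand-in archive in memory instead of reading a real document; the member names are only illustrative of how a .docx is typically laid out.

```python
import io
import zipfile


def build_sample_package():
    """Build a small zip archive laid out like a .docx (illustrative member names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # An XML part, which is plain text inside the archive.
        archive.writestr("word/document.xml", "<w:document>Hello</w:document>")
        # A binary part standing in for an embedded image.
        archive.writestr("word/media/image1.png", bytes(range(256)))
    return buffer.getvalue()


def list_parts(package):
    """Return the member names stored in a zip package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.namelist()


package = build_sample_package()
print(zipfile.is_zipfile(io.BytesIO(package)))
print(list_parts(package))
```

To inspect a real file, pass its path to `zipfile.ZipFile` instead of the in-memory buffer.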

Git treated your .docx as a text file rather than a binary one and replaced its end-of-line characters. As a Microsoft-developed format, .docx probably uses CRLF, which might have been replaced with LF in the remote repository. When you downloaded the file directly from the remote, it still had LFs.

In a binary file, Git never replaces end-of-line characters, so even the files in the remote repository will keep their original CRLFs.
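
Here is a rough Python model of why end-of-line conversion is harmful for binary content. The two functions are simplified stand-ins written for this example, not Git's actual implementation, and the payload bytes are invented.

```python
def normalize_line_endings(data):
    """Simplified stand-in for text normalization on commit: CRLF -> LF."""
    return data.replace(b"\r\n", b"\n")


def restore_crlf(data):
    """Simplified stand-in for conversion on checkout: every LF -> CRLF."""
    return data.replace(b"\n", b"\r\n")


# Invented binary payload that happens to contain a CRLF pair and a lone LF byte.
binary_payload = bytes([0x50, 0x4B, 0x03, 0x04, 0x0D, 0x0A, 0x00, 0x0A, 0xFF])

stored = normalize_line_endings(binary_payload)
restored = restore_crlf(stored)

print(len(binary_payload), len(stored), len(restored))
print(stored == binary_payload)    # the stored copy has lost a byte
print(restored == binary_payload)  # the round trip does not bring it back
```

In this model the stored copy is one byte shorter than the original, and converting back turns the lone LF into a CRLF as well, so neither copy matches the original. For a zip-based format, any such byte change can be enough to make the archive unreadable.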

Applicable formats

This is applicable to any file format which is a zipped package with text and binary data. This includes:

cengel
Nick Volynkin
  • Thank you. I've added that line to the `.gitattributes` file. But I don't understand: if the problem is caused by endline characters, why can Word open the `.docx` file from a `git clone` to a local folder, but not the version downloaded with Chrome? They are the same file. Did `git clone` add the `CRLF`s back? – Nick Jun 09 '15 at 13:25
  • @Nick configuring `.docx` as a binary format solves not only this problem, but also potential merge issues. – Nick Volynkin Jun 09 '15 at 13:30
  • You're right! I tried to download the file from GitHub by first clicking on the file link, and then *right-clicking* on `View Raw`, `save as...` This time, Word could open it. Thank you very much! – Nick Jun 09 '15 at 13:40
  • @Nick great! Updated my answer according to your approval. – Nick Volynkin Jun 09 '15 at 16:28

The problem here is most likely a GitHub web interface quirk whereby your attempt to download the file actually produced an HTML page about the file. Using clone will, as you found, work fine.
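
One quick way to tell the two cases apart is to look at the first bytes of whatever was downloaded. This is a heuristic sketch only; `classify_download` is a name made up for this example, and the sample inputs are invented.

```python
def classify_download(data):
    """Guess whether downloaded bytes are a zip package or an HTML page.

    Heuristic only: a zip archive normally starts with the local file header
    signature b"PK\\x03\\x04", while an HTML page starts with a doctype or
    an <html> tag, possibly after some whitespace.
    """
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(classify_download(b"PK\x03\x04" + b"\x00" * 16))
print(classify_download(b"\n<!DOCTYPE html>\n<html><head></head></html>"))
```

For a real file, read it in binary mode, for example `open(path, "rb").read()`, and pass the bytes in. An "html" result means the browser saved a web page rather than the document itself.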