I wonder why git tells me this?

$ git diff MyFile.txt
diff --git a/MyFile.txt b/MyFile.txt
index d41a4f3..15dcfa2 100644
Binary files a/MyFile.txt and b/MyFile.txt differ

Aren't they text files?

I have checked .gitattributes and it is empty. Why am I getting this message? I can no longer get diffs the way I used to.

ADDED:

I've noticed there is an @ in the file permissions. What is this? Could this be the reason?

$ ls -all
drwxr-xr-x   5 nacho4d  staff    170 28 Jul 17:07 .
drwxr-xr-x  16 nacho4d  staff    544 28 Jul 16:39 ..
-rw-r--r--@  1 nacho4d  staff   6148 28 Jul 16:15 .DS_Store
-rw-r--r--@  1 nacho4d  staff    746 28 Jul 17:07 MyFile.txt
-rw-r--r--   1 nacho4d  staff  22538  5 Apr 16:18 OtherFile.txt
dibi
nacho4d
  • 5
    It could be a UTF-8 encoded file. – Marnix van Valen Jul 28 '11 at 08:04
  • 1
    It is supposed to be UTF16 little endian LF – nacho4d Jul 28 '11 at 08:07
  • 2
    From the `ls` manpage on Mac OS X: *If the file or directory has extended attributes, the permissions field printed by the `-l` option is followed by a `@` character*. Use option `-@` to see these extended attributes. – adl Jul 28 '11 at 08:17
  • I think this could be a bug in git. I deleted the extended attributes and now everything is fine again. – nacho4d Jul 28 '11 at 08:45
  • @Marnix: UTF-8 encoded file will be detected as text. UTF-16 encoded file on the other hand is binary. – Jan Hudec Jul 28 '11 at 09:12
  • 6
    @nacho4d: That's strange, because git shouldn't even know that there are any extended attributes. If you could reproduce it, it would be worth bringing up on the git mailing list. As is good custom on `vger.kernel.org` lists, you do not have to subscribe to post (people will keep you CC'ed for answers) and are kind of supposed not to given the rather high volume of the `git@vger.kernel.org` list. – Jan Hudec Jul 28 '11 at 09:34
  • @nacho4d please post `hexdump MyFile.txt`. – Ciro Santilli OurBigBook.com Jun 21 '14 at 09:33
  • The general question for all version control engines: http://stackoverflow.com/questions/7110750/how-do-popular-source-control-systems-differentiate-binary-files-from-text-files/7112964#7112964 – Ciro Santilli OurBigBook.com Jun 21 '14 at 09:41
  • possible duplicate of [Why does git think my cs file is binary?](http://stackoverflow.com/questions/2506041/why-does-git-think-my-cs-file-is-binary) – Nick Grealy May 08 '15 at 01:50

17 Answers

107

It simply means that git inspects the actual content of the file to decide whether it is text. It doesn't assume that any given extension means text; you can use the attributes file if you want to tell it explicitly (see the man pages).

Having inspected the file's contents, it has seen bytes that aren't basic ASCII characters. Since the file is UTF-16, I expect it contains 'funny' characters (in practice, zero bytes), so git treats it as binary.

There are ways of telling git that a file uses internationalisation (i18n) or an extended character format. I'm not sufficiently up on the exact method for setting that; you may need to RT[Full]M ;-)

Edit: a quick search of SO found can-i-make-git-recognize-a-utf-16-file-as-text which should give you a few clues.
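
To make the "funny characters" point concrete, here is a minimal Python sketch of the kind of check being described. It assumes that git's built-in heuristic amounts to looking for a zero byte near the start of the file. The 8000-byte window is my understanding of what git uses, and the name `looks_binary` is purely illustrative; neither is a git API.

```python
# Sketch of a "does this look binary?" check, similar in spirit to git's.
# ASSUMPTION: the window size and the function name are illustrative only;
# this is not git's actual code.
SNIFF_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte appears in the first SNIFF_BYTES bytes."""
    return b"\x00" in data[:SNIFF_BYTES]


text = "Hello, git!\n"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # each ASCII character gains a 0x00 high byte

print(looks_binary(utf8))   # False
print(looks_binary(utf16))  # True
```

The same text is classified differently purely because of how it is encoded, which matches the behaviour seen in the question.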

Community
Philip Oakley
  • 11
    You are almost, but not completely, right. Git did inspect the actual files and did see 'funny' characters there. However, it does not "think" UTF-16 is binary. It *is* binary, because text is defined as ASCII-based (that's the only thing the built-in diff will give usable results for) and UTF-16 is not. Yes, there is a way to tell git to use a special diff for files matching a pattern (using `.gitattributes`). – Jan Hudec Jul 28 '11 at 09:27
  • 2
    I should add, that 'funny characters' really means zero bytes. – Jan Hudec Jul 28 '11 at 09:31
  • 6
    We are both right, but from different perspectives. We both say "Git inspects the contents to determine its type." We both say that to make git know it should be treated as UTF16 the user needs to tell git via `.gitattributes` etc. – Philip Oakley Jul 28 '11 at 09:34
  • I said you are almost right. The only difference is that I disagree with saying it "thinks" UTF-16 is binary and insist that it "is" binary (because of what "text" means to git). – Jan Hudec Jul 28 '11 at 11:02
  • 8
    @JanHudec: In your view, ALL files are binary. – stolsvik Oct 19 '16 at 22:17
  • 1
    @stolsvik, no, in my view, "text is defined as ASCII-based". Because that's what it means in Git. – Jan Hudec Oct 20 '16 at 07:22
  • 4
    @stolsvik (and JanH), it's a more subtle middle ground, in that UTF-8 includes both the base 0-127 ASCII characters and all other Unicode chars, without need of a null (00h) byte for anything other than the nul char (the 'C' string terminator). Thus Git's text definition is that the content (well, the first 8000 bytes) should not have a null byte when UTF-8 encoded. Try http://stackoverflow.com/questions/2241348/what-is-unicode-utf-8-utf-16 for a fun read. My original comment refers to the case when UTF-16 encoded data is viewed as byte pairs, so the high byte for ASCII code points will be 00. – Philip Oakley Oct 20 '16 at 12:30
  • 1
    Remember that the `.gitattributes` file itself has to be readable by `git` (so, no UTF-16 encoding and the like) – mewa Jun 05 '18 at 10:25
  • 1
    TIL: `echo '' > some-file.txt` creates a file with UCS-2 LE BOM encoding (Windows Powershell / Notepad++).... unless you [change the default setting](https://stackoverflow.com/q/40098771/358006) – Peter L Jun 18 '20 at 20:06
  • @JanHudec `\x00` is ASCII – CervEd Dec 18 '21 at 13:24
  • 1
    @CervEd, but it isn't text, because C uses it as terminator for text. – Jan Hudec Dec 18 '21 at 13:38
  • Unless you're forced to, on Windows you should be using [PowerShell 7 (pwsh.exe)](https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.3). You can run both pwsh and `WinPS` side by side. The behaviour above happens because `>` in `WinPS` implicitly calls `out-file` instead of `set-content`. If you're forced to use `WinPS`, make sure you're set up right; [about_CharacterEncoding PS7](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.3) explains how to fix `WinPS`. – ninMonkey Nov 12 '22 at 22:09
53

If you have not set the type of a file, Git tries to determine it automatically, and a file with really long lines and maybe some wide characters (e.g. Unicode) is treated as binary. With the .gitattributes file you can define how Git interprets the file. Setting the diff attribute manually makes Git interpret the file content as text and produce a regular diff.

Just add a .gitattributes file to your repository root folder and set the diff attribute on the paths or files. Here's an example:

src/Acme/DemoBundle/Resources/public/js/i18n/* diff
doc/Help/NothingToSay.yml                      diff
*.css                                          diff

If you want to check whether any attributes are set on a file, you can do that with git check-attr:

git check-attr --all -- src/my_file.txt

Another nice reference about Git attributes can be found here.
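
If you maintain these entries across several repositories, a small script can add them without creating duplicates. The following Python sketch is one way to do it. The helper name `ensure_diff_attribute` is hypothetical (it is not part of git), and it only handles simple `<pattern> diff` lines like the ones in the example above. It writes the file as UTF-8 so that git itself can read it.

```python
import tempfile
from pathlib import Path


def ensure_diff_attribute(attr_file, patterns):
    """Append '<pattern> diff' lines that are not already present.

    Hypothetical helper, not a git API. Returns the lines that were added.
    """
    text = attr_file.read_text(encoding="utf-8") if attr_file.exists() else ""
    # Normalise whitespace so "*.css    diff" and "*.css diff" compare equal.
    present = {" ".join(line.split()) for line in text.splitlines()}
    added = [f"{p} diff" for p in patterns if f"{p} diff" not in present]
    if added:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        attr_file.write_text(text + prefix + "\n".join(added) + "\n", encoding="utf-8")
    return added


# Demo in a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    attrs = Path(tmp) / ".gitattributes"
    print(ensure_diff_attribute(attrs, ["*.css", "doc/Help/NothingToSay.yml"]))
    print(ensure_diff_attribute(attrs, ["*.css"]))  # already present, so []
```

In a real repository you would point `attr_file` at the `.gitattributes` in the repository root and then confirm the result with git check-attr as shown above.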

stollr
  • 3
    This was helpful, but is actually incorrect--the right attribute is `diff`, not `text`. The `text` attribute doesn't tell git to diff using text but instead controls how line endings are handled (normalization to LF). See your link to .gitattributes for more details. – ErikE Aug 11 '14 at 16:51
  • Thanks @ErikE. I have updated my post according to your comment and the Git documentation. – stollr Aug 12 '14 at 07:54
  • 5
    Additionally, you can set what sort of diff should be performed. For example, if it's an xml file you can use `diff=xml` instead of just `diff`. – Sandy Chapman Jan 28 '15 at 14:53
  • 1
    What is the opposite of check-attr? Is there a set-attr? I originally saved a file as UTF-16 by accident, then committed and pushed it, and now BitBucket sees it as UTF-16, even after re-saving it as UTF-8, committing and pushing it again. This basically makes my pull requests impossible to read because reviewers need to click into each individual comment to add review comments. – John Zabroski Jan 04 '16 at 16:15
34

I was having this issue where Git GUI and SourceTree were treating Java/JS files as binary and thus wouldn't show a diff.

Creating a file named attributes in .git/info with the following content solved the problem:

*.java diff
*.js diff
*.pl diff
*.txt diff
*.ts diff
*.html diff
*.sh diff
*.xml diff

If you would like this to apply to all repositories, you can put the same content in the file $HOME/.config/git/attributes.

Hemant
  • 1
    Also note the `/.gitattributes` file, which makes the change active for all contributors, and only for the relevant project. – jpaugh Dec 12 '16 at 14:54
  • 2
    Adding `* diff` was helpful for me: it shows the difference in all types of files. But your solution is better, because of avoiding showing the unnecessary diff in large binary files. – Boolean_Type Jun 14 '19 at 06:33
  • Yeah! This helps! – WildCat May 08 '20 at 14:06
21

Git will even determine that it is binary if you have one super-long line in your text file. I broke up a long String, turning it into several source code lines, and suddenly the file went from being 'binary' to a text file that I could see (in SmartGit).

So don't keep typing too far to the right without hitting 'Enter' in your editor - otherwise later on Git will think you have created a binary file.

Chris Murphy
  • 1
    This is correct. I was trying to track diffs of an extremely large MySQL dump (.sql file), but git treats it as a binary file, even though it contains only ASCII/UTF-8 data. The reason is that the lines are super long: `insert values (one),(two),(three),(...),(3 million...);`. Strangely, for every commit, the git repository does not grow by 1.7 GB, but only by ~350 MB. Perhaps git is compressing the "binary" file before saving it. – Alexandre T. Jan 15 '16 at 17:55
  • @AlexandreT. Git does indeed compress file blobs (using zlib, IIRC). – jpaugh Dec 12 '16 at 14:56
16

I had this same problem after editing one of my files in a new editor. Turns out the new editor used a different encoding (Unicode) than my old editor (UTF-8). So I simply told my new editor to save my files with UTF-8 and then git showed my changes properly again and didn't see it as a binary file.

I think the problem was simply that git doesn't know how to compare files of different encoding types. So the encoding type that you use really doesn't matter, as long as it remains consistent.

I didn't test it, but I'm sure that if I had just committed my file with the new Unicode encoding, the next time I made changes to that file it would have shown the changes properly and not detected it as binary, since it would then have been comparing two Unicode encoded files, and not a UTF-8 file to a Unicode file.

You can use an app like Notepad++ to easily see and change the encoding type of a text file. Open the file in Notepad++ and use the Encoding menu in the toolbar.
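
If you would rather script the conversion than click through an editor, something along these lines should work. It is a minimal Python sketch, assuming the file really is UTF-16 with a BOM. The function name `reencode_to_utf8` is made up for this example; pass `"utf-16-le"` or `"utf-16-be"` explicitly if the file has no BOM.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding="utf-16"):
    """Rewrite `path` in place as UTF-8 without a BOM.

    Hypothetical helper. With "utf-16", Python uses the BOM to pick the
    byte order and strips it while decoding.
    """
    text = path.read_bytes().decode(source_encoding)
    # Work on bytes so that line endings are preserved exactly.
    path.write_bytes(text.encode("utf-8"))


# Demo on a temporary file standing in for the problematic one.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "MyFile.txt"
    sample.write_bytes("some text\n".encode("utf-16"))  # written with a BOM
    print(b"\x00" in sample.read_bytes())  # True: zero bytes present
    reencode_to_utf8(sample)
    print(b"\x00" in sample.read_bytes())  # False: plain UTF-8 now
```

Try it on a copy first: decoding with the wrong source encoding either raises an error or silently produces garbage.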

deadlydog
  • 3
    Unicode is not an encoding. It's a charset and UTF-8 is one of its encoding, i.e. the way to encode a Unicode codepoint – phuclv Feb 23 '19 at 14:33
  • 1
    This does not resolve the issue, only avoids it. The issue is that git or its diff tool does not properly recognize text files or does not easily allow the user to override its behaviour. – Preza8 Jul 29 '19 at 09:05
12

This is also caused (on Windows at least) by text files that have UTF-8 with BOM encoding. Changing the encoding to regular UTF-8 immediately made Git see the file as type=text.

Robba
  • I have two files that Notepad++ identifies as UTF-8 with BOM encoding. SourceTree/git is identifying one as binary and the other as text. I don't have anything definitive to say except that this doesn't appear to be a completely accurate statement. – goug Oct 29 '21 at 21:07
  • In my case my files were UTF-16 with BOM, changing the encoding with Notepad++ to regular UTF-8 fixed the problem, also I had to add manually in `.gitattributes` file the following: `*.extension diff` – Bud Damyanov Sep 08 '22 at 12:53
7

I had the same problem. I found this thread while searching for a solution on Google, but it didn't give me a clue. After some digging I think I found the reason; the example below should explain it clearly.

    echo "new text" > new.txt
    git add new.txt
    git commit -m "dummy"

At this point, the file new.txt is considered a text file.

    echo -e "newer text\000" > new.txt
    git diff

You will get this result:

    diff --git a/new.txt b/new.txt
    index fa49b07..410428c 100644
    Binary files a/new.txt and b/new.txt differ

Now try this:

    git diff -a

You will get the following:

    diff --git a/new.txt b/new.txt
    index fa49b07..9664e3f 100644
    --- a/new.txt
    +++ b/new.txt
    @@ -1 +1 @@
    -new text
    +newer text^@
howard
7

We had a case where an .html file was seen as binary whenever we tried to change it. Very uncool to not see diffs. To be honest, I didn't check all the solutions here, but what worked for us was the following:

  1. Removed the file (actually moved it to my Desktop) and committed the deletion. Git says: "Deleted file with mode 100644 (Regular) Binary file differs".
  2. Re-added the file (actually moved it from my Desktop back into the project). Git says: "New file with mode 100644 (Regular) 1 chunk, 135 insertions, 0 deletions". The file is now added as a regular text file.

From now on, any change I make in the file is seen as a regular text diff. You could also squash these commits (1, 2, and 3 being the actual change you make), but I prefer to be able to see in the future what I did. Squashing 1 & 2 will show a binary change.

StuFF mc
  • Similar with one or two (successfully compiled) cpp files pushed up from VS. Renders the Github gui for _Compare_ ludicrous. One would not wish to be a fly on the bell in such a ding dong interchange,- VS on one side saying it's Github, and on the other side Github saying it's VS. :( – Laurie Stearn Jan 28 '20 at 12:37
6

Try using file to view the encoding details (reference):

cd directory/of/interest
file *

It produces useful output like this:

$ file *
CR6Series_stats resaved.dat: ASCII text, with very long lines, with CRLF line terminators
CR6Series_stats utf8.dat:    UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators
CR6Series_stats.dat:         ASCII text, with very long lines, with CRLF line terminators
readme.md:                   ASCII text, with CRLF line terminators
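
If `file` is not available, a rough equivalent for the BOM part of that output can be sketched in a few lines of Python. This only looks for a byte order mark at the start of the data, so it cannot identify BOM-less UTF-16 or tell ASCII from UTF-8. The function name `sniff_bom` is just an illustrative choice.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8 (with BOM)"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def sniff_bom(data: bytes) -> str:
    """Name the encoding implied by a leading byte order mark, if any."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return "no BOM"


print(sniff_bom("x".encode("utf-8-sig")))                        # UTF-8 (with BOM)
print(sniff_bom(codecs.BOM_UTF16_LE + "x".encode("utf-16-le")))  # UTF-16 LE
print(sniff_bom(b"plain ascii"))                                 # no BOM
```

To check a real file, pass it the first few bytes, for example `sniff_bom(open(path, "rb").read(4))`.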
patricktokeeffe
  • 8
    `file` is not a git command. It's a totally separate tool packaged with git on Windows. Is there documentation showing that this is what git uses for binary file detection? – Max May 17 '18 at 15:22
  • 1
    Yes `file` is a Linux tool but it's packed with Git in C:\Program Files\git\usr\bin – patricktokeeffe Sep 17 '20 at 19:30
2

I had an instance where .gitignore contained a double \r (carriage return) sequence on purpose.

That file was identified as binary by git. Adding a .gitattributes file helped.

# .gitattributes file
.gitignore diff
Erik Živković
  • 1
    Worked. I also had a double \r to ignore some OS "Icon\r\r" file. Good to know the cause as well as the fix. – hsandt Jul 25 '18 at 12:33
1

If git check-attr --all -- src/my_file.txt indicates that your file is flagged as binary, and you haven't set it as binary in .gitattributes, check for it in /.git/info/attributes.

coberlin
1

Rename Aux.js to something else, such as Sig.js.

The source tree still shows it as a binary file, but you can stage (add) it and commit.

oscarz
1

I had a similar issue after pasting some text from a binary Kafka message, which inserted a non-visible character and caused git to think the file was binary.

I found the offending characters by searching the file using regex [^ -~\n\r\t]+.

  • [ match characters in this set
  • ^ match characters not in this set
  • ' -~' (space, hyphen, tilde) matches all characters from ' ' (space) to '~'
  • \n newline
  • \r carriage return
  • \t tab
  • ] close set
  • + match one or more of these characters
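
If your editor's search does not support this, the same regex can be applied with a short Python script. This is only a sketch: `find_offenders` is a name I made up, and it reports 1-based line and column numbers for each run of offending characters.

```python
import re

# Same character class as above: anything that is not printable ASCII,
# newline, carriage return or tab.
NON_PRINTABLE = re.compile(r"[^ -~\n\r\t]+")


def find_offenders(text):
    """Yield (line_number, column, matched_text) for each offending run."""
    # Split on "\n" only, so that unusual separators such as form feed
    # are reported rather than silently treated as line breaks.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in NON_PRINTABLE.finditer(line):
            yield lineno, match.start() + 1, match.group()


sample = "key=value\nbad\x00line\x01\x02here\n"
for hit in find_offenders(sample):
    print(hit)
# (2, 4, '\x00')
# (2, 9, '\x01\x02')
```

For a real file, reading it with `open(path, encoding="latin-1", newline="")` maps every byte to exactly one character, so the reported columns correspond to byte positions within the line.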
Martyn Davis
1

I got the same message when the files I was diff-ing were generated in the Powershell terminal using the echo command:

echo "new file" > newfile.txt

The files remained binary even after I opened and edited them with an editor.

The quick and dirty solution for me was to copy the content of those files, delete them, create them again directly from the editor (not from the terminal), and paste back the contents. Diff-ing afterwards showed the correct per-line conflicts as one would expect.

alds
0

I just spent several hours going through everything on this list trying to work out why one of the test projects in my solution wasn't adding any tests to the explorer.

It turned out in my case that somehow (probably due to a poor git merge somewhere) VS had lost its reference to the project altogether. It was still building, but I noticed that it only built the dependencies.

I then noticed that it wasn't showing up in the dependencies list itself, so I removed and re-added the test project and all my tests showed up finally.

cirrus
0

The reason my file was showing as binary (and I was getting no diff using git diff or SourceTree) was that the file in question had been added as a Git LFS file.

Git (and SourceTree) do not seem to be able to diff text files added to LFS. However, after a bit of hunting I was able to fix this by running... git config --global diff.lfs.textconv cat

with help from the suggestion here... https://github.com/git-lfs/git-lfs/issues/440#issuecomment-501007460

Oliver Pearmain
0

I was having this issue on Windows when using echo in PowerShell to create text files. Using echo with the redirection operator > to write or append text produces files with Unicode (UTF-16) encoding:

PS> echo 'sample' > data.txt
PS> Get-Encoding data.txt

Encoding                    Path
--------                    ----
System.Text.UnicodeEncoding data.txt
PS> git diff --staged

diff --git a/data.txt b/data.txt
...
Binary files /dev/null and b/data.txt differ

One solution is to convert the encoding to ASCII (you can also use Convert-FileEncoding to change the encoding of multiple files at once):

PS> Set-Content data.txt $(Get-Content data.txt) -Encoding ascii

This way, git would no longer treat your text files as binaries:

PS> git add -A; git diff --staged


diff --git a/data.txt b/data.txt
...
--- /dev/null
+++ b/data.txt
@@ -0,0 +1 @@
+sample

To avoid this problem in the first place, one should use Powershell Set-Content and Add-Content commands for creating/appending text files:

PS> Set-Content data2.txt 'sample2'
PS> git add -A; git diff --staged data2.txt


diff --git a/data2.txt b/data2.txt
...
--- /dev/null
+++ b/data2.txt
@@ -0,0 +1 @@
+sample2
S4JJ4D