I'm trying to use Git to track changes to a fairly large XML file (about 3 MB). The file is encoded in UTF-8 and uses CRLF line endings (I'm working on Windows 10). For some reason Git seems to treat it as a binary file and does not show any diff, or it simply fails to detect the changes.

Diff in Sourcetree shows the message "No changes in this file detected, or it is a binary file"

I tried to set the attributes explicitly in .gitattributes, but that does not seem to be the cause:

    *.xml crlf diff

    git check-attr --all -- sorkin.xml
    sorkin.xml: diff: set
    sorkin.xml: crlf: set

I found that when I split the big file into three smaller pieces (each less than 1 MB), Git shows the changes for them correctly.

Does Git have a file size limit when calculating a diff?

Alexander Sorkin
  • Are you 100% sure it's UTF-8? If Git thinks it's a binary file (and it says “Binary files differ” or something like that), that usually means it's in UTF-16 on Windows. – bk2204 Aug 29 '20 at 16:14
  • Could there be a zero byte in the remaining 2MB ? See [this question](https://stackoverflow.com/q/6119956/86072) for explanations on how git detects binary files – LeGEC Aug 29 '20 at 19:48
  • I'm pretty sure it's UTF-8, tested it with Notepad++ and with https://onlineutf8tools.com/validate-utf8. – Alexander Sorkin Aug 30 '20 at 09:45
  • Checked for zero bytes - searched through the file with hex editor. No, there are no zero bytes in the file. – Alexander Sorkin Aug 30 '20 at 09:49
  • On Windows I'm using GitExtension (sadly available only on Windows). The Windows version of SourceTree is slow and annoying; on macOS it works much better. So instead of fighting with a tool, try something else. – Marek R Aug 30 '20 at 11:02
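
The two checks suggested in the comments (a zero byte, or a UTF-16 encoding) can be approximated with a short Python sketch. It assumes that Git's built-in heuristic treats content as binary when a NUL byte appears in roughly the first 8000 bytes; the function names are hypothetical, and attributes such as `diff` or `-diff` in .gitattributes can override the heuristic.

```python
# Rough approximation of Git's binary detection, NOT Git's actual code.
# Assumption: Git looks for a NUL byte in the first 8000 bytes of the content.
FIRST_FEW_BYTES = 8000

# Byte order marks that usually indicate a UTF-16 file (little and big endian).
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def looks_binary_to_git(data: bytes) -> bool:
    """Return True if the inspected prefix contains a NUL byte."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def starts_with_utf16_bom(data: bytes) -> bool:
    """Return True if the content begins with a UTF-16 byte order mark."""
    return data.startswith(UTF16_BOMS)


def describe(path: str) -> str:
    """Summarize how the file at `path` would likely be classified."""
    with open(path, "rb") as handle:
        prefix = handle.read(FIRST_FEW_BYTES)
    if starts_with_utf16_bom(prefix):
        return "probably UTF-16, likely treated as binary"
    if looks_binary_to_git(prefix):
        return "NUL byte found, likely treated as binary"
    return "no NUL byte in the inspected prefix, likely treated as text"
```

Calling `describe("sorkin.xml")` on the file from the question would report the likely classification. Under this heuristic, a NUL byte that occurs only past the inspected prefix would not change the result.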

1 Answer

The problem was not in Git's options but in SourceTree's settings under Tools/Options/Diff. By default, the internal Diff View has a size limit for text files of 1024 KB.

I set it to a higher value and it resolved my problem.

I had misinterpreted the SourceTree message "No changes in this file detected, or it is a binary file". But when I created two text files, 1048576 and 1048577 bytes long, and committed them, I found that git diff on the command line works, while Atlassian SourceTree ignores changes to the bigger file.
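
One way to reproduce this experiment is to generate the two boundary files with a small Python sketch like the one below. The helper names and the 64-byte line layout are illustrative assumptions; only the two file sizes come from the test described above.

```python
import tempfile
from pathlib import Path
from typing import List

# 1024 KB, the default SourceTree text size limit mentioned in this answer.
LIMIT_BYTES = 1024 * 1024


def write_text_file(path: Path, size: int) -> Path:
    """Write exactly `size` bytes of ASCII text, mostly as 64-byte lines."""
    line = b"x" * 63 + b"\n"
    full_lines, remainder = divmod(size, len(line))
    path.write_bytes(line * full_lines + b"x" * remainder)
    return path


def make_boundary_files(directory: Path) -> List[Path]:
    """Create one file exactly at the limit and one a single byte over it."""
    directory.mkdir(parents=True, exist_ok=True)
    sizes = (LIMIT_BYTES, LIMIT_BYTES + 1)
    return [write_text_file(directory / f"{size}.txt", size) for size in sizes]


if __name__ == "__main__":
    # Demo in a throwaway directory; point this at a test repository instead.
    for created in make_boundary_files(Path(tempfile.mkdtemp())):
        print(created, created.stat().st_size)
```

After committing both files, changing a line in each, and committing again, the two diffs can be compared in SourceTree and on the command line.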

It was important to use the proper syntax for git diff: git diff commit_hash 1048577.txt instead of git diff 1048577.txt. The latter compares the working tree with the index, so it shows nothing once the changes have been committed. Overlooking this was the second reason I misread the problem.
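
The difference between the two invocations can be made explicit with a small Python wrapper around the git command line. This is only a sketch: `diff_command` and `run_diff` are hypothetical helper names, `commit_hash` is a placeholder as in the text above, and the `--` separator is added to keep the path from being read as a revision.

```python
import subprocess
from typing import List, Optional


def diff_command(path: str, commit: Optional[str] = None) -> List[str]:
    """Build the git diff argument list for one file.

    Without `commit`, git compares the working tree with the index.
    With `commit`, git compares the working tree with that commit.
    """
    command = ["git", "diff"]
    if commit is not None:
        command.append(commit)
    return command + ["--", path]


def run_diff(path: str, commit: Optional[str] = None, repo: str = ".") -> str:
    """Run git diff inside `repo` and return its textual output."""
    result = subprocess.run(
        diff_command(path, commit),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For a file that has just been committed, `run_diff("1048577.txt")` should return an empty string, while `run_diff("1048577.txt", "commit_hash")` with an earlier commit should return the actual changes.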

Alexander Sorkin
  • Unfortunately, the SourceTree message "No changes in this file detected, or it is a binary file" can appear for more than one reason. You can expect it if the file is larger than the size limit set in SourceTree's options panel. You will also get it if your text file is not encoded as UTF-8. You can overcome the size problem, as you mentioned, through Tools/Options/Diff. You can overcome the UTF-8/UTF-16 problem here: https://stackoverflow.com/questions/777949/can-i-make-git-recognize-a-utf-16-file-as-text?rq=1 – Rod Dewell Apr 01 '21 at 21:47