Why does git interpret sql files as binaries during a merge conflict?

Question

I have a problem resolving merge conflicts in SQL files.

MenkeTTA@909086 MINGW64 //FILE0019 (master)
$ git pull
remote: Microsoft (R) Visual Studio (R) Team Services
remote: Found 5 objects to send. (5 ms)
Unpacking objects: 100% (5/5), done.
From https://***
d58a69b..4830c58  master     -> origin/master
warning: Cannot merge binary files: example_StoredProcedure.sql (HEAD vs.    4830c5886d3e1eac5ac76d1d49496afb43f444c3)
Auto-merging WRR - example_StoredProcedure.sql
CONFLICT (content): Merge conflict in example_StoredProcedure.sql 
Automatic merge failed; fix conflicts and then commit the result.

When the merge conflict occurs, Git does not create a pre-merged file with the competing changes in the usual structure:

/SQL-File/

<<<<<<< HEAD
competing change A
=======
competing change B
>>>>>>> branch-a

Git is treating both files as binaries – but only for the merge-conflict operations (normal merge without conflict works properly). I can choose my own version of the file or the pulled competing file from the remote as the new head for the next push.

I reproduced this conflict with a normal .txt file. There, Git handles the merge conflict as expected, creating one pre-merged file with both competing changes so that I can fix the code manually.

To make git recognize the sql files as text I added

.sql diff

to the .gitattributes file, as described here. Does anyone know how I can make Git create an ordinary pre-merged file with both versions of the competing commits when working with SQL files?

I don't think your hypothesis that Git is treating plain text SQL files as binary is correct. Rather, my guess is that Git is just auto merging, perhaps in a way you disagree with. While there is a way to force Git to conflict every merge, I would first look to your Git workflow, and try to separate concerns better. — Tim Biegeleisen, Aug 25 '18 at 10:39
Thanks for the reply. I added the original output from the git client to the post to give more insight into my git workflow. Git aborts the auto-merge and interprets the sql file as binary. When I checked the file afterwards I found only my previous changes. — tmenke, Aug 25 '18 at 11:10

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

First, a quick note:

To make git recognize the sql files as text I added
.sql diff
to the .gitattributes file ...

The .gitattributes line should read *.sql diff (I've fixed the linked answer, which is on a question about getting git diff to treat the file as text). However, if the file really is text, you may want, or even need, *.sql text. Note: this will not help at all if the file is not text. If the file's content is UTF-16, it is not text to Git, at least.

Consider marking the file as example_StoredProcedure.sql text, i.e., not all .sql files, just this one particular file. ~~I'm also curious to see whether just marking it diff suffices!~~ Update, Nov 2019: apparently marking the file as diff is not sufficient, though I have not verified this myself.

(The difference is that the diff attribute tells Git how to show the file in git diff output,¹ while the text tells Git that instead of using its built in guessing algorithm, it should, for all purposes, use the setting to decide whether the file is text. The guessing algorithm consists of scanning an initial chunk of the file's contents to see how many "text-like" characters there are vs "non-text-like" characters. Probably there should be a special allowance for UTF-8 Byte Order Markers at the top, but there isn't. Curiously, during filtering, there are explicit checks.)
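
To make the "binary" verdict concrete, here is a minimal Python sketch of this kind of content sniffing. It assumes the check used by the diff and merge code amounts to looking for a NUL byte in the first 8000 bytes, which is my understanding of Git's behavior rather than something stated above; the name `looks_binary` and the sample SQL are made up for illustration.

```python
# Sketch only: approximates Git's binary sniffing under the assumption
# that a NUL byte within the first 8000 bytes marks a buffer as binary.
FIRST_FEW_BYTES = 8000  # assumed size of the sniffed prefix


def looks_binary(data: bytes) -> bool:
    """Return True if the sniffed prefix contains a NUL byte."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


# Hypothetical stored procedure, used only as sample input.
sql = "CREATE PROCEDURE example AS SELECT 1;\n"

as_utf8 = sql.encode("utf-8")
as_utf16 = sql.encode("utf-16")  # BOM plus two bytes per ASCII character

print(looks_binary(as_utf8))   # False
print(looks_binary(as_utf16))  # True
```

Under that assumption, ordinary SQL saved as UTF-16 is flagged as binary, because every ASCII character carries a zero byte.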

¹Well, it's actually more involved than just showing, but I think this is a good way to start thinking about the issues. Note that you can augment the diff setting with a driver. It's not clear to me how the low level file merge interacts with a diff driver and I do not have time to experiment with it right now.

Longer explanation

warning: Cannot merge binary files: example_StoredProcedure.sql (... vs ...)

tells us that you are correct, that Git is treating the three versions of example_StoredProcedure.sql as binary. (I see you added this output after the initial question; good thing, since it's the key!)

But why did I say three versions, when the line goes on to say:

HEAD vs.    4830c5886d3e1eac5ac76d1d49496afb43f444c3

Git is being a little lazy here: all merges involve three inputs, not just two. One of these is the one you supply explicitly—or, as in this case, git pull ran git merge and git pull itself supplied the big ugly hash ID 4830c5886d3e1eac5ac76d1d49496afb43f444c3.

The second input to a merge is always the current commit, aka HEAD. You normally get this by being on the branch in the first place: HEAD names the branch-name, the branch-name identifies the commit, and this is where you want the final merge commit to go, so it all fits together.

The third input—or internally, first; internally the "theirs" version is the third input—is one that Git computes for you, based on the HEAD and other or --theirs commits: Git walks through enough of the commit graph to find the best common ancestor commit.¹ It's this common ancestor commit that determines which files need merging, and if a file does need merging, the built in merge driver needs to use diffs to get textual changes to merge. For both this and for git diff, Git has a differencing engine built in to it (modified from LibXDiff).

Hence Git can, in effect, run:

git diff --find-renames <merge base commit> HEAD

to see what we did to each of our files, and:

git diff --find-renames <merge base commit> <other commit>

to see what they did to each of our files. Then:

If we changed a file and they did not touch it at all, the merge is easy: take ours.
If they changed a file and we did not touch it at all, the merge is easy: take theirs.
If we both changed a file but made the new file exactly the same, the merge is easy: take either one (ours, really, since it's in place).
Otherwise, attempt to combine the changes.

For speed reasons, Git uses the hash IDs ("blob" hashes, for the file's content) to accomplish the first three bullet points without ever having to fire up the file-level diff. This can, and does, merge unconflicted binary-file changes. It's only the final stage, where all three blob hashes differ, that requires a textual diff so as to combine changes.
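
The hash-based shortcut can be sketched in Python as follows. This is an illustration, not Git's code: it assumes a SHA-1 repository, where a blob's ID is the SHA-1 of a `blob <size>` header, a NUL byte, and the content. The function names and return labels are hypothetical.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Name content the way a SHA-1 Git repository names a blob."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def resolve(base: bytes, ours: bytes, theirs: bytes) -> str:
    """Decide a file-level merge from the three blob hashes alone."""
    b, o, t = blob_hash(base), blob_hash(ours), blob_hash(theirs)
    if t == b:
        return "take ours"    # they did not touch the file
    if o == b:
        return "take theirs"  # we did not touch the file
    if o == t:
        return "take ours"    # both sides made the same change
    return "content merge needed"  # all three differ: a real diff is required


# Binary content is fine for the first three cases.
base = b"\x00\x01 old"
print(resolve(base, b"\x00\x01 new", base))             # take ours
print(resolve(base, base, b"\x00\x01 new"))             # take theirs
print(resolve(base, b"\x00\x01 a", b"\x00\x01 b"))      # content merge needed
```

Only the last case needs the line-based diff, which is why unconflicted changes to binary files merge without complaint.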

Obviously, if Git can't diff the file, it cannot merge the two diff outputs. But does just marking the file as text-diff-able (pattern diff in .gitattributes) make the merge proceed? And if you set a diff driver, does the low-level file merge code use that driver? It "wants" to use the internal xdiff interface to find hunks, which is a lot easier than interpreting text output from a driver. So you probably have to define a merge driver to get a detected-as-binary file merged, even if you have marked it as diff.
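
For the UTF-16 case, a merge driver could transcode the three inputs to UTF-8, let `git merge-file` do the textual merge, and transcode the result back. The following is a minimal, untested sketch under those assumptions. The driver name `utf16sql`, the script name, and the helper functions are all hypothetical; it also assumes the files carry a byte order mark or use the platform's native byte order.

```python
#!/usr/bin/env python3
"""Hypothetical merge driver for UTF-16 text files (sketch only).

Assumed setup (names are illustrative):
    .gitattributes:  *.sql merge=utf16sql
    .git/config:     [merge "utf16sql"]
                         driver = python3 utf16_merge_driver.py %O %A %B
"""
import os
import subprocess
import sys
import tempfile


def to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 (a BOM, if present, decides byte order) into UTF-8."""
    return raw.decode("utf-16").encode("utf-8")


def to_utf16(raw: bytes) -> bytes:
    """Re-encode UTF-8 as UTF-16 with a BOM, in the platform's byte order."""
    return raw.decode("utf-8").encode("utf-16")


def run_driver(base: str, ours: str, theirs: str) -> int:
    """Merge three UTF-16 files; the result overwrites `ours` (Git's %A)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, src in (("ours", ours), ("base", base), ("theirs", theirs)):
            dst = os.path.join(tmp, name)
            with open(src, "rb") as f_in, open(dst, "wb") as f_out:
                f_out.write(to_utf8(f_in.read()))
            paths.append(dst)
        # git merge-file <current> <base> <other> rewrites <current> in place
        # and exits with the number of conflicts (0 means a clean merge).
        cmd = ["git", "merge-file",
               "-L", "ours", "-L", "base", "-L", "theirs", *paths]
        status = subprocess.run(cmd).returncode
        with open(paths[0], "rb") as f_in, open(ours, "wb") as f_out:
            f_out.write(to_utf16(f_in.read()))
    return status


# Git calls the driver with exactly three paths: %O %A %B.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(run_driver(*sys.argv[1:]))
```

A non-zero exit status tells Git the merge left conflicts, in which case the file written back should contain the usual conflict markers, still encoded as UTF-16.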

Additional note, Nov 2019: Since Git 2.18, Git has the ability to convert between committed UTF-8 data and in-work-tree other-format data. To use this, set the working-tree-encoding attribute. For instance, the gitattributes documentation shows an example line:

*.ps1    text working-tree-encoding=UTF-16LE eol=CRLF

that would keep all *.ps1 files in UTF-8 internally (in the frozen, committed files inside each commit) but keep the useful-format versions of those files in your work-tree in UTF-16-LE. I have no data as to whether this would work with these SQL files.

¹In all cases, but especially in problem cases where there's more than one best common ancestor, git merge's behavior actually depends on the strategy you chose. The usual recursive strategy will merge the merge bases, commit the result, and then use that commit as the merge base! Other merge strategies work differently.
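
The idea of a "best common ancestor", including the case with more than one, can be sketched on a toy commit graph. This is an illustration of the concept, not Git's actual algorithm; the graph layout (a dict mapping each commit to its parents) and all commit names are invented.

```python
def ancestors(graph: dict, commit: str) -> set:
    """Return `commit` plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def merge_bases(graph: dict, ours: str, theirs: str) -> set:
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(graph, ours) & ancestors(graph, theirs)
    return {
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


# Simple fork: both sides branch off B.
simple = {"A": [], "B": ["A"], "ours": ["B"], "theirs": ["B"]}
print(sorted(merge_bases(simple, "ours", "theirs")))  # ['B']

# Criss-cross history: each side has already merged X and Y.
criss_cross = {
    "A": [], "X": ["A"], "Y": ["A"],
    "ours": ["X", "Y"], "theirs": ["X", "Y"],
}
print(sorted(merge_bases(criss_cross, "ours", "theirs")))  # ['X', 'Y']
```

The second graph is the problem case from the footnote: two equally good merge bases, which the recursive strategy first merges with each other.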

Thanks a lot for that really instructive reply. Regarding the entry in the .gitattributes file, I made a typo in the original post: I inserted `*.sql diff`, as it should be. I also already tried the version with `*.sql text`; unfortunately it led to the same conflict. Could you please describe what you mean by "set a diff driver"? Do you know how I could define a **merge driver** for this specific case? — tmenke, Aug 25 '18 at 18:21
You mean that with `*.sql text`, you still got the "warning: Cannot merge binary files" complaint? As for defining a merge driver, that's straightforward enough mechanically: see [the gitattributes documentation](https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_defining_a_custom_merge_driver). However, writing the actual *driver* is hard. — torek, Aug 26 '18 at 02:15
Yes, exactly. Unfortunately `*.sql text ` didn't help. I also tried now the `example_StoredProcedure.sql text` option but it didn't help. I can't believe that it's so hard to fix the issue. There have to be quite a lot of people who are using git versioning for sql files. — tmenke, Aug 26 '18 at 19:52
This is well outside of the range of things I normally do, but I imagine the way to merge sql files is to dump them to text, merge the text, and then build a new sql file from the result. (That's something you can do in a merge driver.) Google turned this up, though it's specifically aimed at sqlite: https://github.com/cannadayr/git-sqlite — torek, Aug 26 '18 at 20:15
@Riz: it's not clear what you mean by "this". Is the file *actually* text, or is it actually binary? Did you define and write your own merge driver, and that still doesn't work? (Note that Git won't invoke a merge driver unless all *three* inputs are different.) — torek, Nov 14 '19 at 13:33
SQL files generated by a Microsoft tool (and there's a hint about this in the question) may well be stored in UTF-16 encoding. That may well explain the "binary mystery". — kostix, Nov 14 '19 at 15:35
@torek by "this" i mean your solution doesn't work for me. Since the question is about .sql files being misinterpreted as binary, yes it is text. — Riz, Nov 14 '19 at 16:38
@Riz: Ah. Note that my answer was mainly about the fact that `.sql diff` is wrong, it's `*.sql` or `somefile.sql` and `text`: `diff` just marks the file as text-ish for diff purposes, not more generally (e.g., for CRLF editing). But if the file is genuinely binary—as UTF-16 in fact is (and see kostix comment)—`git merge` will not be able to merge it without assistance, e.g., from a merge driver. — torek, Nov 14 '19 at 17:40
