First, a quick note:
To make git recognize the sql files as text I added .sql diff to the .gitattributes file ...
The .gitattributes line should read *.sql diff (I've fixed the linked answer, which is on a question about getting git diff to treat the file as text). However, if the file really is text, you may want, or even need, *.sql text. Note: this will not help at all if the file is not text. If the file's content is UTF-16, it is not text, at least not to Git.
Consider marking just this one particular file as text, rather than all .sql files, with the line example_StoredProcedure.sql text. I'm also curious to see whether just marking it diff suffices! Update, Nov 2019: apparently marking the file as diff is not sufficient, though I have not verified this myself.
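One way to confirm which attributes Git actually applies to a given path is git check-attr. The following is a minimal sketch in a throwaway repository; the scratch directory and the second file, other.sql, are made up for illustration and are not part of the original question.

```shell
# Scratch repository; the directory and other.sql are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Mark one specific file as text rather than every *.sql file.
printf 'example_StoredProcedure.sql text\n' > .gitattributes
printf 'SELECT 1;\n' > example_StoredProcedure.sql
printf 'SELECT 2;\n' > other.sql

# Ask Git which attributes it would apply to each path.
attr_marked=$(git check-attr text diff -- example_StoredProcedure.sql)
attr_other=$(git check-attr text diff -- other.sql)
printf '%s\n' "$attr_marked" "$attr_other"
```

Only the named file should report the text attribute as set; the other file, and the diff attribute on both, should show as unspecified.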
(The difference is that the diff attribute tells Git how to show the file in git diff output,[1] while the text attribute tells Git that instead of using its built-in guessing algorithm, it should, for all purposes, use the setting to decide whether the file is text. The guessing algorithm consists of scanning an initial chunk of the file's contents to see how many "text-like" characters there are vs "non-text-like" characters. Probably there should be a special allowance for UTF-8 byte order marks at the top, but there isn't. Curiously, during filtering, there are explicit checks.)
[1] Well, it's actually more involved than just showing, but I think this is a good way to start thinking about the issues. Note that you can augment the diff setting with a driver. It's not clear to me how the low-level file merge interacts with a diff driver, and I do not have time to experiment with it right now.
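Whatever the exact heuristic, UTF-16 encoding of mostly-ASCII text produces a file that is roughly half zero bytes, which no guessing scheme of this kind is likely to accept as text. Here is a rough sketch that checks the first chunk of a file for NUL bytes. The 8000-byte window and the helper name has_nul are assumptions for illustration; this approximates the idea and is not Git's actual code.

```shell
# Hypothetical helper: succeed if the first 8000 bytes of a file
# contain at least one NUL byte. The window size is an assumption.
has_nul() {
    total=$(head -c 8000 "$1" | wc -c)
    stripped=$(head -c 8000 "$1" | LC_ALL=C tr -d '\000' | wc -c)
    [ "$((total))" -ne "$((stripped))" ]
}

tmp=$(mktemp -d)
# Plain ASCII/UTF-8 content: no zero bytes.
printf 'SELECT 1;\n' > "$tmp/utf8.sql"
# UTF-16LE content: a byte order mark, then each ASCII character
# followed by a zero byte.
printf '\377\376S\000E\000L\000' > "$tmp/utf16.sql"

for f in "$tmp/utf8.sql" "$tmp/utf16.sql"; do
    if has_nul "$f"; then
        echo "$(basename "$f"): contains NUL bytes, likely treated as binary"
    else
        echo "$(basename "$f"): no NUL bytes found"
    fi
done
```

Running the same check on the real example_StoredProcedure.sql would be a quick way to see whether its encoding is the likely culprit.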
Longer explanation
The message:
warning: Cannot merge binary files: example_StoredProcedure.sql (... vs ...)
tells us that you are correct: Git is treating the three versions of example_StoredProcedure.sql as binary. (I see you added this output after the initial question; good thing, since it's the key!)
But why did I say three versions, when the line goes on to say:
HEAD vs. 4830c5886d3e1eac5ac76d1d49496afb43f444c3
Git is being a little lazy here: all merges involve three inputs, not just two. One of these is the one you supply explicitly or, as in this case, the one git pull supplied when it ran git merge: the big ugly hash ID 4830c5886d3e1eac5ac76d1d49496afb43f444c3.
The second input to a merge is always the current commit, aka HEAD. You normally get this by being on the branch in the first place: HEAD names the branch name, the branch name identifies the commit, and this is where you want the final merge commit to go, so it all fits together.
The third input (or, internally, the first; internally the "theirs" version is the third input) is one that Git computes for you, based on the HEAD and other (--theirs) commits: Git walks through enough of the commit graph to find the best common ancestor commit.[1] It's this common ancestor commit, the merge base, that determines which files need merging, and if a file does need merging, the built-in merge driver needs to use diffs to get textual changes to merge. For both this and for git diff, Git has a differencing engine built in to it (modified from LibXDiff).
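To see which commit Git would pick as the common ancestor, you can ask for it directly with git merge-base. The sketch below builds a tiny throwaway history to demonstrate; the branch name theirs, the file names, and the commit identity are all made up for illustration.

```shell
# Throwaway repository with one shared commit and two diverging ones.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Local identity so the commits succeed without global configuration.
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# "Their" side: a new branch with one extra commit.
git checkout -q -b theirs
echo theirs >> file.txt
git commit -q -am "their change"

# "Our" side: back on the original branch, one extra commit.
git checkout -q -
echo ours > other.txt
git add other.txt
git commit -q -m "our change"

# The merge base is the commit both sides started from.
found=$(git merge-base HEAD theirs)
echo "merge base: $found"
```

In the situation from the question, the equivalent would be git merge-base HEAD 4830c5886d3e1eac5ac76d1d49496afb43f444c3.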
Hence Git can, in effect, run:
git diff --find-renames <merge base commit> HEAD
to see what we did to each of our files, and:
git diff --find-renames <merge base commit> <other commit>
to see what they did to each of those same files. Then:
If we changed a file and they did not touch it at all, the merge is easy: take ours.
If they changed a file and we did not touch it at all, the merge is easy: take theirs.
If we both changed a file but made the new file exactly the same, the merge is easy: take either one (ours, really, since it's in place).
Otherwise, attempt to combine the changes.
For speed reasons, Git uses the hash IDs ("blob" hashes, for the file's content) to accomplish the first three bullet points without ever having to fire up the file-level diff. This can, and does, merge unconflicted binary-file changes. It's only the final stage, where all three blob hashes differ, that requires a textual diff so as to combine changes.
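The hash-based shortcut can be sketched as a small decision function over three blob hashes. This is an illustration of the logic described above, not Git's implementation; classify and blob are hypothetical helpers, and the sample contents are made up. For a real merge you could obtain the three hashes with git rev-parse <commit>:<path>.

```shell
# Hypothetical helper: given blob hashes for base, ours, and theirs
# (in that order), report which of the cases applies.
classify() {
    if [ "$3" = "$1" ]; then
        echo "take ours"                          # they did not touch it
    elif [ "$2" = "$1" ]; then
        echo "take theirs"                        # we did not touch it
    elif [ "$2" = "$3" ]; then
        echo "take ours (both sides identical)"   # same change on both sides
    else
        echo "needs a file-level merge"           # all three differ
    fi
}

# Hypothetical helper: compute the blob hash Git would assign to some content.
blob() { printf '%s\n' "$1" | git hash-object --stdin; }

base=$(blob 'version 1')
change_a=$(blob 'version 2')
change_b=$(blob 'version 3')

classify "$base" "$change_a" "$base"
classify "$base" "$base" "$change_b"
classify "$base" "$change_a" "$change_a"
classify "$base" "$change_a" "$change_b"
```

Only the last case needs the contents of the files at all, which is why a file Git considers binary fails only there.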
Obviously, if Git can't diff the file, it cannot merge the two diff outputs. But does just marking the file as text-diff-able (attribute diff in .gitattributes) make the merge proceed? What happens if you set a diff driver: does the low-level file merge code use that driver? It "wants" to use the internal xdiff interface to find hunks, which is a lot easier than interpreting text output from a driver, so you probably have to define a merge driver to get a detected-as-binary file merged, even if you have marked it as diff.
Additional note, Nov 2019: Since Git 2.18, Git has the ability to convert between committed UTF-8 data and in-work-tree other-format data. To use this, set the working-tree-encoding attribute. For instance, the gitattributes documentation shows an example line:
*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
that would keep all *.ps1 files in UTF-8 internally (in the frozen, committed files inside each commit) but keep the useful-format versions of those files in your work-tree in UTF-16LE. I have no data as to whether this would work with these SQL files.
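Adapting that documentation line to the SQL files might look like the sketch below. This is an untested guess at a suitable attribute line: it assumes the files really are UTF-16LE, and it only verifies that Git picks up the attribute, not that a merge then succeeds. The scratch repository is hypothetical.

```shell
# Scratch repository for checking the attribute line.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Assumed adaptation of the documentation's *.ps1 example to *.sql files.
printf '*.sql text working-tree-encoding=UTF-16LE eol=CRLF\n' > .gitattributes

# Confirm which encoding attribute Git would apply to the file.
enc=$(git check-attr working-tree-encoding -- example_StoredProcedure.sql)
echo "$enc"
```

Files that were already committed as raw UTF-16 would presumably need to be re-added after the attribute is in place, so that the committed copies become UTF-8.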
[1] In all cases, but especially in problem cases where there's more than one best common ancestor, git merge's behavior actually depends on the strategy you chose. The usual recursive strategy will merge the merge bases, commit the result, and then use that commit as the merge base! Other merge strategies work differently.