Regarding how they work, I was wondering about some low-level details:
- What will trigger a merge conflict?
- Is the context also used by the tools in order to apply the patch?
- How do they deal with changes that do not actually modify source code behavior? For example, swapping the positions of two function definitions.
Regarding safety, truth be told, the huge Linux kernel repository is a testament to their safety. But I am wondering about the following points:
- Are there any caveats/limitations regarding the tools that the user should be aware of?
- Have the algorithms been proven not to generate wrong results?
- If not, are there implementations/papers proposing integration testing that at least shows them to be error-free empirically? Something like the content of these papers by Brian Korver and James Coplien.
- Again, the Linux repository should suffice regarding the previous point, but I was wondering about something more generic. Source code, even when changed, will not change much (especially because of the algorithm implemented and the syntax restrictions of the language), but can the safety be generalized to arbitrary text files?
Edit
OK people, I'm editing since the question was vague and the answers are not addressing the details.
Git/diff/patch details
The unified diff format, which Git seems to use by default, basically outputs three things: the change, the context surrounding the change, and line numbers pertinent to the context. Each one of these things may or may not have been changed concurrently, so Git basically has to deal with 8 possible cases.
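To make the format concrete, here is a small sketch using Python's `difflib.unified_diff` (the file names and contents are made up for illustration); it shows the three pieces mentioned above: the hunk header with line numbers, the unchanged context lines (prefixed with a space), and the change itself (`-`/`+` lines):

```python
import difflib

old = ["def greet():\n", "    print('hello')\n", "    return\n"]
new = ["def greet():\n", "    print('hello, world')\n", "    return\n"]

# unified_diff emits "---"/"+++" file headers, a hunk header like
# "@@ -1,3 +1,3 @@" (line numbers + lengths), context lines (' '),
# removals ('-'), and additions ('+')
for line in difflib.unified_diff(old, new,
                                 fromfile="a/greet.py",
                                 tofile="b/greet.py"):
    print(line, end="")
```

The `-1,3 +1,3` part is exactly the "line numbers pertinent to the context": the hunk starts at line 1 and spans 3 lines in both the old and the new file.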
For example, if lines have been added or removed before the context, line numbers will be different; but if the context and the changes are still the same, then diff could use the context itself to align the texts and apply the patch (I do not know if this indeed happens). Now, what would happen on the other cases? I would like to know details of how Git decides to apply changes automatically and when it decides to issue an error and let the user resolve the conflict.
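To illustrate the "use the context itself to align the texts" idea, here is a hypothetical sketch (the function and parameter names are mine, not Git's): it searches outward from the hinted line number for an exact match of the context plus the to-be-removed lines, roughly the way `patch(1)` slides a hunk when line numbers have drifted, and raises an error when the context cannot be found, which is the point where a real tool would report a conflict:

```python
def apply_hunk(lines, context_before, removed, added, hint):
    # The hunk matches if we can find context_before followed by the
    # removed lines somewhere in the file.
    target = context_before + removed
    # Try offsets 0, -1, +1, -2, +2, ... from the hinted position.
    for delta in sorted(range(-len(lines), len(lines) + 1), key=abs):
        pos = hint + delta
        if pos < 0 or pos + len(target) > len(lines):
            continue
        if lines[pos:pos + len(target)] == target:
            # Keep the context, drop the removed lines, splice in the added ones.
            return (lines[:pos] + context_before + added
                    + lines[pos + len(target):])
    raise ValueError("hunk failed to apply: context not found")
```

In this sketch, a hunk whose line numbers are stale but whose context is intact still applies cleanly (just at a shifted offset), while a hunk whose context has itself been edited fails, mirroring the automatic-apply vs. conflict distinction asked about above.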
Reliability
I'm pretty sure Git is fully reliable, since it does keep the full history of commits and can traverse it. What I would like are some pointers to academic research and references on this, if they exist.
Still kinda related to this subject: we know that Git/diff treat files as generic text files and work on lines. Furthermore, the LCS algorithm employed by diff generates a patch that tries to minimize the number of changes.
So here are some things I would like to know also:
- Why is LCS used instead of other string metric algorithms?
- If LCS is used, why not use modified versions of the metric that do take into account the grammatical aspects of the underlying language?
- If such a metric that takes grammatical aspects into account were used, could it provide benefits? Benefits in this case could be anything, for example a cleaner "blame log".
Again, these could be huge topics and academic articles are welcome.