Is it possible to change the line break used by git to something other than the default \n (e.g. a period . or period plus whitespace)?

I am asking because this would make it easier to use git to manage text files such as documentation and markdown files. I have seen articles suggesting people to put each sentence in its own line just so that it is treated as one unit by git (rather than a part of a longer paragraph), which is awkward. Hence the question here.

I searched the internet, to no avail.

Inigo
thor
  • It might be configurable in some editors. Use the editor as Git's difftool, mergetool and editor. – ElpieKay May 25 '20 at 01:29
  • You could create a difftool that could do this. I have a C# nuget package, [difflib](https://www.nuget.org/packages/difflib/), that mimics the way git does diffs; you would have to feed it collections of sentences obtained in any way that makes sense to you. Shouldn't be *that* hard to do, but if you want to handle some things in markdown that would look like sentences you might need to implement a lot of the markdown syntax handling. If you decide to go down that route and have questions about difflib, feel free to ping me here or at lasse@vkarlsen.no – Lasse V. Karlsen May 25 '20 at 09:31

1 Answer

interesting idea! But sorry, no.

I upvoted your question because I love the idea. Unfortunately the answer is: No, Git does not support this.

As stated in the git config documentation, the valid values for core.eol are lf, crlf and native:

Sets the line ending type to use in the working directory for files that are marked as text (either by having the text attribute set, or by having text=auto and Git auto-detecting the contents as text). Alternatives are lf, crlf and native, which uses the platform’s native line ending. The default value is native. See gitattributes[5] for more information on end-of-line conversion. Note that this value is ignored if core.autocrlf is set to true or input.

Other related git config settings are core.safecrlf and core.autocrlf. The gitattributes documentation says the same.

why git is unlikely to ever support this

lf and cr are control characters with very specific meanings. Regular characters such as the period . have many meanings depending on the context. In many languages it marks the end of a sentence, but it means something different in numbers. And ... is often used as an ellipsis, which is not three sentence endings.

So git supporting such an option would result in a mess for many text files stored in a git repo.

a workaround

Use a git commit hook to automatically insert an lf after every period in your text file that isn't already followed by one.

A fairly simple regular expression could do that.
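A minimal sketch of what such a regular expression might look like, in Python. It assumes (as a simplification, not something git provides) that a sentence ends with a single period followed by spaces or tabs and then more text on the same line. The names `SENTENCE_END` and `split_sentences` are made up for illustration. Periods inside numbers, inside an ellipsis, or already at the end of a line are left alone, but abbreviations such as "e.g. " would still be split.

```python
import re

# Assumption: a sentence-ending period is a single "." (not part of "...")
# followed by spaces/tabs and then more non-whitespace text on the same line.
SENTENCE_END = re.compile(r"(?<!\.)\.[ \t]+(?=\S)")


def split_sentences(text: str) -> str:
    """Replace the whitespace after each sentence-ending period with a newline."""
    return SENTENCE_END.sub(".\n", text)
```

A pre-commit hook could apply `split_sentences` to the staged text files and re-stage them before the commit is recorded; that wiring is left out here.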

By trying this approach you will discover one of two things:

  • (a) Cool, it works for me! And my files are still normal text files and my repo is still normal so other people can use it.

  • (b) Wow, now I know why they don't support this. What a #$*&#@CRLF mess!

why you really don't need this

The reason there are "articles suggesting people to put each sentence in its own line" is because git diff used to support only line granularity diffs. Line diffs work great for code but suck for prose. Inserting a sentence or even editing one word results in the whole paragraph being marked as changed unless the paragraph is broken up into lines.

But git diff now supports word granularity if you use the --word-diff[=<mode>], --word-diff-regex=<regex> or --color-words[=<regex>] option.

Type git help diff or see git-diff Documentation for more info.
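To illustrate what word granularity means, here is a small Python sketch that marks changed words in the `[-removed-]{+added+}` style that `git diff --word-diff` uses in its default plain mode. It is built on Python's `difflib`, not on git's own diff code, and the function name `word_diff` is hypothetical; in practice you would simply run `git diff --word-diff`.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark word-level changes between two strings, git --word-diff style."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])  # unchanged words pass through as-is
            continue
        piece = ""
        if i2 > i1:  # words present only in the old text
            piece += "[-" + " ".join(a[i1:i2]) + "-]"
        if j2 > j1:  # words present only in the new text
            piece += "{+" + " ".join(b[j1:j2]) + "+}"
        out.append(piece)
    return " ".join(out)
```

Changing one word in a paragraph then shows up as a single marked word rather than as a whole changed line, which is why sentence-per-line formatting becomes unnecessary.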

Inigo
  • Thanks. Would you mind showing how to implement the hook? I imagine I need two hooks, one for adding the `lf` before commit and the other for removing the added `lf` before checking out. I asked a follow-up question here. https://unix.stackexchange.com/questions/588989/how-to-replace-with-witespace – thor May 26 '20 at 07:10
  • By the way, I'll assume that a period ending a sentence is always followed by a whitespace. And in that case, I would insert a `lf` in between. – thor May 26 '20 at 07:12