
I am using git with --color-words to view my diff. The diff shows that I removed:

<b>{{ljcount}}</b>&nbsp;&nbsp;&nbsp;Changes

And that I added:

<b>{{skills_limits}}</b>&nbsp;&nbsp;&nbsp;Changes

The highlighted change is larger than I would like: I want the word boundaries to fall at the braces ({{ and }}). I tried playing around with --word-diff-regex, but I couldn't find a regex that works. How can I achieve this?

Casebash

3 Answers


From git help diff:

   --word-diff-regex=<regex>
       Use <regex> to decide what a word is, instead of considering runs of non-whitespace to be a word. Also implies
       --word-diff unless it was already enabled.

The following expression treats as a word either a run of word characters (letters, digits, and underscores) or any single non-whitespace character:

$ git diff --color-words --word-diff-regex='\w+|[^[:space:]]'
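
To see how such a regex carves a line into words, the pattern can be tried outside Git. The sketch below is only an approximation: it assumes `grep -oE` splits text the same way Git's word splitter does, it spells `\w` as the POSIX class `[[:alnum:]_]`, and `split_words` is a made-up helper name.

```shell
# Hypothetical helper: print each "word" the regex matches, one per line.
# grep -o stands in for Git's word splitter here (an assumption).
split_words() {
  printf '%s\n' "$1" | grep -oE '[[:alnum:]_]+|[^[:space:]]'
}

# The two variants from the question should differ in exactly one word.
split_words '<b>{{ljcount}}</b>'
split_words '<b>{{skills_limits}}</b>'
```

Each brace and angle bracket becomes its own word, while the identifier between the braces stays in one piece, which is what lets the diff isolate it.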
Casebash
holygeek
  • You might want to use `[^{} ]`, otherwise spaces are now considered "words" (or even `[^{}[:space:]]`, though I'm not certain what regex engine is used here) – Lily Ballard Dec 13 '11 at 01:25
  • This actually makes it worse; it seems to be treating each individual letter as a word! – Casebash Dec 13 '11 at 03:29
  • In that case you want to set your word regex to something like this: ``--word-diff-regex='[A-Za-z_][A-Za-z_]*'`` – holygeek Dec 13 '11 at 04:36
  • @holygeek: --word-diff-regex='[A-Za-z_]+' is equivalent. Okay, my mistake before was using a * rather than a +. This, however, causes other characters, such as commas, to be hidden from the diff. Git suggests adding |[^[:space:]] to ensure that any single non-whitespace character can be counted as a word. – Casebash Dec 13 '11 at 05:20
  • @Casebash thanks for the note on adding a single non-whitespace character. I ran across this issue before when diffing MySQL dump data. Now with the correct regex I can benefit from ``--color-words`` for SQL diffs too. – holygeek Dec 13 '11 at 05:32
  • Also, \w is shorthand for any alphanumeric character or the underscore (the backslash only needs doubling when the regex is not quoted; inside single quotes a single \ is correct) – Casebash Dec 13 '11 at 05:37
  • I'm getting the same output if I drop the `\\w+`, e.g. `--word-diff-regex='\\w+|[^[:space:]]'` vs. `--word-diff-regex='[^[:space:]]'`. Also, the editing of this answer is inappropriate, as it makes it impossible to follow the comments, or figure out which revision the OP accepted. – EoghanM May 10 '14 at 09:33

Since you already use --color-words, you don't need to supply --word-diff-regex separately; --color-words itself accepts a regex:

--color-words[=<regex>]

Equivalent to --word-diff=color plus --word-diff-regex=<regex> (if a regex was specified).

A regex that works particularly well for me is:

$ git diff --color-words='\w+|.'
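
A quick way to check the effect on the line from the question is to diff two scratch files. This is a rough sketch, not a definitive recipe: the temporary directory and the file names `old.txt` and `new.txt` are made up, `\w` is written as the POSIX class `[[:alnum:]_]`, and `--word-diff=porcelain` with `--word-diff-regex` is used in place of `--color-words=<regex>` so that the result is readable as plain text.

```shell
# Sketch: reproduce the question's change in two scratch files and diff them.
# The directory and file names are invented for illustration.
tmp=$(mktemp -d)
printf '%s\n' '<b>{{ljcount}}</b>&nbsp;&nbsp;&nbsp;Changes' > "$tmp/old.txt"
printf '%s\n' '<b>{{skills_limits}}</b>&nbsp;&nbsp;&nbsp;Changes' > "$tmp/new.txt"

# --no-index compares plain files outside a repository; it exits with 1
# when the files differ, hence the "|| true".
word_diff=$(cd "$tmp" && git diff --no-index --no-color --word-diff=porcelain \
  --word-diff-regex='[[:alnum:]_]+|[^[:space:]]' old.txt new.txt || true)
printf '%s\n' "$word_diff"
rm -rf "$tmp"
```

In the porcelain format, removed words are printed on lines starting with `-` and added words on lines starting with `+`, so the size of the changed span can be read off directly.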
arekolek

If you are using --color-words[=<regex>], make sure to use Git 2.32 (Q2 2021) or more recent: the word-diff mode has been taught to work better with a word regexp that can match an empty string.

See commit 0324e8f (04 May 2021) by Phillip Wood (phillipwood).
(Merged by Junio C Hamano -- gitster -- in commit 65c1891, 14 May 2021)

word diff: handle zero length matches

Signed-off-by: Phillip Wood

If find_word_boundaries() encounters a zero length match (which can be caused by matching a newline or using '*' instead of '+' in the regex) we stop splitting the input into words which generates an inaccurate diff.
To fix this increment the start point when there is a zero length match and try a new match.
This is safe as posix regular expressions always return the longest available match so a zero length match means there are no longer matches available from the current position.

Commit bf82940 ("color-words: enable REG_NEWLINE to help user", 2009-01-17, Git v1.6.2-rc0 -- merge) prevented matching newlines in negated character classes but it is still possible for the user to have an explicit newline match in the regex which could cause a zero length match.

One could argue that having explicit newline matches or using '*' rather than '+' are user errors but it seems to be better to work round them than produce inaccurate diffs.
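
Whether a given installation already contains this fix can be checked from its version string. The following is a minimal sketch under the assumption that `git --version` prints the usual `git version <major>.<minor>...` format; the helper name `git_has_zero_length_fix` is hypothetical.

```shell
# Hypothetical helper: succeed if a "git version X.Y..." string is >= 2.32.
# Assumes the usual output format of `git --version`.
git_has_zero_length_fix() {
  version=${1#git version }   # drop the leading "git version "
  major=${version%%.*}          # text before the first dot
  rest=${version#*.}
  minor=${rest%%.*}             # text between the first and second dot
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 32 ]; }
}

if git_has_zero_length_fix "$(git --version)"; then
  echo "Git 2.32 or newer: zero-length word matches are handled"
else
  echo "Older Git: prefer '+' over '*' in word regexes, or upgrade"
fi
```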

VonC