
Consider a simple file like this:

Committed:

foo;

Edited:

bar;

Now, when I run git diff --word-diff-regex="[a-z]+", it gives me:

[-foo-]{+bar+};

So, it shows that the word foo has been replaced by bar. Note that the semicolon is not marked as changed. That's fine.


Now, if I add another line to the file like this:

bar;

qux;

the above does not work anymore:

$ git diff --word-diff-regex="[a-z]+"

Expected output:

[-foo-]{+bar+};

{+qux;+}

Actual output:

[-foo-]{+bar;+}

{+qux+};

Note that the semicolon in the first line is now treated as part of a word. The docs state:

Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences.

Then why does it not ignore the semicolons? Is that a bug? Can I use a better regex to make it work?

I actually use a more sophisticated regex in reality, but the problem is the same.
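
For what it's worth, here is a rough sketch of how I read that sentence from the docs. It uses grep -oE as a stand-in for git's word matcher, which is an assumption on my part (git does its own matching internally), and words is a made-up helper name. It only lists what the regex would treat as words; everything it does not print, including the semicolons, is what I would expect git to ignore.

```shell
#!/bin/sh
# Hypothetical helper: print each match of an extended regex on its own line.
# grep -o is only an approximation of how git splits a line into words.
words() {
    grep -oE -- "$1"
}

# Committed version: the only word is "foo"; the semicolon is not matched.
printf 'foo;\n' | words '[a-z]+'

# Edited version: the words are "bar" and "qux"; again no semicolons.
printf 'bar;\nqux;\n' | words '[a-z]+'
```

By that reading, the semicolon should never end up inside an added or removed word.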


Edit

Strangely, it works when the committed file contains an additional newline like so:

foo;
 
ThomasR

1 Answer


This might be relevant to those who end up at this post like me.

I use Markdown for documents that are not related to programming and track their changes with git. Others in the same situation might end up here when a change in a date, such as "2022-09-12" becoming "2022-09-13", needs to be noticed.

After spending an inordinate number of hours tinkering, I finally found this solution in git's own documentation:

git diff [commit hash] --word-diff-regex=. -U1000 -- [file name]

-U1000 is a hardcoded value, which is a bit irksome, but it will do for now. In my case it shows the whole file, since the file has only around 500 lines.

The --word-diff-regex=. part is something I got from the documentation.
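
One possible way around the hardcoded -U1000 is to derive the context size from the file's own line count. This is only a sketch, assuming a POSIX shell (for example Git Bash) rather than PowerShell; line_count is a hypothetical helper name, and the commit and file variables in the commented usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a file, for use as -U<n>.
# tr strips the padding that some wc implementations add.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# Small demonstration on a throwaway three-line file.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
line_count "$tmp"
rm -f "$tmp"

# Usage sketch (placeholders, not run here):
# git diff "$commit" --word-diff-regex=. -U"$(line_count "$file")" -- "$file"
```

Since context lines are unchanged lines that also exist in the working copy, the working copy's line count should be enough to show the whole file.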

I used to use --word-diff-regex='(\w+|\s|\.|,|\d+)' to capture words, spaces, periods '.', and commas ',', but switched to the one found in the docs because the following change wasn't highlighted:

from

... # some other text
<h6>version 1</h6>
<h6>2022-09-20</h6>
... # some other text

into

... # some other text
<h6>version 1.1</h6>
<h6>2022-09-23</h6>
... # some other text

Running git in PowerShell:

PS C:...> git diff fbf86a6cfa240cb0a0e98fe8ebf80171d2b801fb --word-diff-regex='(\w+|\s|\.|,|\d+)' -- file.md
diff --git a/file.md b/file.md
index 6c0ce48..0d3b482 100644
--- a/file.md
+++ b/file.md
@@ -1,6 +1,6 @@
...
<h6>version 1{+.1+}</h6>
<h6>2022-09-23</h6>

But with only . as the regex pattern:

PS C:...> git diff fbf86a6cfa240cb0a0e98fe8ebf80171d2b801fb --word-diff-regex=. -- file.md
diff --git a/file.md b/file.md
index 6c0ce48..0d3b482 100644
--- a/file.md
+++ b/file.md
@@ -1,6 +1,6 @@
...
<h6>version 1{+.1+}</h6>
<h6>2022-09-2[-0-]{+3+}</h6>

The change in date is now highlighted.
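
As far as I understand it, . makes every single character its own word, so a one-digit change stands alone. A rough sketch of that splitting follows, using grep -oE as a stand-in for git's matcher (an assumption, since git matches internally) and a made-up helper name, tokens. It assumes a POSIX shell rather than PowerShell.

```shell
#!/bin/sh
# Hypothetical helper: print each match of an extended regex on its own line.
# This only approximates how git would split a line into words.
tokens() {
    grep -oE -- "$1"
}

# With '.', each character of the date is a separate word.
printf '%s\n' '2022-09-20' | tokens '.'

# Only the last word differs between the old and the new date.
printf '%s\n' '2022-09-23' | tokens '.' | tail -n 1
```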

Hope this helps someone.

Edited: I added \d+ to the pattern above because it is part of my usual regex for spotting changes, but even with it added, the change from "20" to "23" still wasn't detected.

MikeTheSapien