Dealing with hunks that only change whitespace

Question

In a code I am maintaining, I sometimes receive pull requests where the committer just reflowed a paragraph for no apparent reason. Here is an example:

diff --git a/knuth.tex b/knuth.tex
index 2f6a2f8..7b0827d 100644
--- a/knuth.tex
+++ b/knuth.tex
@@ -1,6 +1,6 @@
 Thus, I came to the conclusion that the designer of a new
 system must not only be the implementer and first
-large||scale user; the designer should also write the first
+large-scale user; the designer should also write the first
 user manual.

 The separation of any of these four components would have
@@ -9,8 +9,7 @@ all these activities, literally hundreds of improvements
 would never have been made, because I would never have
 thought of them or perceived why they were important.

-But a system cannot be successful if it is too strongly
-influenced by a single person. Once the initial design is
-complete and fairly robust, the real test begins as people
-with many different viewpoints undertake their own
-experiments.
+But a system cannot be successful if it is too strongly influenced by
+a single person. Once the initial design is complete and fairly
+robust, the real test begins as people with many different viewpoints
+undertake their own experiments.

As you can see the first hunk introduced an actual change by substituting || with -, whereas the second hunk does not change anything but line breaking and whitespace. In fact, the word-diff of the second hunk would be empty.

Is it possible to either put a policy in place (e.g. on GitHub or in my CI) to reject commits containing such “empty” hunks, or reformat the patch to omit these hunks altogether so that I can git apply it without them?

Related: How to git-apply a git word diff

Your question seems to be about GitHub, not about Git. I'd suggest using that tag (and perhaps none of the current ones). — torek, Jan 12 '20 at 19:10
@torek The policy does not necessarily have to be on GitHub. I can also enforce it in my CI. Therefore I put GitHub in parentheses. — Henri Menke, Jan 12 '20 at 23:20
Have you tried changing core.eol or core.safecrlf or core.autocrlf to deal with the line breakers issue? And is this what you are looking for: https://stackoverflow.com/questions/3515597/add-only-non-whitespace-changes? — Yazeed Sabri, Jan 23 '20 at 21:37
@YazeedSabri Thank you for pointing this out, but no, the question you linked does not deal with reflowed text. — Henri Menke, Jan 24 '20 at 03:48
@HenriMenke You might want to try and add this option "--word-diff" to the diff command to deal with text reflow. — Yazeed Sabri, Jan 24 '20 at 06:00
@YazeedSabri The problem with `--word-diff` is that is generates a patch that is not compatible with `git apply`. — Henri Menke, Jan 24 '20 at 08:25
@HenriMenke what is not compatible about it? I think using the --no-prefix option for diff might help. — Yazeed Sabri, Jan 24 '20 at 16:01

score 3 · Accepted Answer · answered Jan 21 '20 at 02:22

If you're looking for a built-in solution, I'm not aware of one. However, that doesn't mean this can't be relatively easily built into a CI system.

You could pipe the output of an appropriate git diff command into a script like the following, which will exit 1 if fed a patch containing a hunk like the second above.

#!/usr/bin/env ruby

def filter(arr)
  arr.join.split("\n\n").map { |x| x.gsub(/\s+/, ' ') }.join("\n\n")
end

def should_reject(before, after)
  return false if before.empty? && after.empty?
  before = filter(before)
  after = filter(after)
  return true if before == after
  false
end

chunk = nil
before = []
after = []
while (line = gets)
  trimmed = line[1..-1]
  case line
  when /^(\+\+\+|---)/
    # Do nothing.
  when /^@@ /
    if should_reject(before, after)
      warn "Useless change to hunk #{chunk}"
      exit 1
    end
    chunk = line
    before = []
    after = []
  when /^ /
    before << trimmed
    after << trimmed
  when /^\+/
    after << trimmed
  when /^-/
    before << trimmed
  end
end

if should_reject(before, after)
  warn "Useless change to hunk #{chunk}"
  exit 1
end

It essentially splits each hunk into chunks with a blank line between them, turns all whitespace into spaces, and then compares them. If they're equal, it complains and exits nonzero. You may wish to modify it to be more robust, like dealing with CRLF endings or such, but the approach is viable.

As a side note, one approach that makes these kinds of changes easier to spot is to use the sentence-per-line style. Each sentence, regardless of length, is on one whole line, and there is only one sentence per line. This makes diffing any sort of change much easier and avoids wrapping issues altogether.

Dealing with hunks that only change whitespace

1 Answers1