I'm trying to obtain the changes between commits for a large number of HTML documents, but I quickly noticed that most of them are not important: they usually come from logging, version strings that change to prevent caching, or external scripts. For example:
<a class="support-ga" target="_blank" href="#">0fb63cacd50e / 0fb63cacd50e @
-app-151</a>
+app-107</a>
<input type='hidden' name='csrfmiddlewaretoken'
-value='82NB5DdySoICu1mqcl0RZVk5dMCOVEQd'
+value='a0zBgxBevaBugotGpNKI6kMPsIsBbH44'
/>
Changes like these are not interesting or useful to review.
I would like to know if there is a git diff option that ignores this kind of change. An alternative would be a ranking of the differences based on similarity. So far I have been running
git diff --word-diff=porcelain --unified=0 HEAD~1 HEAD
and then processing its output to extract the changes, calculate the Levenshtein distance, and remove duplicates. That helps, but it is not a great solution, considering that git already knows which lines should be compared and provides a configurable number of context lines.
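To make the post-processing step concrete, here is a minimal Python sketch of what it could look like. It is not the exact script in use: the names `levenshtein`, `paired_changes`, and `rank_by_dissimilarity` are hypothetical, and it assumes that each replacement shows up in the porcelain output as a `-` line immediately followed by a `+` line (pure additions and pure removals are skipped).

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the usual row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def paired_changes(porcelain: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each removed run with the added run that directly follows it.

    Assumption: file headers run from 'diff --git' up to the first '@@',
    so '---' / '+++' header lines are never mistaken for changes.
    """
    pairs: List[Tuple[str, str]] = []
    in_hunk = False
    removed = None
    for raw in porcelain:
        line = raw.rstrip("\n")
        if line.startswith("diff --git"):
            in_hunk, removed = False, None
        elif line.startswith("@@"):
            in_hunk, removed = True, None
        elif not in_hunk or not line:
            continue
        elif line[0] == "-":
            removed = line[1:]
        elif line[0] == "+" and removed is not None:
            pairs.append((removed, line[1:]))
            removed = None
        else:
            # Context (' ') or end-of-line ('~') marker breaks the pairing.
            removed = None
    return pairs


def rank_by_dissimilarity(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop duplicate pairs, then sort so the least similar changes come first."""
    def score(pair: Tuple[str, str]) -> float:
        old, new = pair
        longest = max(len(old), len(new)) or 1
        return levenshtein(old, new) / longest

    unique = list(dict.fromkeys(pairs))  # order-preserving de-duplication
    return sorted(unique, key=score, reverse=True)
```

The input would be the lines of the git command's output, for example `subprocess.run([...], capture_output=True, text=True).stdout.splitlines()`. Normalising the distance by the longer string keeps a two-character change in a short token from ranking the same as a two-character change in a long one.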