I obtained this code sample from someone else here:

  git diff --color=always | \
    gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
      match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
      bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
      {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
      bare~/^-/{print "-"left++ ":" line;next};\
      bare~/^[+]/{print "+"right++ ":" line;next};\
      {print "("left++","right++"):"line;next}'

and would like it to output properly aligned lines. Unfortunately, it prints the line numbers in a git diff like this:

+240:+ some code here
(241,257): some code here

rather than padding them to force alignment, like this:

+240     :+some code here
(241,257): some code here

This is one thing I've tried, thinking printf might do the trick (ex: printf "-%-8s:"):

  git diff HEAD~..HEAD --color=always | \
    gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
      match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
      bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
      {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
      bare~/^-/{printf "-%-8s:" left++ line;next};\
      bare~/^[+]/{printf "+%-8s:" right++ line;next};\
      {print "("left++","right++"): "line;next}'

but it produces this error:

gawk: cmd. line:5: (FILENAME=- FNR=9) fatal: not enough arguments to satisfy format string
    `-%-8s:151-    STR_GIT_LOG="" #######'
        ^ ran out for this one

This bash script is just way over my head at the moment, and I've been tinkering with it for quite some time. Perhaps someone can help me out?

Additionally, the numbers and +/- signs should be green and red, respectively, like in normal git diff output.


EDIT by Ed Morton - making the OP's code readable by pretty-printing it using gawk -o- with gawk 5.0.1:

$ gawk -o- '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
  match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
  bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
  {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
  bare~/^-/{print "-"left++ ":" line;next};\
  bare~/^[+]/{print "+"right++ ":" line;next};\
  {print "("left++","right++"):"line;next}'

Output:

{
    bare = $0
    gsub("\033[[][0-9]*m", "", bare)
}

match(bare, "^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@", a) {
    left = a[1]
    right = a[2]
    next
}

bare ~ /^(---|\+\+\+|[^-+ ])/ {
    print
    next
}

{
    line = gensub("^(\033[[][0-9]*m)?(.)", "\\2\\1", 1, $0)
}

bare ~ /^-/ {
    print "-" left++ ":" line
    next
}

bare ~ /^[+]/ {
    print "+" right++ ":" line
    next
}

{
    print "(" left++ "," right++ "):" line
    next
}
Gabriel Staples
  • Some white space and indenting would go a **LONG** way to making your code more readable. – Ed Morton May 21 '20 at 14:15
  • Now that it's pretty-printed I can see you're using string delimiters (`"..."`) instead of regexp delimiters (`/.../`) around your regexps - don't do that because that forces awk to interpret the string first before using it as a regexp and so you need to escape everything twice to account for the double parsing. So, for example, `"^(\033[[][0-9])"` needs to be `/^(\033[[][0-9])/` or `"^(\\033[[][0-9])"` with the latter being undesirable. – Ed Morton May 21 '20 at 14:32
  • [edit] your question to provide concise, testable sample input and expected output (not links, not images, just text) so we can help you do whatever it is you're trying to do. The only part of your question that's shell, btw, is the pipe symbol `|` between the 2 commands and that's not bash-specific. When you say `This bash script is just way over my head` - what you're asking for help with is an awk script, not a bash script. awk and bash are 2 different, independent tools, each with their own scope, syntax, and semantics. Awk is far more similar to C than it is to bash or any other shell. – Ed Morton May 21 '20 at 14:46
  • @EdMorton, thanks for the help. You and @Inian have been instrumental in my work on this tonight. As for white space and indenting, I couldn't agree more, but I didn't know how to and it was 3am my time. I had to call it a night. I've never used `awk` or `gawk` before and don't know a thing about them. I just checked, and `gawk --version` shows I have 4.1.4, and when I try to do the `gawk -o-` formatting trick you did, I don't get anything useful. It just sits there, so if I press enter repeatedly I see `(0,0):`, `(1,1):`, `(2,2):`, etc. Am I doing something wrong? – Gabriel Staples May 22 '20 at 04:50
  • that's my experience with that gawk version too and since I have 5.0.1 which does what I want I'm too lazy to investigate. – Ed Morton May 22 '20 at 12:44
  • @EdMorton, I see what you mean about `"regex pattern"` vs `/regex pattern/` now. See here: https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html. **"To get a backslash into a regular expression inside a string, you have to type two backslashes."** And, **"Given that you can use both regexp and string constants to describe regular expressions, which should you use? The answer is 'regexp constants,' for several reasons"**. So, that comment now gets an upvote. Thanks. – Gabriel Staples May 23 '20 at 06:09
  • More help requested if you're in the mood. I see you have a lot of `awk` topic points...https://stackoverflow.com/questions/61979177/can-the-regex-matching-pattern-for-awk-be-placed-above-the-opening-brace-of-the – Gabriel Staples May 23 '20 at 22:10
  • I've now put 10+ hrs into learning awk, and created these examples: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/tree/master/awk. These examples, comments, and links would prove useful to anyone getting started learning `awk`/`gawk`. – Gabriel Staples May 24 '20 at 01:29
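
To illustrate the point made in the comments above about regexp constants versus strings, here is a minimal sketch in plain awk. The sample strings are made up for illustration and are not taken from the script in the question. It matches a literal dot both ways:

```shell
# Matching a literal dot: one backslash in a regexp constant,
# two backslashes when the same regexp is written as a string.
result=$(awk 'BEGIN {
    s = "a.b"                    # made-up sample text
    r1 = (s ~ /a\.b/)            # regexp constant: parsed once
    r2 = (s ~ "a\\.b")           # string: parsed as a string first, then as a regexp
    r3 = ("axb" ~ /a\.b/)        # the escaped dot does not match "x"
    print r1, r2, r3
}')
printf '%s\n' "$result"
```

This should print `1 1 0`: both spellings match the literal dot, but the string form needs the backslash doubled.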

2 Answers

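This is most likely just a minor typo: printf in awk expects a comma between the format string and each of its arguments. The format string also needs one specifier per argument, plus a trailing \n, since printf does not add a newline on its own.

printf "-%-8s:%s\n", left++, line
#                  ^       ^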
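
As a minimal sketch of why the commas matter, the snippet below uses plain awk with hypothetical stand-in values for `left` and `line` (they are not taken from a real diff). The first command uses `print` rather than `printf` to show the single string awk builds when the commas are missing, which is the format string quoted in the error message. The second shows the comma-separated form:

```shell
# left and line are hypothetical stand-ins for the variables in the script.

# Without commas, awk concatenates everything into one string, so printf
# would see a format string with a %s in it and no arguments at all.
# Using print instead of printf shows that concatenated string safely:
concat=$(awk 'BEGIN { left = 151; line = "-    old code"
                      print "-%-8s:" left line }')
printf '%s\n' "$concat"

# With commas, the format string and its two arguments stay separate:
fixed=$(awk 'BEGIN { left = 151; line = "-    old code"
                     printf "-%-8s:%s\n", left++, line }')
printf '%s\n' "$fixed"
```

The first line printed is `-%-8s:151-    old code`, and the second is the padded `-151     :-    old code`.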
Inian

1/3: the bug fix

@Inian was correct: I just needed commas between arguments. I've put in the work (perhaps as much as 20 to 30 hrs since posting this question) and I'm pretty decent at the basics of using awk now. I've learned a ton.

For the sake of answering this question, here's the solution I came up with right after @Inian posted his answer, based on his feedback. The key parts to focus on are the printf calls. Notice I've added commas between the format string and each argument thereafter. As he said, that's the fix.

Parts to focus on:

printf "-%+4s     :%s\n", left++, line
printf "+%+4s     :%s\n", right++, line
printf " %+4s,%+4s:%s\n", left++, right++, line
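
To see what these three format strings produce in isolation, here is a small sketch that runs them in plain awk without git. The numbers 52 and 67 and the text "some code" are made-up sample values. It uses plain `%4s` rather than `%+4s`, on the assumption (consistent with the sample output shown further down) that the two pad these values identically:

```shell
# The numbers 52 and 67 and the text "some code" are made-up sample values.
# Plain %4s is used here; it right-justifies in a 4-character field.
aligned=$(awk 'BEGIN {
    line = "some code"
    printf "-%4s     :%s\n", 52, line       # removed line: left number only
    printf "+%4s     :%s\n", 52, line       # added line: right number only
    printf " %4s,%4s:%s\n", 52, 67, line    # context line: both numbers
}')
printf '%s\n' "$aligned"
```

Each prefix is 11 characters wide (sign, 4-character number, then either five spaces or a comma plus a second 4-character number, then the colon), which is what lines the colons up.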

Whole thing in context:

git diff HEAD~..HEAD --color=always | \
  gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
    match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
    bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
    {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
    bare~/^-/{printf   "-%+4s     :%s\n", left++, line;next};\
    bare~/^[+]/{printf "+%+4s     :%s\n", right++, line;next};\
    {printf            " %+4s,%+4s:%s\n", left++, right++, line;next}'

Here's some sample output I get just by copying and pasting the above script into my terminal. To reproduce it exactly, git clone my dotfiles repo and run git checkout 4386b089f163d9d5ff26d277b53830e54095021c. Then copy and paste the above script into your terminal. The output looks pretty good. The line numbers in the left-hand column are now nicely aligned:

$     git diff HEAD~..HEAD --color=always | \
>       gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
>         match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
>         bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
>         {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
>         bare~/^-/{printf   "-%+4s     :%s\n", left++, line;next};\
>         bare~/^[+]/{printf "+%+4s     :%s\n", right++, line;next};\
>         {printf            " %+4s,%+4s:%s\n", left++, right++, line;next}'
diff --git a/useful_scripts/git-diffn.sh b/useful_scripts/git-diffn.sh
index 22c74e2..cf8ba08 100755
--- a/useful_scripts/git-diffn.sh
+++ b/useful_scripts/git-diffn.sh
   49,  49: #   4. `git-gs_diffn`
   50,  50: #   3. `gs_git-diffn`
   51,  51: 
+  52     :+# FUTURE WORK:
+  53     :+# 1. Make work with standard awk?
+  54     :+#    This has been tested on Linux Ubuntu 18.04. If anyone can't get this working on their system,
+  55     :+#    such as in the git bash terminal that comes with Git for Windows, or on MacOS, due to 
+  56     :+#    compatibility probems with `gawk`, I can rewrite the few places relying on `gawk` extensions
+  57     :+#    to just use basic awk instead. That should solve any compatibility problems, but there's no
+  58     :+#    sense in doing it if there's no need. If I ever need to do this in the future though, I'm
+  59     :+#    going to need this trick to obtain a substring using standard awk:
+  60     :+#    https://stackoverflow.com/questions/5536018/how-to-print-matched-regex-pattern-using-awk/5536342#5536342
+  61     :+#   1. Also, look into this option in gawk for testing said compatibility: 
+  62     :+#     1. `--lint` - https://www.gnu.org/software/gawk/manual/html_node/Options.html
+  63     :+#     1. `--traditional` and `--posix` - https://www.gnu.org/software/gawk/manual/html_node/Compatibility-Mode.html
+  64     :+#   1. Currently, `--lint` is telling me that the 3rd argument to `match()` (ie: the array 
+  65     :+#      parameter) is a gawk extension.
+  66     :+
   52,  67: # References:
   53,  68: # 1. This script borrows from @PFudd's script here:
   54,  69: #    https://stackoverflow.com/questions/24455377/git-diff-with-line-numbers-git-log-with-line-numbers/33249416#33249416
  133, 148: # "41", "42", etc. codes is this:
  134, 149: #       ^(\033\[(([0-9]{1,2};?){1,10})m)?
  135, 150: 
+ 151     :+# Be sure to place all args (`"$@"`) AFTER `--color=always` so that if the user passes in
+ 152     :+# `--color=never` or `--no-color` they will override my `--color=always` here, since later
+ 153     :+# options override earlier ones.
  136, 154: git diff --color=always "$@" | \
  137, 155: gawk \
  138, 156: '

Here's a screenshot to show the nice color output:

[Screenshot: the aligned output above, shown with git diff colors]

The original script, shown here:

git diff HEAD~..HEAD --color=always | \
  gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
    match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
    bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
    {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
    bare~/^-/{print "-"left++ ":" line;next};\
    bare~/^[+]/{print "+"right++ ":" line;next};\
    {print "("left++","right++"):"line;next}'

produces pretty awful-looking (in comparison), unaligned output:

$     git diff HEAD~..HEAD --color=always | \
>       gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
>         match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
>         bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
>         {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
>         bare~/^-/{print "-"left++ ":" line;next};\
>         bare~/^[+]/{print "+"right++ ":" line;next};\
>         {print "("left++","right++"):"line;next}'
diff --git a/useful_scripts/git-diffn.sh b/useful_scripts/git-diffn.sh
index 22c74e2..cf8ba08 100755
--- a/useful_scripts/git-diffn.sh
+++ b/useful_scripts/git-diffn.sh
(49,49): #   4. `git-gs_diffn`
(50,50): #   3. `gs_git-diffn`
(51,51): 
+52:+# FUTURE WORK:
+53:+# 1. Make work with standard awk?
+54:+#    This has been tested on Linux Ubuntu 18.04. If anyone can't get this working on their system,
+55:+#    such as in the git bash terminal that comes with Git for Windows, or on MacOS, due to 
+56:+#    compatibility probems with `gawk`, I can rewrite the few places relying on `gawk` extensions
+57:+#    to just use basic awk instead. That should solve any compatibility problems, but there's no
+58:+#    sense in doing it if there's no need. If I ever need to do this in the future though, I'm
+59:+#    going to need this trick to obtain a substring using standard awk:
+60:+#    https://stackoverflow.com/questions/5536018/how-to-print-matched-regex-pattern-using-awk/5536342#5536342
+61:+#   1. Also, look into this option in gawk for testing said compatibility: 
+62:+#     1. `--lint` - https://www.gnu.org/software/gawk/manual/html_node/Options.html
+63:+#     1. `--traditional` and `--posix` - https://www.gnu.org/software/gawk/manual/html_node/Compatibility-Mode.html
+64:+#   1. Currently, `--lint` is telling me that the 3rd argument to `match()` (ie: the array 
+65:+#      parameter) is a gawk extension.
+66:+
(52,67): # References:
(53,68): # 1. This script borrows from @PFudd's script here:
(54,69): #    https://stackoverflow.com/questions/24455377/git-diff-with-line-numbers-git-log-with-line-numbers/33249416#33249416
(133,148): # "41", "42", etc. codes is this:
(134,149): #       ^(\033\[(([0-9]{1,2};?){1,10})m)?
(135,150): 
+151:+# Be sure to place all args (`"$@"`) AFTER `--color=always` so that if the user passes in
+152:+# `--color=never` or `--no-color` they will override my `--color=always` here, since later
+153:+# options override earlier ones.
(136,154): git diff --color=always "$@" | \
(137,155): gawk \
(138,156): '

Screenshot:

[Screenshot: the unaligned output above, shown with git diff colors]

2/3: making the numbers colored as well:

To answer the 2nd part of my question:

Additionally, the numbers and +/- signs should be green and red, respectively, like in normal git diff output.

I then added some ANSI color codes for red (\033[31m) and green (\033[32m) to the 3rd-from-last and 2nd-from-last lines shown below:

git diff HEAD~..HEAD --color=always | \
  gawk '{bare=$0;gsub("\033[[][0-9]*m","",bare)};\
    match(bare,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
    bare ~ /^(---|\+\+\+|[^-+ ])/{print;next};\
    {line=gensub("^(\033[[][0-9]*m)?(.)","\\2\\1",1,$0)};\
    bare~/^-/{printf   "\033[31m-%+4s     :%s\n", left++, line;next};\
    bare~/^[+]/{printf "\033[32m+%+4s     :%s\n", right++, line;next};\
    {printf                    " %+4s,%+4s:%s\n", left++, right++, line;next}'

and got this nicer-looking output. Notice the numbers at the far left are now colored too:

[Screenshot: aligned output with the line numbers at the far left colored red and green]
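
As a rough standalone illustration of the color prefix, the sketch below prints one "removed" line in red using plain awk, then strips the color codes with sed to confirm that the padding is unaffected. The sample values (7 and "-old line") and the trailing reset code `\033[m` are added for this demo and are not part of the script above:

```shell
esc=$(printf '\033')    # the ESC character that starts every ANSI color code

# \033[31m switches to red; \033[m resets. The reset and the sample values
# are additions for this standalone demo.
colored=$(awk 'BEGIN { printf "\033[31m-%4s     :%s\033[m\n", 7, "-old line" }')
printf '%s\n' "$colored"

# Stripping the color codes shows the alignment is unchanged:
plain=$(printf '%s\n' "$colored" | sed "s/${esc}\[[0-9;]*m//g")
printf '%s\n' "$plain"
```

Because the escape sequences take up no columns on screen, the colored prefix occupies the same visible width as the uncolored one.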

3/3: Lastly:

I then:

  1. dissected the code above for days
  2. studied awk like crazy
  3. threw away the whole code above because it
    1. was impossible to read and horribly formatted
    2. did some really weird things that didn't make any sense and weren't needed, and
    3. didn't handle any edge cases or custom git diff colors
  4. kept the really good parts that were absolutely ingenious and perfect, and
  5. wrote this really nice and full implementation of git diffn which:
    1. handles all edge cases I could think of
    2. acts as a drop-in replacement to git diff
    3. handles all colors and text formatting possible (as far as I can tell), and
    4. produces really nice output with or without color on

See here for git diffn info & installation instructions: Git diff with line numbers (Git log with line numbers)

The end.

Gabriel Staples