
I've written several regexes for diff and log that define what a word is. They are quite long and complex, and of course they vary, because sometimes I want a word to be defined one way under certain circumstances and another way in others. So I define some aliases to hide the complexity, named diff1, diff2, diff3, ... and log1, log2, log3, .... diff1 uses the same regex as log1, diff2 the same as log2, etc. Also, the regexes for 1, 2, 3, etc. can be composed of smaller regexes which they all share.

I would like to minimize the amount of copy-and-paste coding because these are a bit experimental and I update them every so often, so using variables is the logical conclusion. It would also make the regexes far more readable.

Does .gitconfig support some variable/replacement mechanism? I couldn't find anything in the man page, and this question would appear to indicate that it's not available either, but I just wanted to make sure before I give up or try another tack.

Example .gitconfig file:

[alias]
    #                                                                    1                           2                      3      4               5      6          7           8           9          10               11   12
    diff2 = diff --color=always --ignore-space-change '--word-diff-regex=((\\r\\n?|\\n\\r?)[\\t ]*)?([a-zA-Z_][a-zA-Z_0-9]*|0([xX]([0-9][a-fA-F])+|[0-7]+|[bB][01]+)|[1-9][0-9]*(\\.[0-9]+)?([eE][0-9]+|[pP][0-9a-fA-F])?|\\S)(\\r\\n?|\\n\\r?)?' -p
    #  1. Beginning of the line whitespace can be thought of as a word
    #  2. A word starts with a letter and is followed by 0 or more letters/numbers/underscores
    #  3. A word (hex, octal or binary number) starts with a 0
    #  4. A word (hex) continues with an 'x' followed by 1 or more chars in [0-9a-fA-F] class.
    #  5. A word (octal) continues with 1 or more chars in [0-7] class.
    #  6. A word (binary) continues with 1 or more chars in [01] class.
    #  7. A word (integer or decimal) starts with [1-9] and has 0 or more [0-9] chars after it.
    #  8. A word (floating) continues with a '.' followed by 1 or more [0-9] chars after it.
    #  9. A word (floating) can continue with an integer exponent.
    # 10. A word (floating) can continue with a hex exponent.
    # 11. A word can be any non-whitespace character.
    # 12. A word can be all above with a newline after it.

It would be nicer if I could break this down, like:

[alias]
    # Beginning_of_line:              ((\\r\\n?|\\n\\r?)[\\t ]*)?
    # User_defined_literal:           ([a-zA-Z_][a-zA-Z_0-9]*)
    # Nondecimal_number:              0([xX]([0-9][a-fA-F])+|[0-7]+|[bB][01]+)
    # Decimal_number:                 [1-9][0-9]*(\\.[0-9]+)?([eE][0-9]+|[pP][0-9a-fA-F])?
    # Single_nonwhitespace_character: \\S
    # End_of_line:                    (\\r\\n?|\\n\\r?)?
    diff2 = diff --color=always --ignore-space-change '--word-diff-regex=%Beginning_of_line%(%User_defined_literal%|%Nondecimal_number%|%Decimal_number%|%Single_nonwhitespace_character%)%End_of_line%' -p
    pickaxe2 = log -p --color=always --ignore-space-change '--word-diff-regex=%Beginning_of_line%(%User_defined_literal%|%Nondecimal_number%|%Decimal_number%|%Single_nonwhitespace_character%)%End_of_line%' -s

    diff3 = diff --color=always --ignore-space-change '--word-diff-regex=(%User_defined_literal%|%Nondecimal_number%|%Decimal_number%|%Single_nonwhitespace_character%)%End_of_line%' -p
    pickaxe3 = log -p --color=always --ignore-space-change '--word-diff-regex=(%User_defined_literal%|%Nondecimal_number%|%Decimal_number%|%Single_nonwhitespace_character%)%End_of_line%' -s
Adrian
  • There's no indirection mechanism within `git config` itself, but if you're writing your own Git commands, you can write your own indirection. For instance, for `git xyzzy` you might fetch `xyzzy.config` and use each of its values as a key for another `git config --get`. This is similar to how `git log --pretty=` invokes `git config --get log.pretty.`. – torek Apr 25 '22 at 20:42
  • Hi @torek, maybe you could post an answer with an example? – Adrian Apr 25 '22 at 21:01
  • It would be easier if you posted a few examples of your existing aliases, so that I don't have to guess what they might look like... – torek Apr 25 '22 at 23:05

1 Answer


You can stick all of this into an alias expression, but it's definitely easier to write a shell (sh or bash, typically) script or Python program, which lets you grab your own configuration settings where you can define each of these.

Here's an example, using your example as a starting point:

# global git config
[adrian "regex"]
     bol = ((\\r\\n?|\\n\\r?)[\\t ]*)?
     ud-literal = ([a-zA-Z_][a-zA-Z_0-9]*)
     nondec-number = 0([xX]([0-9][a-fA-F])+|[0-7]+|[bB][01]+)
... snip ...

[adrian "style"]
     diff2 = -p --color=always --ignore-space-change --word-diff-regex=%INTERPOLATE%(%bol%(%ud-literal%|%nondec-number%|...))
... snip ...

Then, in your alias section:

[alias]
    diff2 = !git adriandiff --style=diff2

The harder part is writing git adriandiff. It can be done in shell, but Python has easier string manipulation, so we might prefer that:

#! /usr/bin/env python3

import argparse
...

We'll demand a --style option (and do whatever with any other options, perhaps) and then run, from this script, a function that looks for %INTERPOLATE% and balanced parentheses. It uses another function to read the interpolated words into a dictionary:

def do_interpolate(s: str) -> str:
    """Take the parenthesized string, e.g., (%bol%foo%eol%),
    and put each of the %...% delimited words into a lookup
    table and then use `git config --get adrian.regex.<whatever>` to
    get the setting."""
    ... code ...
    return result_of_interpolation
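
One way the body of do_interpolate might look is sketched below. This is a minimal sketch, not the only way to do it: the helper git_config_get, the PLACEHOLDER pattern, and the lookup and prefix parameters are hypothetical names introduced here for illustration. It assumes placeholder names follow Git's rule for config variable names (letters, digits and '-', starting with a letter) and that the pieces live under adrian.regex.

```python
import re
import subprocess
from typing import Callable


def git_config_get(key: str) -> str:
    """Return one config value by running `git config --get <key>`."""
    done = subprocess.run(
        ["git", "config", "--get", key],
        check=True, capture_output=True, text=True,
    )
    return done.stdout.rstrip("\n")


# Hypothetical placeholder syntax: %name%, where name is a legal
# Git config variable name (alphanumerics and '-', starting with a letter).
PLACEHOLDER = re.compile(r"%([A-Za-z][A-Za-z0-9-]*)%")


def do_interpolate(
    s: str,
    lookup: Callable[[str], str] = git_config_get,
    prefix: str = "adrian.regex.",
) -> str:
    """Replace each %name% in s with the value of <prefix><name>."""
    table = {}  # lookup table, so each name is fetched only once

    def replace(match):
        name = match.group(1)
        if name not in table:
            table[name] = lookup(prefix + name)
        return table[name]

    # A function replacement is inserted verbatim, so backslashes in
    # the regex fragments are not reinterpreted by re.sub.
    return PLACEHOLDER.sub(replace, s)
```

Passing lookup as a parameter is only there so the function can be tried without touching a real Git config, for example with a plain dictionary's `__getitem__`.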

Last, we drop the style into a git config --get adrian.style.<word> to get the command to run. We find any %INTERPOLATE% instruction, pass the parenthesized argument to do_interpolate, splice in the resulting string, and finally run the resulting git diff command.
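
The splicing step could be sketched roughly as follows. The names MARKER, balanced_end, expand_style and build_command are hypothetical, and the interpolate argument stands in for do_interpolate. The sketch assumes the template inside %INTERPOLATE%(...) contains only placeholders and grouping parentheses (no escaped or bracketed parentheses), and it passes the parenthesized string on with its outer parentheses intact.

```python
import shlex

MARKER = "%INTERPOLATE%"


def balanced_end(s: str, start: int) -> int:
    """Return the index just past the ')' that matches the '(' at s[start]."""
    if start >= len(s) or s[start] != "(":
        raise ValueError("expected '(' after " + MARKER)
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced parentheses after " + MARKER)


def expand_style(style: str, interpolate) -> str:
    """Replace every %INTERPOLATE%(...) in style with interpolate("(...)")."""
    out = []
    pos = 0
    while True:
        hit = style.find(MARKER, pos)
        if hit < 0:
            out.append(style[pos:])
            return "".join(out)
        arg_start = hit + len(MARKER)
        arg_end = balanced_end(style, arg_start)
        out.append(style[pos:hit])
        out.append(interpolate(style[arg_start:arg_end]))
        pos = arg_end


def build_command(style: str, interpolate) -> list:
    """Turn a style string into an argument list for `git diff`."""
    # Split into arguments *before* expanding, so whitespace and
    # backslashes inside the expanded regex stay within one argument.
    args = shlex.split(style)
    return ["git", "diff"] + [expand_style(a, interpolate) for a in args]
```

A driver would then fetch the style with git config --get adrian.style.<word>, call build_command with do_interpolate as the second argument, and hand the resulting list to subprocess.run. Building a list rather than a single string avoids a second round of shell quoting.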

There's still plenty of code to write, but now you have the ability to do as many levels of indirection as you like. I've used the [adrian] namespace here in the config examples, since Git is unlikely ever to steal that namespace from you.

torek
  • Ok, so `[adrian "regex"]` and `[adrian "style"]` are sections `adrian` with a subsection `regex` and `style`. The `adriandiff` in `!git adriandiff --style=diff2` looks to be a call to an alias, which you don't define, but I'm guessing that alias runs the python code you are referring to, and `--style=diff2` gets interpreted by the python code? – Adrian Apr 28 '22 at 02:59
  • I guess the python code does a system call to git with the actual expanded parameters? Though if that is right, I'm a bit confused as to how it gets the `diff2` command in `[adrian "style"]` and how it pulls the values in `[adrian "regex"]` into a map. Are you parsing it somehow? Am I understanding your explanation correctly? – Adrian Apr 28 '22 at 03:02
  • I didn't write enough code to show it, but yes, the idea is to use `subprocess.Popen` here to open and read Git commands (`git config --get ...`). You could also use a Python library that reads Git config files directly, if you choose to go with Python, but I didn't want to get into those kinds of weeds here. – torek Apr 28 '22 at 06:49
  • kk. I'll see if I can get this to work. Thx. Why does git make it difficult to do this though? Are they afraid of self referencing variables (direct or indirect)? If so, then just disallow it and leave it at that. No need to throw the baby out with the bath water. – Adrian Apr 28 '22 at 11:35
  • 1
    It's just never been put in. It would need a defined syntax and rules. You could clone the Git source and set it up as a feature, and submit it to the Git project. – torek Apr 28 '22 at 17:57