
The following will search commits for changes involving "foo":

$ git log -S foo

How can I:

  • search for changes involving foo and bar
  • ensure that the change occurred in a file containing both patterns, even if only one pattern was involved in the change
  • include commits from archived and non-archived branches
Partial solution, based on the answer by @LeGEC:
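#!/bin/bash

cd "$GITDIR" || exit 1

# every commit reachable from any ref (all branches, tags, ...) and HEAD
mapfile -t myhashes < <(git log --all --pretty=%H)

for myhash in "${myhashes[@]}"; do
    # files in which this commit changed the number of foo/bar matches
    # (diff-tree compares the commit with its parent; --root handles the first commit)
    mapfile -t myfiles < <(git diff-tree -r --root --no-commit-id --name-only \
        --pickaxe-regex -S "foo|bar" "$myhash")
    # skip commits with no matching files: an empty pathspec would make git grep search the whole tree
    (( ${#myfiles[@]} )) || continue
    echo "found ${#myfiles[@]} files starting with ${myfiles[0]}"

    # Files with foo
    mapfile -t fwf < <(git grep -l -e foo "$myhash" -- "${myfiles[@]}")
    # Files with bar
    mapfile -t fwb < <(git grep -l -e bar "$myhash" -- "${myfiles[@]}")

    # how to intersect fwf,fwb?
done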

mkk
  • I need some advice on `intersect(fwf,fwb)`. I tried a method suggested in this answer, but the results were very strange (I'm not a regex expert yet): https://stackoverflow.com/a/22439016/1798351 – mkk Sep 15 '21 at 18:41
  • I think you should turn the "how to do this intersection" into a separate question, probably including in it a link to this original question and a note that, yes, this is a bit of an XY problem and you're more interested in X than Y, but now you do have this here Y to solve, meanwhile consider X too. :-) – torek Sep 16 '21 at 01:18
  • As for using `comm` to implement intersection: comm is fine at doing this but it needs its inputs to be collated, which might be a problem. I haven't thought much about solving Y here. Note that `comm` does not do regular expressions at all, it's just searching for lines that match, or don't. – torek Sep 16 '21 at 01:19
  • @torek, those are insightful comments. In general I agree that stackoverflow posts should address problems that are atomic in nature. However I also think that, especially for beginners, posts that address X,Y problems can provide insight into how a modular solution should be constructed, which is an essential skill in the real world. – mkk Sep 16 '21 at 13:28

1 Answer

  • search for changes involving foo and bar:

If you add --pickaxe-regex, the argument to -S will be treated as an extended POSIX regular expression:

git log --pickaxe-regex -S "foo|bar"

(see "a note about -S" below)

  • ensure that the change occurred in a file containing both patterns, even if only one pattern was involved in the change:

You can use git log to list all potential commits, and then refine from there (see "About your second point" below).

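  • include commits from archived and non-archived branches:

Simply add --all to git log (it lists commits reachable from every ref under refs/, i.e. all local branches, remote-tracking branches and tags, as well as HEAD):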

git log --all --pickaxe-regex -S "foo|bar"

About your first point:
A note about -S:

-S spots commits that change the number of occurrences of the pattern in a file. So using -S "foo|bar" (with regexes on) would overlook a commit where one line containing foo is turned into one line containing bar, because the file still has the same number of matches.

If that's not what you want, you may be looking for -G (which looks for commits whose added or removed lines match the regex), or you may want to combine the output of the two commands git log -S foo and git log -S bar.
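To make the -S / -G difference concrete, here is a small sketch that builds a throwaway repository in a temporary directory (the file name a.txt, the commit messages and the demo identity are made up for the example) in which one commit rewrites a line containing foo into a line containing bar:

```bash
#!/bin/bash
# Throwaway repo showing that -S can miss a foo -> bar rewrite that -G finds.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
# demo identity, so the commits work without any git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'hello foo\n' > a.txt
git add a.txt
git commit -q -m "add foo"

# Rewrite the foo line as a bar line: the file still has exactly one
# match of foo|bar, so the number of matches does not change.
printf 'hello bar\n' > a.txt
git commit -q -am "foo -> bar"

# -S: commits that change the number of matches in a file
s_hits=$(git log --format=%s --pickaxe-regex -S 'foo|bar')
# -G: commits whose added/removed lines match the regex
g_hits=$(git log --format=%s -G 'foo|bar')

printf 'with -S: %s\n' "$s_hits"
printf 'with -G: %s\n' "$g_hits"
```

With this history, -S should only report "add foo", while -G should report both commits.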


About your second point:

If you add --pretty=%H to your git log command, the output will be only a list of hashes, one per line, for all the commits that may interest you.

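To list the files within those commits that may interest you, you can either add --name-only to the git log command, or take these commits one by one and re-run them through git diff --pickaxe-regex -S "foo|bar" --name-only <sha>^ <sha> (with a single <sha>, git diff compares that commit with the working tree rather than with its parent; a root commit has no parent, so it needs special handling).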
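If you go the --name-only route, the single git log output has to be split back into (commit, file) pairs. A minimal sketch, assuming you mark each commit line with a leading TAB via --pretty=format:%x09%H (git quotes paths that contain control characters such as TAB, so a file line should never start with a raw TAB); the function name pairs_from_log is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: read `git log ... --pretty=format:%x09%H --name-only`
# on stdin and print one "<sha><TAB><path>" line per (commit, file) pair.
# Commit lines are recognised by their leading TAB (an assumption that holds
# because git quotes paths containing control characters).
pairs_from_log() {
  local line sha=""
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      $'\t'*) sha=${line#$'\t'} ;;              # commit line: remember the hash
      '') ;;                                    # blank separator line
      *) printf '%s\t%s\n' "$sha" "$line" ;;    # file line
    esac
  done
}
```

From inside the repository, it could be fed like this: git log --all --pickaxe-regex -S 'foo|bar' --pretty=format:%x09%H --name-only | pairs_from_log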

Once you have a list of target commits and, for each of them, a list of file names, you can check the content of each file within its target commit to see whether it contains both foo and bar.

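For example, you can use git grep -l -e foo <sha> -- <list of files> and git grep -l -e bar <sha> -- <list of files>, and combine the outputs to see which files contain both patterns. When given a commit, git grep prints each path as <sha>:<path>, so both lists share the same prefix and can be compared directly.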
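A sketch of two ways to get the files containing both patterns, run in a throwaway repository (the file names are made up for the demo). The first intersects the two git grep -l lists with comm, which requires sorted input; the second swaps in git grep's --all-match flag, which, with several -e patterns, keeps only files that have lines matching all of them:

```bash
#!/bin/bash
# Sketch: which of a commit's files contain both foo and bar?
set -euo pipefail

cd "$(mktemp -d)"
git init -q
# demo identity, so the commit works without any git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'foo\nbar\n' > both.txt
printf 'foo\n' > foo_only.txt
printf 'bar\n' > bar_only.txt
git add .
git commit -q -m demo
sha=$(git rev-parse HEAD)
files=(both.txt foo_only.txt bar_only.txt)

# Option 1: intersect the two `git grep -l` lists (comm needs sorted input).
# With a commit argument, git grep prints paths as "<sha>:<path>".
via_comm=$(comm -12 \
  <(git grep -l -e foo "$sha" -- "${files[@]}" | sort) \
  <(git grep -l -e bar "$sha" -- "${files[@]}" | sort))

# Option 2: let git grep do it; --all-match keeps only files that
# have lines matching every -e pattern.
via_all_match=$(git grep -l --all-match -e foo -e bar "$sha" -- "${files[@]}")

echo "$via_comm"
echo "$via_all_match"
```

With these three demo files, both variables should hold the single line <sha>:both.txt.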

You may also want to check the content of each file before the commit; e.g.: do you want to keep a file where foo was changed to bar?
If so, the file could contain only bar in the target commit (<sha>), and only foo in the parent commit (<sha>^):

# you may want to check the content of target files in the parent commit :
git grep ... <sha>^

You will need some scripting on top of those git commands to get a complete solution.
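As one piece of that scripting, the parent-commit check can be sketched as follows. The rule used here (keep a file if foo and bar each occur in it, either in <sha> or in its first parent <sha>^) is an assumption, and has_pattern / mentions_both are hypothetical helper names; the demo runs in a throwaway repository:

```bash
#!/bin/bash
# Sketch: keep a file if foo and bar each appear in it, looking at the
# target commit AND its first parent. This rule is an assumption; adapt it.
set -uo pipefail

# has_pattern REV PATTERN PATH -> exit 0 if PATH at REV contains PATTERN
# (errors, e.g. REV^ of a root commit, are silenced and count as "no match")
has_pattern() {
  git grep -q -e "$2" "$1" -- "$3" 2>/dev/null
}

# mentions_both SHA PATH -> exit 0 if foo and bar each occur in PATH
# at SHA or at SHA^ (for a root commit, SHA^ does not exist, so only SHA counts)
mentions_both() {
  local sha=$1 path=$2 pat rev found
  for pat in foo bar; do
    found=no
    for rev in "$sha" "$sha^"; do
      if has_pattern "$rev" "$pat" "$path"; then found=yes; break; fi
    done
    [ "$found" = yes ] || return 1
  done
}

# Demo in a throwaway repository: a.txt goes from "foo" to "bar".
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
printf 'foo\n' > a.txt
git add a.txt && git commit -q -m "foo"
printf 'bar\n' > a.txt
git commit -q -am "foo -> bar"

# foo is found in HEAD^ and bar in HEAD, so a.txt is kept
mentions_both HEAD a.txt && echo "a.txt kept at HEAD"
# HEAD~1 is the root commit and only contains foo
mentions_both HEAD~1 a.txt || echo "a.txt skipped at HEAD~1"
```

Looking at both commits is what lets the foo -> bar rewrite through; checking only <sha> would drop a.txt at HEAD.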

LeGEC
  • thanks for the thoughtful answer, @LeGEC. I've updated the OP with a partial implementation. Any chance you could offer a suggestion to intersect in bash (some other Stack Overflow solutions I tried, e.g. `comm`, did not work)? – mkk Sep 15 '21 at 18:38
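  • `comm` expects files as input, not variables. You can turn an array into a "file" using process substitution, printing one element per line and sorting it, since `comm` needs sorted input: `comm -12 <(printf '%s\n' "${fwf[@]}" | sort) <(printf '%s\n' "${fwb[@]}" | sort)` – LeGEC Sep 15 '21 at 19:06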
  • also : you can add `--pickaxe-regex -S "foo|bar"` to your initial `git log` command, this should greatly reduce the size of `myhashes` array -- and probably speed up the overall execution time – LeGEC Sep 15 '21 at 19:12
  • (to intersect in bash): another option is to use grep, with -F for fixed strings and -x for whole-line matches: `printf '%s\n' "${fwf[@]}" | grep -Fxf <(printf '%s\n' "${fwb[@]}")` – LeGEC Sep 15 '21 at 19:17
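For completeness, an intersection that needs neither sorting nor grep, using a bash associative array (bash 4+). The function name intersect and its `--` separator convention are made up for this sketch, and it assumes no element is literally `--` (true for git grep -l output, whose lines look like <sha>:<path>):

```bash
#!/bin/bash
# intersect A1 A2 ... -- B1 B2 ...
# Prints each item of the first list that also appears in the second,
# once, in first-list order. Assumes no item is literally "--" or empty.
intersect() {
  local -A in_second=() seen=()
  local -a first=()
  local item
  # collect the first list up to the "--" separator
  while (( $# )) && [[ $1 != -- ]]; do first+=("$1"); shift; done
  (( $# )) && shift   # drop the "--" separator
  # remember every item of the second list
  for item in "$@"; do in_second["$item"]=1; done
  for item in "${first[@]}"; do
    # keep items present in the second list, skipping duplicates
    if [[ -n ${in_second["$item"]+x} && -z ${seen["$item"]+x} ]]; then
      printf '%s\n' "$item"
      seen["$item"]=1
    fi
  done
}
```

In the loop from the question, it could be used as: mapfile -t fwfb < <(intersect "${fwf[@]}" -- "${fwb[@]}")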