1

I am writing a script that accepts a string as an argument. I want to run a particular command and check its output for that input string: first return the lines that match <input string>$, and only if that returns no lines, return all lines that contain <input string> anywhere in the line. I am currently using grep -E but am open to awk or sed.

Consider this output written to file:

> cat command.out
A
A1
B
B1
B2
C1
C2
C3
XYZ
XYZ1
XYZ2

If my input string is 'B' then I want to return

B

not

B
B1
B2

If my input string is 'C' then I want to return

C1
C2
C3

If my input string is 'Z' then I want to return

XYZ

If my input string is 'Y' then I want to return

XYZ
XYZ1
XYZ2

Using a | (or) in the pattern doesn't do what I am after as it would return all lines with B.

What I have works but seems inefficient and I suspect there is a better way.

> command_output="$(cat command.out)"

> matches="$( (print "$command_output"|grep -E 'B$')||(print "$command_output"|grep -E 'B') )"
> print "$matches"
B

> matches="$( (print "$command_output"|grep -E 'C$')||(print "$command_output"|grep -E 'C') )"
> print "$matches"
C1
C2
C3

I have to persist the command output and potentially fire off two greps. I was hoping for a piped one-liner like

matches="$(<run command>|grep <first pattern, if no match, second pattern>)"

but maybe that is not possible.

Shane
  • You say "pattern" in your question - do you mean "string" or "regexp"? See [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern). You state what should happen if the target "pattern" occurs at the end of a line or is the whole line but what if it appears mid-line, e.g. you want to match `D` and only `aDb` exists in the input? Also, should `fooB` really be treated the same as `B` alone? – Ed Morton May 09 '23 at 11:34
  • Thank you @EdMorton. The use of 'pattern' was deliberate to be less prescriptive. If my requirement can be accomplished with regex, great! If regex is not necessary and a simple pattern will do, that is fine too. I will try to clarify the question per your guidance. – Shane May 12 '23 at 15:37
  • Given a few lines of sample input/output we can't tell you if your requirements can be solved with a partial regexp match if in your head you know you need a full string match. The issue is that we can't know what your requirements are until you TELL us if you want to match a string or regexp, and full or partial, and whether you can guarantee your "pattern" will never contain regexp metachars and/or never be a substring of another string in the input, etc. Until you specifically define your requirements in terms of strings/regexps and full/partial, we're just guessing at what you might need. – Ed Morton May 12 '23 at 17:48

3 Answers

2
matches=$(
    <run command> |
    awk '
        $0~r"$" && exact=1;
        !exact && $0~r { inexact[n++] = $0 }
        END {
            if(!exact)
                for(i=0;i<n;i++)
                    print inexact[i]
        }
    ' r='regex'
)
  • $ is concatenated to the value of r to form a regex anchored to end of line. If $0 matches this:
    • set flag exact
    • result is non-zero / true, so print line
  • if exact has not been set and $0 matches r anywhere:
    • append the line to array inexact
  • at end, if exact is unset (ie. no exact match was found), print any stored lines

Note that the value passed in is used as a regex. This corresponds to the grep usage in the question. To match an exact string, rather than a regex, remember to escape any regex metacharacters.
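
As a rough sketch of how this behaves on the sample data from the question: the function names `sample` and `end_then_any` below are made up for illustration, `sample` just regenerates the lines of command.out, and the regex is passed with `-v` instead of a trailing `r=...` operand. The parentheses around the assignment are only there to make the precedence explicit.

```shell
# Regenerate the sample data from the question so the sketch is self-contained.
sample() {
    printf '%s\n' A A1 B B1 B2 C1 C2 C3 XYZ XYZ1 XYZ2
}

# end_then_any REGEX (hypothetical name): reads lines on stdin.
# Lines ending in REGEX are printed as soon as they are seen.
# If no such line exists, lines matching REGEX anywhere are printed at the end.
end_then_any() {
    awk -v r="$1" '
        # Anchored match: set the flag; the true result prints the line.
        $0 ~ (r "$") && (exact = 1);
        # No anchored match seen so far: remember unanchored matches.
        !exact && $0 ~ r { inexact[n++] = $0 }
        END {
            if (!exact)
                for (i = 0; i < n; i++)
                    print inexact[i]
        }
    '
}

sample | end_then_any B    # prints B
sample | end_then_any C    # prints C1, C2 and C3
```

Since everything happens in one awk process, the command output is read once and never has to be stored in a shell variable.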


An alternative approach using exact string comparison rather than regex (and accumulating to a string rather than an array):

matches=$(
    <run command> |
    awk -v s='input string' '
        BEGIN { len=length(s) }
        idx=index($0,s) {
            if ( idx+len > length ) {
                print
                exact=1
            } else approx = approx $0 ORS
        }
        END {
            ORS=""
            if (!exact) print approx
        }
    '
)

We know the length of the string and of the input line. When the string appears in the line, its position plus its length exceeds the line length only if it sits at the very end of the line:

s=SSS
                  idx+len     length
1234567SSS123 -->   8+3    <  13
1234567SSS1   -->   8+3    == 11
1234567SSS    -->   8+3    >  10
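
One caveat with the arithmetic above: index() reports the first occurrence only, so for a line like 12SSS67SSS it gives idx=3, and 3+3 is not greater than 10 even though the line does end in SSS. A possible variant, sketched here on the assumption that the string is non-empty and contains no backslashes (awk processes escape sequences in -v assignments), compares the tail of the line directly with substr(). The function name `ends_with_or_contains` is made up for illustration.

```shell
# ends_with_or_contains STRING (hypothetical name): reads lines on stdin.
# Prints lines that end with STRING; if there are none, prints the lines
# that contain STRING anywhere. Plain string comparison, no regex.
ends_with_or_contains() {
    awk -v s="$1" '
        BEGIN { d = length(s) - 1 }
        # Compare the last length(s) characters of the line with s.
        substr($0, length($0) - d) == s { print; exact = 1; next }
        # s occurs elsewhere in the line: keep it for the fallback.
        index($0, s) { approx = approx $0 ORS }
        END {
            ORS = ""
            if (!exact) print approx
        }
    '
}

# The first line ends in SSS even though SSS also occurs earlier in it.
printf '%s\n' 12SSS67SSS 1234567SSS123 | ends_with_or_contains SSS
```

The trade-off is that lines which do not end with the string are examined twice, once by substr() and once by index().
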
jhnc
  • The alternative approach will fail if the string appears twice on a line: eg. `12SSS67SSS` – jhnc May 11 '23 at 07:55
  • This ticks all the boxes. Thank you! – Shane May 12 '23 at 16:08
  • this version works if string can appear multiple times on line but has to scan most lines twice (as does the regex version in the answer): `...|awk -v s='input string' 'BEGIN{d=length(s)-1} substr($0,length-d)==s{print;x=1;next} index($0,s){a= a $0 ORS} END{ORS="";if(!x)print a}' ` – jhnc May 12 '23 at 20:52
1

Use the || operator in the shell to run grep twice, first matching the input as a whole line, then as a prefix.

matches=$(printf "%s\n" "$command_output" | grep -x B || printf "%s\n" "$command_output" | grep '^B')
Barmar
  • Interesting and thank you! You are using fewer subshells but it's pretty similar to my solution in the original post. I was hoping to avoid persisting the command output in a variable and if possible running grep twice. – Shane May 12 '23 at 15:53
1

If I'm understanding your question correctly, you expect the matches to be grouped together so that the first line in the group is always either the string alone or else the string with a suffix, and there will be no other matches subsequently in the file. Under those conditions, print the exact match if there is one, and otherwise fall back to printing the lines that start with the string.

<run command> |
awk -v value="$1" '$0 == value { print; matched=1 }
  ($0 ~ "^" value) && !matched'
tripleee
  • Thank you @triplee but that doesn't quite produce the expected results and I can't guarantee the assumptions you listed. I tested the cases I outlined in the original post and it did not produce the expected result – Shane May 12 '23 at 15:56
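
The mismatch can be reproduced with the sample data from the question. In this sketch the function names `sample` and `exact_or_prefix` are made up for illustration; the awk program itself is the one from the answer. Because it only looks for the value as a whole line or at the start of a line, inputs such as Z and Y, which occur at the end or in the middle of XYZ, produce no output.

```shell
# Regenerate the sample data from the question.
sample() {
    printf '%s\n' A A1 B B1 B2 C1 C2 C3 XYZ XYZ1 XYZ2
}

# exact_or_prefix VALUE (hypothetical name): the awk program from the answer.
# Prints a line equal to VALUE; until one is seen, prints lines starting with VALUE.
exact_or_prefix() {
    awk -v value="$1" '$0 == value { print; matched=1 }
      ($0 ~ "^" value) && !matched'
}

sample | exact_or_prefix B    # prints B, as the question expects
sample | exact_or_prefix Z    # prints nothing; the question expects XYZ
```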