
I have a dataset in which I need to search for two variables. Both variables must be present; otherwise the match should be ignored.

inputfile.txt:

IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs

BigFile.txt:

\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn@brz.com
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,Peter.Salez@brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com

It works if I use the actual strings, but not if they are assigned to variables.

[root@brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com


while read -r zz; do
        var1=`echo $zz | print '{print $1}'`
        var2=`echo $zz | print '{print $2}'`
        awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
        awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
        fi
done < inputfile.txt

Any idea what I am missing?

awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
gafm
  • `/.../` expects regex. You can use `match($0,"...")` for fixed string. – jhnc Jul 15 '22 at 22:14
  • Are the values supposed to be on same line? Or anywhere in file? Or something else? Why do you test `NR<=s+2` ? – jhnc Jul 15 '22 at 22:18
  • In `awk '/$var2/....'` you are missing that shell and awk variables are very different and that shell variables do not expand when placed between single-quotes. In `awk -v a=$var1 -v b=$var2 '/$b/...'` it looks like you are missing that `$b` would seek the **field** number represented by `b`. Rather you want `b` instead of `$b`. Also `echo $zz | print '{print $1}'` makes no sense. `echo` is fine, but `print '{print $1}'` is confusing at best. – David C. Rankin Jul 16 '22 at 01:22
  • You shouldn't be using a shell `while read` loop at all, you can/should do what you want in a single call to awk. See [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). After accepting an answer to the question you asked about what's wrong with your syntax, if you ask a new question about how to improve your script then we can help you with that. – Ed Morton Jul 16 '22 at 12:34
  • @EdMorton, apologies for not having a good script. wasn't really my forte, but helping myself day by day to improve. – gafm Jul 18 '22 at 10:15
  • No need to apologize, we're all learning every day, I'm just giving you a heads up about some issues beyond those you specifically asked about so you can follow up on them if you like. – Ed Morton Jul 18 '22 at 10:20

2 Answers


I see a number of problems here. First, where you split the fields from inputfile.txt with

while read -r zz; do
    var1=`echo $zz | print '{print $1}'`
    var2=`echo $zz | print '{print $2}'`

When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly, but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:

while read -r var1 var2

...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:

while read -r var1 var2 ignoredstuff

See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
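
As a minimal sketch of the two read forms above (the sample line is copied from inputfile.txt; the names first, second and rest are only illustrative):

```shell
# Sample line in the same format as inputfile.txt
line='IFRA-SCN-01002B.brz.com Build Docs'

# Two variables: the last one receives the whole remainder of the line
read -r var1 var2 <<EOF
$line
EOF
printf 'var1=[%s] var2=[%s]\n' "$var1" "$var2"

# Three variables: the second holds only the second field,
# and anything beyond it lands in the third
read -r first second rest <<EOF
$line
EOF
printf 'second=[%s] rest=[%s]\n' "$second" "$rest"
```

Here var2 ends up as "Build Docs", while second ends up as "Build" and rest as "Docs".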

As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single-quotes. You could switch to double-quotes, but then you'd have to escape $0 to keep the shell from expanding that, and you'd also have to worry about the search strings possibly including awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.

In the -v a=$var1 b=$var2 part, you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"): -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
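
A small sketch of what the quoting changes, using set -- to count the arguments the shell would hand to awk (the variable names unquoted_count and quoted_count are just for illustration):

```shell
var2='Build Docs'

# Unquoted (deliberately wrong): the shell splits the value on the space,
# so awk would receive "-v", "b=Build" and "Docs" as separate arguments
set -- -v b=$var2
unquoted_count=$#

# Quoted: the value stays a single argument, "b=Build Docs"
set -- -v b="$var2"
quoted_count=$#

printf 'unquoted: %s args, quoted: %s args\n' "$unquoted_count" "$quoted_count"
```

In the unquoted case the stray "Docs" argument is what awk would try to treat as the program text.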

Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in shell, it generally means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets the x-plus-second field). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
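
To make the difference concrete, here is a small sketch on a made-up three-field line (the line and the variable names n and pat are hypothetical, not from the question):

```shell
# Hypothetical input line with three whitespace-separated fields
line='alpha beta gamma'

# With n=2, "$n" means "field number 2", while plain "n" is just the value 2
field_ref=$(printf '%s\n' "$line" | awk -v n=2 '{ print $n }')
plain_var=$(printf '%s\n' "$line" | awk -v n=2 '{ print n }')

# To match against the contents of a variable, use ~ rather than /.../
matched=$(printf '%s\n' "$line" | awk -v pat='beta' '$0 ~ pat { print "yes" }')

printf '%s %s %s\n' "$field_ref" "$plain_var" "$matched"
```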

So the command should be something like this:

awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt

Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:

awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
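
A quick sketch of the regex-versus-literal difference, using a made-up line (the share path and the address someone@brz.com are hypothetical; only "Pre-code$" comes from inputfile.txt):

```shell
# Hypothetical line containing a literal "$" in the share name
sample='\\IFRA-SCN-01001B.brz.com\Pre-code$,someone@brz.com'
needle='Pre-code$'

# Regex match: "$" anchors to the end of the line, so nothing is printed
regex_hits=$(printf '%s\n' "$sample" | awk -v b="$needle" '$0 ~ b')

# Literal match: index() returns a nonzero position when b occurs in $0
literal_hits=$(printf '%s\n' "$sample" | awk -v b="$needle" 'index($0, b)')

printf 'regex: [%s]\nliteral: [%s]\n' "$regex_hits" "$literal_hits"
```

One caveat worth keeping in mind: awk processes backslash escape sequences in values assigned with -v, so a search string that itself contains backslashes may need a different route, such as the ENVIRON array.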

I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.

Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to insert some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?

Gordon Davisson
  • I've taken and read all your suggestion. I've been really practicing the use of awk for my data manipulation. I know it can be done using `grep` but since there is other stuff that I need to insert hence the use of `awk`. is it safe to say that in case I have a variable to use, I still need to redefine it again as part of the awk command? – gafm Jul 18 '22 at 10:28
  • @gafm There are other ways to access shell variables in `awk`, but copying them into `awk` variables with `-v awkvar="$shellvar"` is almost always the best. See ["How do I use shell variables in an awk script?"](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script) – Gordon Davisson Jul 18 '22 at 17:18

Your code is:

while read -r zz; do
        var1=`echo $zz | print '{print $1}'`
        var2=`echo $zz | print '{print $2}'`
        awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
        awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
        fi
done < inputfile.txt

I won't address the logic of the code (eg. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for) but here are some syntax and efficiency issues:

  • You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just: read -r var1 var2 or read -r var1 var2 junk
  • print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc)? But just use simple read instead.
  • Single-quotes prevent variable expansion so, inside the script argument passed to awk, /$var/ reaches awk as the literal text $var. There the $ is a regex end-of-string anchor followed by v, a, r, so the pattern can never match. Pass variables using awk's -v option as you do in the second awk line.
  • Each variable passed to awk needs a separate -v option.
  • awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: eg. $0 ~ name.

So:

while read -r var1 var2 junk; do
    # quote variables to prevent globbing, word-splitting, etc
    awk -v a="$var1" -v b="$var2" '
        $0 ~ b { ok=1; s=NR }
        ok && NR<=s+2 && $0 ~ a   # print is the default action
    ' BigFile.txt
done <inputfile.txt

Note that the more var1/var2 you want to check, the longer the runtime ( O(mn) : m sets of var1/var2 and n lines of input to check ). There may be more efficient algorithms if the problem is better-specified.
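
As one possible direction, here is a rough sketch that replaces the shell loop with a single awk call. It rests on an assumption the question does not confirm: that a BigFile.txt line counts as a match when it contains both the host and the share name as literal substrings on the same line. The temporary directory and the trimmed sample data are only there to make the sketch self-contained.

```shell
# Trimmed copies of the question's sample files, written to a temp directory
workdir=$(mktemp -d)
cat > "$workdir/inputfile.txt" <<'EOF'
IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01002B.brz.com Build Docs
EOF
cat > "$workdir/BigFile.txt" <<'EOF'
\\IFRA-SCN-01001B.brz.com\ABC PTR,John.Mayn@brz.com
\\IFRA-SCN-01001B.brz.com\Build Docs,Arlan.Boynoz@brz.com
\\IFRA-SCN-01002B.brz.com\Build Docs,Arlan.Boynoz@brz.com
EOF

matches=$(awk '
    # First file: remember each host/share pair
    NR == FNR {
        host[++n] = $1
        share[n] = substr($0, length($1) + 2)   # rest of line after host and one space
        next
    }
    # Second file: print a line once if any pair occurs in it literally
    {
        for (i = 1; i <= n; i++)
            if (index($0, host[i]) && index($0, share[i])) { print; break }
    }
' "$workdir/inputfile.txt" "$workdir/BigFile.txt")

printf '%s\n' "$matches"
rm -rf "$workdir"
```

This still compares every pair against every line, so the number of comparisons stays O(mn), but BigFile.txt is read once and only one awk process is started instead of one per input line.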

jhnc
  • thanks for pointing that out. I'll be very specific with my requirement next time. But thank you. Had re-written my script and had taken your suggestion. – gafm Jul 18 '22 at 10:19