
I am using awk in a bash script to compare two files and keep only the non-matching lines. I need to compare the three fields of each line of the second file (as one pattern?) against every line of the first file:

First file:

chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    628885  635117  HumanGM18558_peak_2     2509    .       83.77238        255.95094       250.99944       5270
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200

Second file:

chr1 628885 635117
chr1 1250086 1250413
chr1 16613629 16613934
chr1 16644496 16644800
chr1 16895871 16896489
chr1 16905126 16905616

The current idea is to load one file into an array and use awk's negated regular expression match to compare.

readarray a < file2.txt
for i in "${a[@]}"; do
awk -v var="$i" '!/var/' file1.narrowPeak | cat > output.narrowPeak
done

The problem is that '!/var/' does not work with variables.

  • 1
    Please add your desired output (no description) for that sample input to your question (no comment). – Cyrus Jul 24 '20 at 13:52
  • 3
    on closer look at https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script, there's only one comment on how to use variables as regexp and not a hint for negative regexp in that question.. `$0 !~ var` is what you are looking for based on question title, but there's far better solution using just awk instead of bash+awk – Sundeep Jul 24 '20 at 13:52
  • 1
    Why not `grep -v "$i"` – Digvijay S Jul 24 '20 at 13:53
  • 3
    Also, using a shell loop to process text is a bad idea. Check https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Digvijay S Jul 24 '20 at 13:56
  • 8
    Piping to `cat` is a new [UUOC](http://porkmail.org/era/unix/award.html) on me! – Ed Morton Jul 24 '20 at 13:56

2 Answers


With awk alone:

$ awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' file2 file1
chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200
  • NR==FNR is true only while the first file argument is being read, which is file2 in this example
  • a[$1,$2,$3] creates a key from the first three fields; if the lines to compare are identical in full, spacing included, you can simply use $0 instead of $1,$2,$3
  • next skips the remaining commands and moves on to the next line of input
  • ($1,$2,$3) in a checks whether the first three fields of a file1 line are present as a key in array a. The leading ! inverts the condition, so only lines without a match are printed.
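To try this on the sample data from the question, a minimal self-contained sketch could look like the following. The scratch directory and the `result` variable are assumptions made for the illustration, and file1 is shortened to its first four columns and written tab-separated.

```shell
#!/bin/sh
# Recreate the two sample inputs in a scratch directory (hypothetical paths).
tmp=$(mktemp -d)
printf 'chr1\t9997\t10330\tHumanGM18558_peak_1\nchr1\t628885\t635117\tHumanGM18558_peak_2\nchr1\t15966215\t15966638\tHumanGM18558_peak_3\n' > "$tmp/file1"
printf 'chr1 628885 635117\nchr1 1250086 1250413\n' > "$tmp/file2"

# file2 goes first so that its keys are stored before file1 is scanned.
result=$(awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' "$tmp/file2" "$tmp/file1")

# Only the file1 lines whose first three fields are absent from file2 remain.
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because awk splits on any run of blanks by default, it does not matter that one file uses tabs and the other single spaces.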

Here's another way to write it (thanks to Ed Morton)

awk '{key=$1 FS $2 FS $3} NR==FNR{a[key]; next} !(key in a)' file2 file1
Sundeep

When the pattern is stored in a variable, you have to use the match operator:

awk -v var="something" '
  $0 !~ var {print "this line does not match the pattern"}
'
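A small sketch of the dynamic match on made-up input. The variable name `var` follows the snippet above; the sample lines and the `input` and `kept` variables are assumptions for illustration. The value is treated as a regular expression, so metacharacters such as `.` keep their special meaning.

```shell
#!/bin/sh
# Three made-up input lines; only the second contains the text peak_2.
input=$(printf 'chr1\tpeak_1\nchr1\tpeak_2\nchr1\tpeak_3\n')

# /var/ would search for the literal text "var";
# $0 !~ var uses the contents of the awk variable as the regex instead.
kept=$(printf '%s\n' "$input" | awk -v var='peak_2' '$0 !~ var')

# With no action given, awk prints every line for which the condition holds.
printf '%s\n' "$kept"
```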

With this problem, regular expression matching looks a bit awkward. I'd go with Sundeep's solution, but if you really want regex:

awk '
  NR == FNR {
    # construct and store the regex
    patt["^" $1 "[[:blank:]]+" $2 "[[:blank:]]+" $3 "[[:blank:]]"] = 1
    next
  }
  {
    for (p in patt)
      if ($0 ~ p)
        next
    print
  }
' second first
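A minimal sketch of why the regex wants a separator after the third field, using two made-up rows: without it, `635117` also matches as a prefix of `6351170`. The row contents and the `loose` and `strict` variables are hypothetical, and the sketch assumes an awk that supports POSIX character classes such as `[[:blank:]]` (gawk, or a recent mawk).

```shell
#!/bin/sh
# Two made-up rows: the second only shares a prefix with the key 635117.
rows=$(printf 'chr1\t628885\t635117\tpeak_a\nchr1\t628885\t6351170\tpeak_b\n')

# Without a trailing separator the regex also matches the longer number.
loose=$(printf '%s\n' "$rows" | awk '$0 ~ "^chr1[[:blank:]]+628885[[:blank:]]+635117"' | wc -l | tr -d ' ')

# Requiring a blank after the third field matches only the exact coordinates.
strict=$(printf '%s\n' "$rows" | awk '$0 ~ "^chr1[[:blank:]]+628885[[:blank:]]+635117[[:blank:]]"' | wc -l | tr -d ' ')

echo "loose=$loose strict=$strict"
```

One trade-off of the trailing blank: a row that ends right after its third field would not match, which is not a concern for the multi-column file in the question.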
glenn jackman
  • 1
    That 2nd script would be clearer if you replaced `patt` and `p` with `regexps` and `r` since there are no `patterns` in text matching, just `regexps` or `strings`. You need a `"$"` at the end of each regexp where you populate the array to avoid false matches. You don't need the `= 1`. – Ed Morton Jul 24 '20 at 14:45
  • 1
    Right, but not `$`, I'd want a space (or some separator character). – glenn jackman Jul 24 '20 at 15:08
  • 1
    I like to have an assignment there, personal style. Agree to disagree about use of "pattern". – glenn jackman Jul 24 '20 at 15:11
  • 1
    Fair enough. For me seeing code that says `pattern` is like applying for a job cleaning cages at the zoo and all they'll tell you is you'll be in the cage with `animals`. Personally I'd like to know if it'll be rabbits or tigers but YMMV I suppose :-). – Ed Morton Jul 24 '20 at 15:13