27

Here's an awk script that attempts to compute the set difference of two files based on their first column:

BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while (getline < file)
        Contained[$1] = $1
    delete ARGV[1]
    }
$1 not in Contained{
    print $0
}

Here is TestFileA:

cat
dog
frog

Here is TestFileB:

ee
cat
dog
frog

However, when I run the following command:

gawk -f Diff.awk TestFileA TestFileB

I get the output just as if the script had contained "in":

cat
dog
frog

While I am uncertain about whether "not in" is correct syntax for my intent, I'm very curious about why it behaves exactly the same way as when I wrote "in".

merlin2011
  • I also couldn't find any doc about "not in", so I agree that it is not the correct syntax for my original intent, although that wasn't the actual question. – merlin2011 Jun 07 '12 at 01:44
  • https://www.gnu.org/software/gawk/manual/html_node/Reference-to-Elements.html -- this page addresses the question exactly: the documentation says that `indx in array` evaluates to `1` (true) if `indx` is in the array and `0` (false) if it is not. So to negate that, use the `!` operator. If this doesn't work, verify whether you are using `gnu awk` (which the documentation describes) or `mawk` (which is the default on some systems). – Chris Aug 23 '23 at 21:53

5 Answers

36

I cannot find any doc about element not in array.

Try !(element in array).


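I guess: awk sees not as an uninitialized variable, so not evaluates to an empty string. Concatenation binds more tightly than in, so the empty string is appended to $1 before the membership test:

$1 not in Contained  ==  ($1 "") in Contained  ==  $1 in Contained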
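
One way to check this guess is to run both forms side by side. This is a minimal sketch, assuming a POSIX-style awk on the PATH; the array name seen and the two input lines are made up for the demo:

```shell
# Sketch only: the array "seen" and the sample input are hypothetical.
out=$(printf 'ee\ncat\n' | awk '
BEGIN { seen["cat"] }                        # referencing a key creates it
$1 not in seen { print "bare not:", $1 }     # parsed as ($1 not) in seen
!($1 in seen)  { print "negated:", $1 }      # the real "not in" test
')
printf '%s\n' "$out"
```

The bare not rule fires for cat (the key that is present), while the negated rule fires for ee, which matches the behaviour described in the question.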
kev
  • I can't tell from your code what you are trying to do, and even if I take out the wayward 'not' bareword I still get syntax errors. Try 'awk --lint -f yourfile.awk yourdatafile' – starbolin Jun 07 '12 at 00:19
  • @starbolin: I think you meant for your comment to be attached to the question, since it doesn't make any sense attached here. You shouldn't get any syntax errors since there's nothing (else) wrong with the script. – Dennis Williamson Jun 07 '12 at 01:43
25

I figured this one out. The expression ( x in array ) returns a value (1 if the key exists, 0 if it does not), so to do "not in array", you can do this:

if ( x in array == 0 )
   print "x is not in the array"

or in your example:

($1 in Contained == 0){
   print $0
}
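
To see the values involved, here is a small sketch, assuming a POSIX-style awk; the array a and the variable names are hypothetical. It prints the result of the membership test for a present key and a missing key, then the two ways of negating it:

```shell
# Sketch only: array "a" and the variable names are made up for the demo.
out=$(awk 'BEGIN {
    a["cat"]                   # create the key "cat"
    present = ("cat" in a)     # 1: key exists
    absent  = ("ee" in a)      # 0: key does not exist
    eq0 = (absent == 0)        # comparison against 0, as in this answer
    neg = !absent              # logical negation gives the same result
    print present, absent, eq0, neg
}')
printf '%s\n' "$out"
```

Both negations yield 1 for the missing key, so comparing against 0 and using ! are interchangeable here.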
karthikr
Jeff
2

In my solution for this problem I use the following if-else statement:

if($1 in contained);else{print "Here goes your code for \"not in\""}
TheEsnSiavashi
Peter
1

Not sure if this is anything like you were trying to do.

#!/bin/awk -f
# Reads the file named by the first argument and builds a hash of the tokens
# found in column one. Both files are then read as normal input, and any
# line whose column-one token is not in the hash has that token printed.
BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while ((getline < file) > 0)   # > 0 stops on end of file (0) and on error (-1)
        Contained[$1] = $1
#    delete ARGV[1]  # I don't know what you were thinking here
#    for(i in Contained) {print Contained[i]} # debugging, not just for sadists
    close (ARGV[1])
}
{
   if ($1 in  Contained){} else { print $1 }
}

starbolin
0

On the awk command line I use:

 ! ($1 in a)

where $1 is the key to test and a is the array.

Example:

awk 'NR==FNR{a[$1];next}! ($1 in a) {print $1}' file1 file2
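
As a quick check, the one-liner can be run against the two test files from the question. This is a sketch, assuming a POSIX-style awk and mktemp; the temporary directory is only there to keep the demo self-contained:

```shell
# Recreate TestFileA and TestFileB from the question in a scratch directory.
dir=$(mktemp -d)
printf 'cat\ndog\nfrog\n' > "$dir/TestFileA"
printf 'ee\ncat\ndog\nfrog\n' > "$dir/TestFileB"

# First file: record each column-one value as a key. Second file: print
# column one only when it was never recorded.
out=$(awk 'NR==FNR { a[$1]; next } !($1 in a) { print $1 }' "$dir/TestFileA" "$dir/TestFileB")
printf '%s\n' "$out"

rm -r "$dir"
```

Only ee is printed, since it is the one value in TestFileB that does not appear in TestFileA. Note that the NR==FNR test assumes the first file is not empty; with an empty first file the second file would be treated as the first.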
Shiny