If the data is already ordered such that duplicate strings reside on successive lines (as in the example), and assuming all lines contain no white space:
$ uniq file2.txt
ANE
AHL
ANI
ANJ
ANK
ANL
ANM
ANN
ANO
ANP
ANQ
ANR
AMY
AMZ
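The successive-lines assumption matters because `uniq` compares each line only with the line directly before it. Here is a minimal sketch on a made-up three-line input (not OP's file) in which the two ANE lines are separated by another line:

```shell
#!/bin/sh
# Made-up input: ANE, AHL, ANE (the duplicates are not adjacent).
printf 'ANE\nAHL\nANE\n' | uniq
# uniq compares each line only with the one directly before it,
# so this prints ANE, AHL, ANE: the second ANE survives.
```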
If the duplicates may not be on successive lines, and again assuming all lines contain no white space:
$ sort -u file2.txt
AHL
AMY
AMZ
ANE
ANI
ANJ
ANK
ANL
ANM
ANN
ANO
ANP
ANQ
ANR
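sort -u removes non-adjacent duplicates, but it still compares whole lines, so white space counts. Here is a minimal sketch on made-up input. LC_ALL=C is set only to make the ordering byte-wise and predictable:

```shell
#!/bin/sh
# Made-up input: ANE twice (non-adjacent) plus one "ANE " with a trailing space.
printf 'ANE\nAHL\nANE\nANE \n' | LC_ALL=C sort -u
# Prints AHL, ANE and "ANE ": the non-adjacent ANE is removed,
# but the trailing-space copy is treated as a different line.
```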
Now, if the duplicates are not on successive lines and/or some lines may contain white space, let's look at some ideas for addressing OP's current awk code ...
The provided sample includes trailing spaces on some lines, so your awk code ...
awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}'
... which references the entire line ($0), is going to treat "ABC" and "ABC " (with a trailing space) differently.
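Here is a minimal sketch of the problem, using a made-up three-line input (not OP's actual file) in which ABC appears once plainly and once with a trailing space. The only change to OP's script is that each printed key is wrapped in [...] so the trailing space is visible. The output is piped through sort because `for (i in seen)` visits the indices in an unspecified order:

```shell
#!/bin/sh
# Made-up input: "ABC", then "ABC " (trailing space), then "XYZ".
# OP's $0-keyed script; the [ ] around each key are added only for visibility.
printf 'ABC\nABC \nXYZ\n' |
  awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1) print "[" i "]"}' |
  LC_ALL=C sort
# Prints [ABC ], [ABC] and [XYZ]: the two ABC variants are distinct keys,
# each seen exactly once, so neither is recognised as a duplicate.
```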
Assuming each line holds only a single string, the current code should replace $0 with $1 (the first white-space-delimited field) to strip off unwanted leading and trailing spaces, e.g.:
awk '{!seen[$1]++};END{for(i in seen) if(seen[i]==1)print i}'
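Running the $1-keyed version against the same made-up input shows both effects. The two ABC variants now collapse into a single key, so seen["ABC"] reaches 2. The seen[i]==1 test then drops that key entirely:

```shell
#!/bin/sh
# Made-up input: "ABC", "ABC " (trailing space), "XYZ".
# Keying on $1 strips the surrounding white space, so both ABC lines share one key.
printf 'ABC\nABC \nXYZ\n' |
  awk '{!seen[$1]++};END{for(i in seen) if(seen[i]==1)print i}'
# Prints only XYZ: ABC was seen twice, so seen["ABC"]==1 is false.
```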
But this still isn't sufficient, because it only prints strings that show up exactly once (seen[i] == 1); to print a unique list of all strings, consider:
awk '{!seen[$1]++};END{for(i in seen) print i}'
But if we just need a unique set of array indices, then the 'not' (!) and increment (++) are superfluous, since merely referencing seen[$1] creates that index. We could therefore reduce this further to:
awk '{seen[$1]};END{for(i in seen) print i}'
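As a quick check on the same made-up three-line input, the counting version and the reference-only version yield the same set of strings. The output is sorted with LC_ALL=C sort only because the `for (i in seen)` order is unspecified:

```shell
#!/bin/sh
# Made-up input: "ABC", "ABC " (trailing space), "XYZ".
# Counting version: every index is printed, whatever its count.
printf 'ABC\nABC \nXYZ\n' |
  awk '{!seen[$1]++};END{for(i in seen) print i}' | LC_ALL=C sort
# Reference-only version: merely mentioning seen[$1] creates the index.
printf 'ABC\nABC \nXYZ\n' |
  awk '{seen[$1]};END{for(i in seen) print i}' | LC_ALL=C sort
# Each pipeline prints ABC then XYZ.
```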
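Now, since the order of the output doesn't appear to be a requirement (a for (i in seen) loop visits the indices in an unspecified order anyway), we could keep the 'not' (!) and increment (++) and eliminate the END{} block. Instead, we'll print a line the first time we see its string and then ignore that string for the rest of the script. As a side effect, the strings come out in the order they first appear: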
awk '!seen[$1]++' file2.txt
This generates:
ANE
AHL
ANI
ANJ
ANK
ANL
ANM
ANN
ANO
ANP
ANQ
ANR
AMY
AMZ
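Note that !seen[$1]++ only uses $1 as the key. The default action prints the whole original line ($0), so any trailing spaces on the first occurrence are kept in the output. Below is a minimal sketch on made-up input. The second command prints just $1 and is an optional variant in case stripped output is preferred:

```shell
#!/bin/sh
# Made-up input: the first ABC carries a trailing space, the later one does not.
printf 'ABC \nXYZ\nABC\n' | awk '!seen[$1]++'
# Prints "ABC " (trailing space intact) then "XYZ", in first-occurrence order.

# Variant: print only the key, which drops the surrounding white space.
printf 'ABC \nXYZ\nABC\n' | awk '!seen[$1]++ { print $1 }'
# Prints "ABC" then "XYZ".
```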