
I have a file like this:

cat foo.txt
N N
N N
N N
N N
I-MB I-MB

I want to output the lines whose 1st column is not equal to the 2nd column, so I use awk:

cat foo.txt | awk '$1 != $2'
N N
N N
N N

But strangely, it does not work: those lines have identical columns, yet they are printed.

The reason turns out to be that the file was generated on Windows:

file foo.txt
foo.txt: ASCII text, with CRLF, LF line terminators

After converting it to Unix line endings, it works (the ^M below is a literal carriage return, typed as Ctrl-V Ctrl-M):

sed -e 's/^M$//' foo.txt > foo2.txt
file foo2.txt
foo2.txt: ASCII text
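If typing a literal ^M is awkward, tr can do the same conversion using only POSIX features. This is a minimal sketch that rebuilds a small mixed CRLF/LF sample (hypothetical contents, written to a temporary directory rather than your real foo.txt) and then checks that no carriage returns remain. Note that tr -d '\r' removes every CR, not only those at the end of a line, which is fine for plain text like this.

```shell
#!/bin/sh
# Hypothetical sample mirroring the question: two CRLF lines and one LF line.
dir=$(mktemp -d)
printf 'N N\r\nN N\r\nI-MB I-MB\n' > "$dir/foo.txt"

# Delete every carriage return (POSIX tr); the file now has plain LF endings.
tr -d '\r' < "$dir/foo.txt" > "$dir/foo2.txt"

# Count lines that still contain a CR (grep -c prints 0 when none match).
crs=$(grep -c "$(printf '\r')" "$dir/foo2.txt")
converted=$(cat "$dir/foo2.txt")
echo "lines with CR: $crs"

rm -r "$dir"
```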

So why does CRLF affect some awk operations but not others? For example:

head foo.txt | awk '$1 !~ /N/'
I-MB I-MB

zhuguowei

1 Answer


All awk functions are completely unaffected; they're working exactly as designed. The point you're missing is that when your input line is (CR=\r and LF=\n):

N N\r\n

and your RS value is the UNIX default \n, the $0 string being processed within awk is:

N N\r

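so $2 (N\r) is simply not equal to $1 (N). Your $1 !~ /N/ test, on the other hand, only examines $1, which never contains the trailing \r, so it behaves as you expect.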
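One quick way to see the hidden character is to ask awk about the record itself. This minimal sketch fakes a single CRLF line with printf. It deliberately checks only $0 and $1, so it doesn't depend on how a particular awk implementation splits fields around a bare \r.

```shell
#!/bin/sh
# One CRLF-terminated line, as a Windows tool would write it.
# With the default RS ("\n") awk strips only the LF, so the record keeps the CR.
out=$(printf 'N N\r\n' | awk '{
    print length($0)     # 4: "N", " ", "N" and the trailing CR
    print ($0 ~ /\r$/)   # 1: the record really ends in a carriage return
    print ($1 == "N")    # 1: $1 comes before the CR, so tests on $1 are unaffected
}')
echo "$out"
```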

If you set RS="\r\n" (gawk-only for multi-char RS) then $0 would be:

N N

and then $2 is obviously equal to $1. The usual advice, though, is to just run dos2unix or a similar tool on your input file before running any UNIX tools on it.
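If converting the file first isn't convenient, a common portable alternative (not part of the original answer) is to strip the trailing \r inside awk itself, which doesn't need gawk's multi-character RS. This is a minimal sketch; the "A B" line in the input is hypothetical, added so that something is actually printed.

```shell
#!/bin/sh
# Hypothetical mixed input: equal-column lines (CRLF and LF) plus one
# CRLF line whose columns really differ.
fixed=$(printf 'N N\r\nN N\r\nN N\r\nI-MB I-MB\nA B\r\n' |
    awk '{ sub(/\r$/, "") }   # drop a trailing CR; changing $0 re-splits the fields
         $1 != $2')           # default action: print the cleaned record
echo "$fixed"
```

Because sub() modifies $0, awk recomputes $1 and $2 from the cleaned record, so the comparison sees "N" rather than "N\r", and only the line whose columns truly differ is printed.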

Ed Morton