I cannot understand why the float number comparison does not work in mawk:

mawk '$3 > 10' file.txt
[...]
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_7_F   3196    3.68367
9_9_F   2278    2.37445
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775
[...]

While it works perfectly in awk like this:

awk '{if ($3 > 10) print $1}' file.txt

I'm obviously doing something wrong here, but I cannot understand what.

Tom S
  • The two commands are not really the same, even if they should do the same thing (the test is at the pattern level in the first and inside the action in the second). Did you try the awk version with mawk? – NeronLeVelu Dec 16 '16 at 09:03
  • I tried mawk version of the last command: `mawk '{if ($3 > 10) print $3}' file.txt`, the result is the same, values lower than 10 are not filtered out, e.g.: `17.9344; 3.68367; 2.37445; 163.557; 172.775`. Might that have something to do with uneven length of the fractional parts in my values? – Tom S Dec 16 '16 at 11:54

2 Answers

It fails if the file has CRLF line terminators. Remove the \r first:

$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk 'sub(/\r/,"") && ($3 > 10)'  foo
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775

Alternatively you could use dos2unix or such.
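
A minimal sketch of the \r fix, assuming a two-line sample written with printf (crlf_sample.txt is a made-up name, and plain awk is used so it runs anywhere; mawk accepts the same program). Anchoring the regex with $ strips only a trailing carriage return, and putting sub() in an action rather than in the pattern means lines without a \r are still tested:

```shell
# Write two CRLF-terminated rows (values taken from the question's data).
printf '9_6_F\t56523\t17.9344\r\n9_7_F\t3196\t3.68367\r\n' > crlf_sample.txt

# Strip a trailing \r in an action, then filter on $3 in the pattern.
# Unlike sub(...) && (...), this also prints matches when a line has no \r.
awk '{ sub(/\r$/, "") } $3 > 10' crlf_sample.txt
```

Only the 9_6_F row passes the filter here, since 3.68367 is not greater than 10.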

EDIT2: If you are using a locale that has a comma as the decimal separator, it affects float comparisons in mawk.

In this case you can either:

1) set locale to

LANG="en_US.UTF-8"

or

2) change the decimal separators in the data to commas and pipe the result to mawk:

sed 's/\./,/' file.txt | mawk '$3 > 10'
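
Rather than changing LANG for the whole session, the locale can also be overridden for a single command. A sketch, assuming the C locale is acceptable for this one invocation (LC_ALL takes precedence over LANG and LC_NUMERIC); locale_sample.txt is a made-up file name and plain awk stands in for mawk:

```shell
# Two rows with dot decimal separators (values from the question's data).
printf '9_6_F-repl\t24834\t38.8699\n9_9_F\t2278\t2.37445\n' > locale_sample.txt

# LC_ALL=C applies to this command only, so "." is the decimal separator
# here regardless of the session locale.
LC_ALL=C awk '$3 > 10' locale_sample.txt
```

The data file stays untouched and the output keeps its dots, which is not the case with the sed approach.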
James Brown
  • `dos2unix foo && mawk '$3 > 10' foo` still gives the same result. Also `mawk 'sub(/\r/,"")'` gives no result whatsoever (output is empty). The result of `file foo` is `foo: ASCII text` – Tom S Dec 16 '16 at 11:51
  • Which locale are you on (type `locale`)? Maybe your locale uses another decimal separator, comma for example. – James Brown Dec 16 '16 at 11:56
  • You are right! It's a locale problem. It works perfectly after I change periods to commas: `mawk '$3 > 10' <(cat out.idepth | sed -e "s/\./,/")`. Good to remember that mawk uses locale info while awk does not. – Tom S Dec 16 '16 at 12:08
  • I tried to change my locale to something with comma as separator but still couldn't reproduce. Which locale are you using? – James Brown Dec 16 '16 at 12:09
  • It's `"pl_PL.UTF-8"`. – Tom S Dec 16 '16 at 12:16
  • Setting locale to `LANG="en_US.UTF-8"` solves the problem and `mawk '$3 > 10' foo` works just like it should. – Tom S Dec 16 '16 at 12:21

You don't need to set the locale, but you do need to account for strange or erroneous input:

If the input has a trailing dot, or starts with any character that has a byte value higher than ASCII "1" (which is a LOT of stuff):

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

the following fails to produce the correct result, because $3 is compared as a string, and as a string the character "9" sorts after "1":

mawk2 'sub("\r*",_)*(10<$3)'

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  9.
9_annua_M-merg  122663  :5.333

To rectify it, put a unary + in front of $3 to force a numeric comparison:

mawk 'sub("\r*",_)*(10<+$3)'
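
To see what the unary + does in isolation, here is a small sketch with made-up rows (plain awk stands in for mawk). +$3 converts the field to a number from its leading numeric prefix, so a field with no numeric prefix becomes 0:

```shell
# Rows: a trailing-dot number, a field starting with ":", and a normal float.
printf 'a 1 9.\nb 2 :5.333\nc 3 38.8699\n' |
  awk '+$3 > 10 { print $1 }'
```

Only row c is printed: "9." converts to 9 and ":5.333" converts to 0, so neither exceeds 10.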

If you don't care much for archaic gawk -P/-c/-t modes then it's even simpler :

mawk '10<+$3' RS='\r?\n'

Let RS take care of the \r (CR) on your behalf. With the ? in the RS regex, a record ends at either \r\n or \n, so the CR never reaches $3, and each matching record is printed with the default ORS, a plain \n. That lets you skip the steps about using iconv or dos2unix or changing locale settings.

This way the original input file remains intact, in case you need those CRs later for some reason.

RARE Kpop Manifesto