I noticed odd behavior while populating an array in awk. Both the indices and the values were numbers, so adding 0 shouldn't have changed anything. To illustrate, take the following example:

Here is a file that I wish to use for this demo:

$ cat file
2.60E5-2670161065730303122012098 Invnum987678
2.60E5-2670161065846403042011098 Invnum987912
2.60E5-2670161065916903012012075 Invnum987654
2.60E5-2670161066813503042011075 Invnum987322
2.60E5-2670161066835008092012075 Invnum987323
2.60E5-2670161067040701122012075 Invnum987324
2.60E5-2670161067106602122010074 Invnum987325

What I would like to do is create an index from $1 and assign it a value from $2. I extract the relevant pieces of $1 and $2 using the substr function.

$ awk '{p=substr($1,12)+0; A[p]=substr($2,7)+0;next}END{for(x in A) print x,A[x]}' file

Ideally, the output should have been as follows (ignore the fact that associative arrays may be traversed in arbitrary order):

161065730303122012098 987678
161065846403042011098 987912
161065916903012012075 987654
161066813503042011075 987322
161066835008092012075 987323
161067040701122012075 987324
161067106602122010074 987325

But, the output I got was as follows:

161066835008092012544 987323
161065846403042017280 987912
161067040701122019328 987324
161067106602122018816 987325
161066813503041994752 987322
161065916903012007936 987654
161065730303122014208 987678

If I remove the +0 from the above awk one-liner, the output is what I expect. What I would like to know is why adding it corrupts the keys.

The above test was done on:

$ awk -version
awk version 20070501
mklement0
jaypal singh
  • 21-digit numbers; that's pushing the limits unless `awk` has infinite-precision (or, indefinite-precision) arithmetic, which I don't think it does. 18 digits is about the limit for 64-bit integers; 15 decimal digits is about the limit for 64-bit floating point numbers. I'd guess that some of the problem is related to this. Avoid converting the 21-digit string into a number. – Jonathan Leffler Feb 18 '14 at 04:58
  • Hmm, thanks @JonathanLeffler, but for `161067106602122010074` adding `0` made it `161067106602122018816`. Shouldn’t it be lowering the value down instead of increasing it? – jaypal singh Feb 18 '14 at 05:10
  • When you start printing numbers beyond the precision, anything is possible. It was the conversion to a number that caused the change. – Jonathan Leffler Feb 18 '14 at 05:45
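
To make the comments above concrete, here is a minimal sketch, assuming an awk that stores every number as an IEEE 754 double (common, but an assumption here; an awk built with arbitrary-precision support would behave differently). A double can represent every integer exactly only up to 2^53. Around 1.6e20, neighbouring doubles are 32768 apart, so a 21-digit key is rounded to the nearest representable value, which may be higher or lower than the original. That matches the output in the question, where some keys moved up and others (for example 161066813503042011075) moved down. The variable names are illustrative only.

```shell
# 2^53 is the last point up to which every integer is exactly representable
# in a double; 2^53 + 1 cannot be stored and rounds back to 2^53.
limit=$(awk 'BEGIN { printf "%.0f\n", 2^53 }')
beyond=$(awk 'BEGIN { printf "%.0f\n", 2^53 + 1 }')
echo "2^53      = $limit"
echo "2^53 + 1  = $beyond"

# A 21-digit key from the question, forced through a numeric conversion (+0).
key='161065730303122012098'
converted=$(echo "$key" | awk '{ printf "%.0f\n", $1 + 0 }')
echo "key       = $key"
echo "converted = $converted"
```

The last two lines printed should differ in their trailing digits, which is the same corruption seen in the array keys.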

1 Answer

It appears that awk has some numerical limitations; I get even weirder results on gawk. Perhaps the discussion in this SO question will help you.
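
As a possible workaround, in line with the asker's observation that dropping the +0 gives the expected output, the key can be left as a string: substr returns a string, and awk array subscripts are strings anyway, so no digits are lost. The sketch below recreates the sample file from the question in a temporary file (the `sample` and `result` names are hypothetical) and converts only the short invoice number to a number. The output is piped through sort purely to make the order predictable.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2.60E5-2670161065730303122012098 Invnum987678
2.60E5-2670161065846403042011098 Invnum987912
2.60E5-2670161065916903012012075 Invnum987654
2.60E5-2670161066813503042011075 Invnum987322
2.60E5-2670161066835008092012075 Invnum987323
2.60E5-2670161067040701122012075 Invnum987324
2.60E5-2670161067106602122010074 Invnum987325
EOF

# Keep the 21-digit key as a string (no +0); only the 6-digit value,
# which fits comfortably in a double, is converted to a number.
result=$(awk '{ p = substr($1, 12); A[p] = substr($2, 7) + 0 }
              END { for (x in A) print x, A[x] }' "$sample" | sort)
echo "$result"

rm -f "$sample"
```

If the keys ever need real arithmetic rather than just lookup, a tool with arbitrary-precision integers would be a safer choice than a double-based awk.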

Community
rfernandes