
I am having trouble understanding why my code counts the way it does. I use the following code to calculate the average line length (the sum of the lengths of all lines divided by the number of lines).

awk '{cnt += length($0)} END { print cnt/NR}' text.txt

My text file contains the following.

hello

hellohello

There is no blank line between the two lines in the actual text file.

For example, why do I get 16 and not 15 when I run the code below?

awk '{cnt += length($0)} END { print cnt }' text.txt

I understand that in my original command the count of 16 is divided by 2 because of NR (the number of lines). But why does it count an extra character when the text file has only 15? When I edit the text file differently, I get different results. If I end on an empty line (hit Enter after "hellohello"), it counts that too, and I get 17.

Basically, I need someone to explain what exactly it is counting and why.

Dusan Biga
  • In theory the file holds 17 characters: 15 visible characters and two newlines (`\n`). The newlines are not counted because they are the record separator and not part of the record `$0`, so you should get a count of 15. Why do you get 16? There is most likely one extra invisible character being counted: a carriage return `\r` from DOS line endings. You can check this with `cat -vET text.txt`, or even just `file text.txt`. – kvantour May 11 '20 at 08:01
  • Also related: [Remove carriage return in Unix](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) – kvantour May 11 '20 at 08:06
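
To illustrate the carriage-return explanation from the comments, here is a minimal sketch. It assumes the file has a DOS line ending after the first line, which is a guess and not something confirmed for the real `text.txt`. The temporary file and the variable names `with_cr` and `without_cr` are made up for the example.

```shell
# Hypothetical reproduction: "hello" ends with \r\n, "hellohello" with \n.
tmp=$(mktemp)
printf 'hello\r\nhellohello\n' > "$tmp"

# Original command: the \r stays in $0 of the first record and is counted.
with_cr=$(awk '{ cnt += length($0) } END { print cnt }' "$tmp")

# Strip a trailing carriage return from each record before measuring it.
without_cr=$(awk '{ sub(/\r$/, ""); cnt += length($0) } END { print cnt }' "$tmp")

echo "with CR: $with_cr, without CR: $without_cr"
rm -f "$tmp"
```

If the real file behaves the same way, converting it to Unix line endings or keeping the `sub(/\r$/, "")` call in the awk program should bring the total back to 15.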
