1

I am working on the following dataset and I would like to create an awk script to make an arithmetic operation between with other columns and add the result of each record in a new column that could be called "Survival Percentage" with only 2 decimals.

The operation would be the following:

((Column 1 - Column 2)/Column 3)*100

Below you can see a sample of the dataset:


40462186,177827,7671,4395,190,4.313743132
2872296,273870,3492,95349,1216,1.275057509
45236699,265691,6874,5873,152,2.587215976
77481,40024,153,516565,1975,0.382270638

The code I have tried to implement is as follows but it doesn't even run and it is a shell script and not an awk script as I wish.

awk  'BEGIN { FS=OFS="," } NR == 1 { $11="new" } NR > 1 { $11=(($1-$2)/$3)*100 }1' dataset.csv

From comments: After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? Any idea?

oshiono
  • 71
  • 5
  • Your code looks ok to me(very first look without running it), could you please do check if your file have control M characters in it? Do a `cat -v your_file` once and see if you see any control M characters in it, let us know how it goes – RavinderSingh13 May 20 '22 at 23:53
  • Cannot reproduce. Code works for me as-is. – j_b May 20 '22 at 23:57
  • Hi| @RavinderSingh13 I've checked and it does. However, I believe that awk scripts must consist of a BEGIN and an END part in order to be run as gawk -f file.awk dataset.csv, right? – oshiono May 21 '22 at 00:00
  • @oshiono, `END` block is NOT a mandatory block(depending upon code's requirement only one has to use it), so its not because of that for sure. – RavinderSingh13 May 21 '22 at 00:01
  • Please add output of command `file dataset.csv` to your question (no comment). – Cyrus May 21 '22 at 00:01
  • I checked and at the end of the line I get ^M – oshiono May 21 '22 at 00:02
  • Use `dos2unix dataset.csv` to remove carriage returns from your file. – Cyrus May 21 '22 at 00:03
  • @Cyrus If I understand properly what you mean, the output is the second block I posted. – oshiono May 21 '22 at 00:05
  • @oshiono, see this thread and try removing them(control M characters) with any one of the solutions provided there https://stackoverflow.com/questions/19406418/remove-m-characters-from-file-using-sed – RavinderSingh13 May 21 '22 at 00:19
  • @RavinderSingh13 After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? – oshiono May 21 '22 at 00:52

1 Answers1

3

Considering your samples and shown attempts here is the awk code which you can try. Which removes Control M characters(which we found out in comments) and this also checks that your 6th, 7th and 4th columns should be integers/floats etc and nothing else, try it out once.

awk '
BEGIN  { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
  $11=(($6-$7)/$4)*100
}
1' dataset.csv

OR to get values till 2 decimal points try something like(make use of sprintf function to save values to $11 itself:

awk '
BEGIN  { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
  $11=sprintf("%0.2f",(($6-$7)/$4)*100)
}
1' dataset.csv
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • Thank you!! Finally, how can I modify your code to print only 2 decimals. I have a long way to learn yet :) – oshiono May 21 '22 at 01:18
  • @oshiono, your welcome, np. For 2 decimal points you can use `sprintf` command to save floating points value into `$11` itself. We all are here to learn; keep posting Good questions and try to answer(wherever you feel you can), we all learn from each other cheers. – RavinderSingh13 May 21 '22 at 01:19
  • @oshiono, you can edit your question to mention you need decimal value also and I can make a small edit to my answer then, thank you. – RavinderSingh13 May 21 '22 at 01:20
  • 1
    Perfect!! Done! – oshiono May 21 '22 at 01:22
  • @oshiono, sure, added OR solution now in your question, cheers. – RavinderSingh13 May 21 '22 at 01:23