Create an awk script to generate a new column in a dataset?

Question

I am working on the following dataset and I would like to create an awk script to make an arithmetic operation between with other columns and add the result of each record in a new column that could be called "Survival Percentage" with only 2 decimals.

The operation would be the following:

((Column 1 - Column 2)/Column 3)*100

Below you can see a sample of the dataset:


40462186,177827,7671,4395,190,4.313743132
2872296,273870,3492,95349,1216,1.275057509
45236699,265691,6874,5873,152,2.587215976
77481,40024,153,516565,1975,0.382270638

The code I have tried to implement is as follows but it doesn't even run and it is a shell script and not an awk script as I wish.

awk  'BEGIN { FS=OFS="," } NR == 1 { $11="new" } NR > 1 { $11=(($1-$2)/$3)*100 }1' dataset.csv

From comments: After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? Any idea?

Your code looks ok to me(very first look without running it), could you please do check if your file have control M characters in it? Do a `cat -v your_file` once and see if you see any control M characters in it, let us know how it goes — RavinderSingh13, May 20 '22 at 23:53
Hi| @RavinderSingh13 I've checked and it does. However, I believe that awk scripts must consist of a BEGIN and an END part in order to be run as gawk -f file.awk dataset.csv, right? — oshiono, May 21 '22 at 00:00
@oshiono, `END` block is NOT a mandatory block(depending upon code's requirement only one has to use it), so its not because of that for sure. — RavinderSingh13, May 21 '22 at 00:01
Please add output of command `file dataset.csv` to your question (no comment). — Cyrus, May 21 '22 at 00:01
Use `dos2unix dataset.csv` to remove carriage returns from your file. — Cyrus, May 21 '22 at 00:03
@Cyrus If I understand properly what you mean, the output is the second block I posted. — oshiono, May 21 '22 at 00:05
@oshiono, see this thread and try removing them(control M characters) with any one of the solutions provided there https://stackoverflow.com/questions/19406418/remove-m-characters-from-file-using-sed — RavinderSingh13, May 21 '22 at 00:19
@RavinderSingh13 After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? — oshiono, May 21 '22 at 00:52

RavinderSingh13 · Accepted Answer · 2022-05-21T11:16:13.660

3

Considering your samples and shown attempts here is the awk code which you can try. Which removes Control M characters(which we found out in comments) and this also checks that your 6th, 7th and 4th columns should be integers/floats etc and nothing else, try it out once.

awk '
BEGIN  { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
  $11=(($6-$7)/$4)*100
}
1' dataset.csv

OR to get values till 2 decimal points try something like(make use of sprintf function to save values to $11 itself:

awk '
BEGIN  { FS=OFS="," }
{ sub(/\r$/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6+0=$6 && $7+0=$7 && $4+0=$4){
  $11=sprintf("%0.2f",(($6-$7)/$4)*100)
}
1' dataset.csv

edited May 21 '22 at 11:16

answered May 21 '22 at 01:04

RavinderSingh13

130,504
14
57
93

Thank you!! Finally, how can I modify your code to print only 2 decimals. I have a long way to learn yet :) – oshiono May 21 '22 at 01:18
@oshiono, your welcome, np. For 2 decimal points you can use `sprintf` command to save floating points value into `$11` itself. We all are here to learn; keep posting Good questions and try to answer(wherever you feel you can), we all learn from each other cheers. – RavinderSingh13 May 21 '22 at 01:19
@oshiono, you can edit your question to mention you need decimal value also and I can make a small edit to my answer then, thank you. – RavinderSingh13 May 21 '22 at 01:20
1

Perfect!! Done! – oshiono May 21 '22 at 01:22
@oshiono, sure, added OR solution now in your question, cheers. – RavinderSingh13 May 21 '22 at 01:23

Create an awk script to generate a new column in a dataset?

1 Answers1

Linked