
I'm trying to work out the standard deviation of a set of students' marks in different subjects. I'm stuck on the last calculation I need to do, and I'm not sure what the issue is.

BEGIN {
    i = 0
    printf("\nResults for form 6B\n")
}
$1 == "SUBJECT" {
    i++
    subject[i] = $2
    total[i] = 0
    count[i] = 0
    printf("\nList of %s Students\n", subject[i])
    printf("Name         Mark    Pass/Fail\n")
    printf("----         ----    ---------\n")
}
NF > 2 {
    mark[i] = ($3 + $4) / 2
    student = $2 " " $1
    total[i] = total[i] + mark[i]
    count[i] = count[i] + 1
    if (mark[i] > 49)
        result = "Pass"
    else
        result = "Fail"
    printf("%-14s%-3d%10s \n", student, mark[i], result)
}
END {
    top = i
    printf("\nSubject        Mean     Standard Deviation\n")
    printf("-------        ----     ------------------\n")
    var = 0
    for (i = 1; i <= top; i++) {
        mean[i] = total[i] / count[i]

        var += ((mark[i] - mean[i])^2) #Standard deviation not working#
        stdev = sqrt(var / count[i])

        printf("%-16s%-3d%12d \n", subject[i], mean[i], stdev)
    }
}

I forgot to add the input file, "marks":

FORM    6B
SUBJECT Maths  
Smith   John    40  50 
Evans   Mike    50  80 
SUBJECT Physics
Jones   Tom 35  65
Evans   Mike    46  76
Smith   John    34  56
SUBJECT Chemistry
Jones   Tom 50  60
Evans   Mike    30  40

The output I'm getting is: Maths 7, Physics 7, Chemistry 11.

The correct values are 10, 6 and 10.
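For reference, here is a worked check of those expected figures. It assumes, as the code does, that each student's mark is the average of their two scores. Under that assumption, 10, 6 and 10 match the population standard deviation of each subject's marks truncated to an integer. The last line shows the sample standard deviation, which divides by N-1 instead of N:

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)^2} \\
\text{Maths: } & x = (45, 65),\ \mu = 55,\ \sigma = \sqrt{\frac{(-10)^2 + 10^2}{2}} = 10 \\
\text{Physics: } & x = (50, 61, 45),\ \mu = 52,\ \sigma = \sqrt{\frac{(-2)^2 + 9^2 + (-7)^2}{3}} = \sqrt{\frac{134}{3}} \approx 6.68 \\
\text{Chemistry: } & x = (55, 35),\ \mu = 45,\ \sigma = \sqrt{\frac{10^2 + (-10)^2}{2}} = 10 \\
\text{Sample SD: } & s = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N} (x_k - \mu)^2}, \quad \text{e.g. Maths: } s = \sqrt{200} \approx 14.14
\end{aligned}
```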

Taryn
  • FORM 6B SUBJECT Maths Smith John 40 50 Evans Mike 50 80 SUBJECT Physics Jones Tom 35 65 Evans Mike 46 76 Smith John 34 56 SUBJECT Chemistry Jones Tom 50 60 Evans Mike 30 40 – Rhys Howells Jan 07 '15 at 18:41
  • Add the input to your question, don't use a comment. – Tom Fenech Jan 07 '15 at 18:45
  • Yeah, I quickly realized the comment section wasn't very good for the input file. I tried the suggestion you made, but the results still aren't right: I'm getting the same as with ^2. – Rhys Howells Jan 07 '15 at 18:48
  • The correct values are 14.142136, 8.185353 and 14.142136 again, not 10, 6, 10. For math: mean = (45 + 65) / 2 = 55. stddev = sqrt(((45 - 55)^2 + (65 - 55)^2) / (2 - 1)) = sqrt(200) = 14.142136. A correct program will not generate the results you want. – Wintermute Jan 07 '15 at 19:21
  • @Wintermute If you look at the OP's other question, they take the average of the two scores first, so those are the correct means. Also, I already answered this in the other question, so I don't know what they want. –  Jan 08 '15 at 00:42
  • To me, it looks as though he calculated sqrt(squares / n) instead of sqrt(squares / (n - 1)) and rounded down, but that is guesswork. – Wintermute Jan 08 '15 at 08:30
  • @Wintermute Why would you use `(n-1)`? –  Jan 08 '15 at 09:03
  • @RhysHowells What was wrong with my answer in the other question? –  Jan 08 '15 at 09:07
  • @Jidder: Er...because that's in the formula for the sample standard deviation of a sample of size `n`. – Wintermute Jan 08 '15 at 10:23
  • @Wintermute That's only if you are using a sample, not a full population. –  Jan 08 '15 at 10:39
  • @Wintermute I checked the maths again and I messed up: it's the population standard deviation I need to find. My mistake, sorry. – Rhys Howells Jan 08 '15 at 17:33
  • This is a duplicate of http://stackoverflow.com/questions/18786073/compute-average-and-standard-deviation-with-awk – tommy.carstensen Apr 13 '15 at 20:50
  • @RhysHowells Please don't place the answer inside of the question, answers go in the answer section. – Taryn Jun 03 '15 at 14:10
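
The comments settle on the population standard deviation. Apart from the divisor, two things in the question's END block prevent a correct result regardless of how it is printed. First, mark[i] only holds the last student's mark in each subject, because every student line overwrites it. Second, var is never reset between subjects. The following is a minimal sketch of one way to fix this, not code taken from the thread. It keeps every mark in mark[subject, n], resets the sum of squares for each subject, and divides by n. The function name subject_stats is hypothetical, and the contents of the "marks" file are inlined with a here-document so the sketch runs on its own:

```shell
#!/bin/sh
# Sketch: per-subject mean and population standard deviation in awk.
# Assumes the "marks" layout from the question (SUBJECT lines followed by
# "Surname Forename score1 score2" lines); the data is inlined below.
subject_stats() {
    awk '
    $1 == "SUBJECT" { s++; subject[s] = $2; n[s] = 0; total[s] = 0; next }
    NF > 2 {
        n[s]++
        mark[s, n[s]] = ($3 + $4) / 2     # keep every mark, not just the last
        total[s] += mark[s, n[s]]
    }
    END {
        for (i = 1; i <= s; i++) {
            mean = total[i] / n[i]
            ss = 0                        # reset the sum of squares per subject
            for (j = 1; j <= n[i]; j++)
                ss += (mark[i, j] - mean) ^ 2
            # population SD divides by n; %.2f avoids the %d truncation
            printf("%-10s %6.2f %6.2f\n", subject[i], mean, sqrt(ss / n[i]))
        }
    }' <<'EOF'
FORM    6B
SUBJECT Maths
Smith   John    40  50
Evans   Mike    50  80
SUBJECT Physics
Jones   Tom     35  65
Evans   Mike    46  76
Smith   John    34  56
SUBJECT Chemistry
Jones   Tom     50  60
Evans   Mike    30  40
EOF
}

subject_stats
```

With this data the sketch should print means of 55.00, 52.00 and 45.00 and standard deviations of 10.00, 6.68 and 10.00. Printing the last column with %d instead would give the expected 10, 6 and 10.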

1 Answer


Have a look at gawk's printf documentation. The following will illustrate what is happening:

$ awk 'BEGIN { printf "%%d:%d %%i:%i %%f:%f %%s:%s\n", 3.8, 3.8, 3.8, 3.8}'

%d:3 %i:3 %f:3.800000 %s:3.8

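So %i and %d truncate the value to an integer rather than rounding it (3.8 prints as 3), which is why the standard deviations in the question come out as whole numbers. With %f you can control how the number is printed using width and precision modifiers, such as %.2f.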
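
The same comparison can be applied to a value like the ones in the question. Here 6.6833 is roughly the population standard deviation of the Physics marks. This is a small sketch in the answer's style; the wrapper name printf_demo is made up for illustration, while the conversions themselves are standard awk printf:

```shell
#!/bin/sh
# Sketch: one value printed with different awk printf conversions.
printf_demo() {
    awk 'BEGIN {
        x = 6.6833
        # %d keeps only the integer part, %.2f rounds to two decimals,
        # %8.3f pads to a width of 8 with three decimals
        printf "%%d:%d %%.2f:%.2f %%8.3f:[%8.3f]\n", x, x, x
        # the integer conversion truncates toward zero: -3.8 prints as -3
        printf "%%d of -3.8:%d\n", -3.8
    }'
}

printf_demo
```

The first line should read %d:6 %.2f:6.68 %8.3f:[   6.683]. Because negative values truncate toward zero rather than down, "truncate" is a more accurate description than "floor" for %d.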

joepd