43

Does anyone know how I can calculate the mean of one of these columns (on Linux)?

sda               2.91    20.44    6.13    2.95   217.53   186.67    44.55     0.84   92.97
sda               0.00     0.00    2.00    0.00    80.00     0.00    40.00     0.22  110.00 
sda               0.00     0.00    2.00    0.00   144.00     0.00    72.00     0.71  100.00 
sda               0.00    64.00    0.00    1.00     0.00     8.00     8.00     2.63   10.00
sda               0.00     1.84    0.31    1.38    22.09   104.29    74.91     3.39 2291.82 
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00  

For example: mean(column 2)

Ciro Santilli OurBigBook.com
Alucard

5 Answers

102

Awk:

awk '{ total += $2 } END { print total/NR }' yourFile.whatever

Read as:

  • For each line, add column 2 to a variable 'total'.
  • At the end of the file, print 'total' divided by the number of records.
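
For comparison, the same running-total idea can be sketched in Python. This is only an illustration, not part of the awk answer: the function name `column_mean` and the sample rows are made up here, and unlike the awk one-liner it skips lines that are too short instead of counting them.

```python
def column_mean(lines, column=2):
    """Mean of a 1-based, whitespace-separated column, computed in one pass."""
    total = 0.0
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < column:
            # Assumption: skip blank or short lines (awk would still count them in NR).
            continue
        total += float(fields[column - 1])
        count += 1
    if count == 0:
        raise ValueError("no rows contain the requested column")
    return total / count


# Hypothetical sample rows in the same shape as the question's data.
sample = [
    "sda 2.91 20.44 6.13",
    "sda 0.00 0.00 2.00",
    "sda 0.00 64.00 0.00",
]
print(column_mean(sample, column=2))
```

To read from a file instead, pass the open file object, for example `column_mean(open("yourFile.whatever"), 2)`.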
porges
  • 1
    @Porges: How do I access a specific interval? Let's say that, in the second column, I want to find the mean of elements 2 to 6. – SKPS Sep 26 '16 at 17:21
  • 3
    @SathishKrishnan this is a bit late, but for anyone else: you would prefix the first part with `NR==2,NR==6 { total += .....` (see: https://www.gnu.org/software/gawk/manual/html_node/Ranges.html) – porges Feb 10 '17 at 21:36
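
One thing to watch with a line range: in awk's END block, NR is the total number of lines read, so dividing by it would no longer give the mean of just the selected lines. You need a separate counter. A rough Python sketch of the same idea (the name `column_mean_in_range` and its parameters are hypothetical, chosen only for this example):

```python
def column_mean_in_range(lines, column, first, last):
    """Mean of a 1-based column over lines first..last (1-based, inclusive)."""
    total = 0.0
    count = 0  # counts only the selected lines, not every line read
    for line_number, line in enumerate(lines, start=1):
        if first <= line_number <= last:
            total += float(line.split()[column - 1])
            count += 1
    if count == 0:
        raise ValueError("no lines fall inside the requested range")
    return total / count


# Hypothetical input: mean of column 2 over lines 2 to 3.
rows = ["a 1", "a 2", "a 3", "a 4"]
print(column_mean_in_range(rows, column=2, first=2, last=3))
```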
4

Perl solution:

perl -lane '$total += $F[1]; END{print $total/$.}' file

-a autosplits the line into the @F array, which is indexed starting at 0
$. is the line number

If your fields are separated by commas instead of whitespace:

perl -F, -lane '$total += $F[1]; END{print $total/$.}' file

To print mean values of all columns, assign totals to array @t:

perl -lane 'for $c (0..$#F){$t[$c] += $F[$c]}; END{for $c (0..$#t){print $t[$c]/$.}}' file

output:

0
0.485
14.38
1.74
0.888333333333333
77.27
49.8266666666667
39.91
1.29833333333333
434.131666666667
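
The leading 0 in that output is the "sda" column: Perl treats a non-numeric string as 0 when adding. A Python version has to skip the label explicitly, because `float("sda")` raises an error. Here is a sketch under that assumption; the function name `all_column_means` and the `skip_fields` parameter are invented for this example, and the rows are the ones from the question.

```python
def all_column_means(lines, skip_fields=0):
    """Per-column means; the first skip_fields fields of each line are ignored."""
    totals = []
    rows = 0
    for line in lines:
        fields = line.split()[skip_fields:]
        if not fields:
            continue  # assumption: blank lines are not counted as rows
        values = [float(x) for x in fields]
        # Grow the totals list if this row has more columns than seen so far.
        if len(values) > len(totals):
            totals.extend([0.0] * (len(values) - len(totals)))
        for i, value in enumerate(values):
            totals[i] += value
        rows += 1
    return [t / rows for t in totals] if rows else []


data = """\
sda 2.91 20.44 6.13 2.95 217.53 186.67 44.55 0.84 92.97
sda 0.00 0.00 2.00 0.00 80.00 0.00 40.00 0.22 110.00
sda 0.00 0.00 2.00 0.00 144.00 0.00 72.00 0.71 100.00
sda 0.00 64.00 0.00 1.00 0.00 8.00 8.00 2.63 10.00
sda 0.00 1.84 0.31 1.38 22.09 104.29 74.91 3.39 2291.82
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
""".splitlines()

# skip_fields=1 drops the "sda" label before converting to float.
for mean in all_column_means(data, skip_fields=1):
    print(mean)
```

Apart from the missing leading 0 and small floating-point differences in the last digits, the values should match the Perl output above.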
Chris Koknat
  • 3,305
  • 2
  • 29
  • 30
1

You can use Python for that; it is available on Linux.

If the data comes from a file, take a look at this question; just use float instead.

For instance:

# mean.py (Python 3)
def main():
    # Parse every non-empty line into a list of floats.
    with open("mean.txt") as f:
        data = [[float(x) for x in line.split()] for line in f if line.strip()]

    # Collect the second column (index 1) of each row.
    column_two = [row[1] for row in data]

    print(sum(column_two) / len(column_two))


if __name__ == "__main__":
    main()

Prints 14.38 (possibly with floating-point noise in the last digits, depending on the Python version).

I included only the data in the mean.txt file, not the row label "sda".

Community
OscarRyz
  • 1
    My first thought would probably have been Python as well... but making the list might be overly inefficient here, since you only really need the sum and the number of lines. (Also, for the fun of it: `with open("mean.txt", 'r') as f: n,t = map(sum, zip(*((1, float(line.split()[1])) for line in f))); print t/n`) – David Z Jun 26 '10 at 02:43
0

Simple-r will calculate the mean with the following line:

r -k2 mean file.txt

for the second column. It can also do much more sophisticated statistical analysis, since it uses the R environment for all of its computations.

kenorb
Tom
0

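David Zaslavsky's version from the comments, for the fun of it (updated to Python 3):

with open("mean.txt") as f:
    # Pair each value with a 1, then sum the 1s (n) and the values (t) in one pass.
    n, t = map(sum, zip(*((1, float(line.split()[1])) for line in f)))
print(t / n)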
Community
OscarRyz