I have some data that looks like this:

add 0.17411 0.00018 0.17430 0
add 0.03959 0.00014 0.03974 1
add 0.00923 0.00013 0.00935 2
add 0.01346 0.00011 0.01357 3
add 1.00567 0.00015 1.00582 4

How can I compute some statistics on these numbers? I would like to get things like min, max, average, and standard deviation for each of the columns.

Ideally it would be something awk-like that is included in standard Linux distributions, for example:

prog max(column1),avg(column1) < myfile
Mark Harrison
  • possible duplicate of [command line utility to print statistics of numbers in linux](http://stackoverflow.com/questions/9789806/command-line-utility-to-print-statistics-of-numbers-in-linux) – Barmar Jul 14 '15 at 20:22
  • Also see http://stats.stackexchange.com/questions/24934/command-line-tool-to-calculate-basic-statistics-for-stream-of-values – Barmar Jul 14 '15 at 20:22
  • and http://www.commandlinefu.com/commands/view/1661/display-the-standard-deviation-of-a-column-of-numbers-with-awk – Barmar Jul 14 '15 at 20:23
  • Google "linux standard deviation" and you'll find lots more. Did you make any attempt to search for this yourself? – Barmar Jul 14 '15 at 20:23

1 Answer

Why don't you use a database?

First, add a header line with column names to your file:

sed -i '1i col0 col1 col2 col3 col4' myfile

Then, create a database and output some stats:

sqlite3 myfile.sqlite <<END
.separator " "
.import myfile mytable
select max(col1), avg(col1) from mytable;
END

This outputs:

1.00567 0.248412
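
As far as I know, core SQLite has no built-in standard deviation aggregate, so for the min and standard deviation the question also asks about, plain awk may be simpler. A minimal sketch, run on the original file without the added header line: `colstats` is a hypothetical helper name, the standard deviation is the population form (divide by N), and awk counts the leading `add` label as field 1, so `col1` above is field 2 here.

```shell
# Hypothetical helper: print min, max, average and population standard
# deviation of one whitespace-separated column read from stdin.
# Usage: colstats FIELD_NUMBER < myfile
colstats() {
    awk -v col="$1" '
        NR == 1 { min = max = $col + 0 }    # seed min/max from the first row
        {
            x = $col + 0                    # force numeric interpretation
            if (x < min) min = x
            if (x > max) max = x
            sum += x
            sumsq += x * x
        }
        END {
            if (NR == 0) exit 1             # no input: nothing to report
            mean = sum / NR
            var = sumsq / NR - mean * mean  # population variance
            if (var < 0) var = 0            # guard against rounding error
            printf "min=%g max=%g avg=%g stddev=%g\n", min, max, mean, sqrt(var)
        }'
}
```

Calling `colstats 2 < myfile` would then report the statistics for the first numeric column. The sum-of-squares formula is compact but can lose precision on large or tightly clustered data, so treat it as a quick check rather than a careful numerical method.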
glenn jackman