linux: how can I perform awk-like statistics on text input?

Question

I have some data that looks like this:

add 0.17411 0.00018 0.17430 0
add 0.03959 0.00014 0.03974 1
add 0.00923 0.00013 0.00935 2
add 0.01346 0.00011 0.01357 3
add 1.00567 0.00015 1.00582 4

How can I compute some statistics on these numbers? I would like to get things like min, max, avg, and standard deviation for each of the columns.

Ideally it would be something awk-like that is included in standard Linux distributions.

prog max(column1),avg(column1) < myfile

possible duplicate of [command line utility to print statistics of numbers in linux](http://stackoverflow.com/questions/9789806/command-line-utility-to-print-statistics-of-numbers-in-linux) — Barmar, Jul 14 '15 at 20:22
Also see http://stats.stackexchange.com/questions/24934/command-line-tool-to-calculate-basic-statistics-for-stream-of-values — Barmar, Jul 14 '15 at 20:22
and http://www.commandlinefu.com/commands/view/1661/display-the-standard-deviation-of-a-column-of-numbers-with-awk — Barmar, Jul 14 '15 at 20:23
Google "linux standard deviation" and you'll find lots more. Did you make any attempt to search for this yourself? — Barmar, Jul 14 '15 at 20:23

score 2 · Accepted Answer · answered Jul 14 '15 at 20:55


Why don't you use a database?

First, add column names to your file:

sed -i '1i col0 col1 col2 col3 col4' myfile

Then, create a database and output some stats:

sqlite3 myfile.sqlite <<END
.separator " "
.import myfile mytable
select max(col1), avg(col1) from mytable;
END

Outputs

1.00567 0.248412
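
If an awk-only route is preferred, as the question asks, a minimal sketch might look like the following. It is not part of the original answer: `colstats` is a hypothetical helper name, columns are counted from 1 (so column 2 is the first numeric column in the sample data), and the standard deviation is the population form (dividing by N), which is an assumption about what is wanted.

```shell
# Hypothetical helper: print min, max, mean and population standard
# deviation of one whitespace-separated column (default: column 2).
colstats() {
    awk -v c="${1:-2}" '
        NR == 1 { min = max = $c + 0 }      # seed min/max from the first row
        {
            x = $c + 0                      # force numeric interpretation
            if (x < min) min = x
            if (x > max) max = x
            sum += x                        # running sum for the mean
            sumsq += x * x                  # running sum of squares for the variance
        }
        END {
            if (NR == 0) exit 1             # no input, nothing to report
            mean = sum / NR
            var = sumsq / NR - mean * mean  # population variance (assumption)
            if (var < 0) var = 0            # guard against rounding error
            printf "min=%g max=%g avg=%g stddev=%g\n", min, max, mean, sqrt(var)
        }
    '
}
```

Run it as `colstats 2 < myfile` on the data file without a header row; a header line would be read as 0 and skew the results. The sum-of-squares formula is simple but can lose precision on large or tightly clustered values.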


glenn jackman

That is quite an innovative idea, and opens up lots of avenues for exploration. Thanks! – Mark Harrison Jul 14 '15 at 23:14
You can even avoid writing an sqlite file with `sqlite3 < – glenn jackman Jul 15 '15 at 15:01
