The accepted answer uses R. With the same tool, I find a script nicer to work with than a one-liner, since it is easier to modify when you want to add specific stats or format the output differently.
Given this file data.txt:
1
2
3
4
5
6
7
8
9
10
Having this basic-stats script in $PATH:
#!/usr/bin/env Rscript
# Build a numeric vector.
x <- as.numeric(readLines("stdin"))
# Custom basic statistics.
basic_stats <- data.frame(
  N = length(x), min = min(x), mean = mean(x), median = median(x), stddev = sd(x),
  percentile_95 = quantile(x, c(.95)), percentile_99 = quantile(x, c(.99)),
  max = max(x))
# Print output.
print(round(basic_stats, 3), row.names = FALSE, right = FALSE)
Execute basic-stats < data.txt to print the following to stdout:
 N  min mean median stddev percentile_95 percentile_99 max
 10 1   5.5  5.5    3.028  9.55          9.91          10
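The fractional percentiles (9.55 and 9.91) can look odd for ten integers. They come from the default interpolation in quantile() (type 7), which places the quantile at position h = 1 + (n - 1) * p in the sorted data and interpolates linearly between the two neighbouring values. A minimal sketch to check this by hand; type7_quantile is a hypothetical helper written for illustration, not part of the script:

```r
# Same values as data.txt.
x <- as.numeric(1:10)

# Hypothetical helper: re-implements the type 7 rule that quantile() uses by
# default, for a single probability p between 0 and 1.
type7_quantile <- function(x, p) {
  x <- sort(x)
  h <- 1 + (length(x) - 1) * p   # fractional position in the sorted vector
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# Compare the hand-rolled values with the built-in ones.
p95 <- type7_quantile(x, 0.95)
p99 <- type7_quantile(x, 0.99)
print(all.equal(p95, unname(quantile(x, 0.95))))
print(all.equal(p99, unname(quantile(x, 0.99))))
```

If another definition of percentile is wanted, quantile() accepts a type argument (1 to 9) that could be set in the script.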
The formatting can look a bit nicer by replacing the last 2 lines of the script with the following:
# Print output. Tabular formatting is done by the `column` command.
temp_file <- tempfile("basic_stats_", fileext = ".csv")
write.csv(round(basic_stats, 3), file = temp_file, row.names = FALSE, quote = FALSE)
system(paste("column -s, -t", temp_file))
# invisible() keeps Rscript from printing the TRUE returned by file.remove().
invisible(file.remove(temp_file))
This is the output now, with 2 spaces between columns (instead of 1 space):
N   min  mean  median  stddev  percentile_95  percentile_99  max
10  1    5.5   5.5     3.028   9.55           9.91           10
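If the column command is not available, or you would rather avoid the temporary file, similar alignment can be produced in R alone. This is only a sketch: format_table is a hypothetical helper, and the stats are rebuilt here from the values in data.txt so that the snippet runs on its own.

```r
# Rebuild the same stats as the script, using the values from data.txt.
x <- as.numeric(1:10)
basic_stats <- data.frame(
  N = length(x), min = min(x), mean = mean(x), median = median(x), stddev = sd(x),
  percentile_95 = unname(quantile(x, .95)), percentile_99 = unname(quantile(x, .99)),
  max = max(x))

# Hypothetical helper: left-justify every column to its widest cell and join
# the columns with `sep`, roughly what `column -t` does.
format_table <- function(df, sep = "  ") {
  # One character vector per column: header first, then the values.
  cols <- lapply(names(df), function(name) c(name, as.character(df[[name]])))
  # format() pads character vectors to a common width, left-justified.
  padded <- lapply(cols, format)
  # Paste the columns together row by row, then trim trailing padding.
  rows <- do.call(paste, c(padded, sep = sep))
  sub(" +$", "", rows)
}

table_lines <- format_table(round(basic_stats, 3))
writeLines(table_lines)
```

In the script, the last two lines of this sketch would take the place of the write.csv, system and file.remove calls.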