I have a CSV file and I want to create a histogram from column 6. With Linux utilities this is simple:
└──> cut -f6 -d, data.csv | sort | uniq -c | sort -k2,2n
563 0.0
72 0.025
35 0.05
22 0.075
14 0.1
21 0.125
14 0.15
10 0.175
5 0.2
3 0.225
7 0.25
3 0.275
6 0.3
5 0.325
3 0.35
1 0.375
3 0.4
1 0.425
3 0.45
3 0.475
5 0.5
7 0.525
11 0.55
3 0.575
4 0.6
3 0.625
11 0.65
5 0.675
9 0.7
5 0.725
7 0.75
8 0.775
5 0.8
3 0.825
3 0.85
4 0.875
2 0.9
1 0.925
1 0.975
109 1.0
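The last sort stage in that pipeline decides how the rows are ordered. As a small sketch of the two orderings I have in mind (the inline sample values are made up for illustration and are not taken from data.csv; the function names are only for this example):

```shell
#!/bin/sh
# Hypothetical sample values standing in for column 6 of data.csv.
sample() {
    printf '%s\n' 0.0 0.0 0.0 0.025 1.0 1.0
}

# Count occurrences of each value; awk normalises uniq's padding
# so every output line is "<count> <value>".
counts() {
    sample | LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }'
}

# Ordered by value: numeric sort on field 2.
by_value() {
    counts | LC_ALL=C sort -k2,2n
}

# Ordered by frequency: numeric sort on the leading count.
by_frequency() {
    counts | LC_ALL=C sort -n
}

by_value
echo
by_frequency
```

With the real file, `sample` would be replaced by `cut -f6 -d, data.csv`.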
However, I would like to plot it with gnuplot.
My attempt was to modify the following script that I found. This is my modified version:
#!/usr/bin/gnuplot -p
# http://psy.swansea.ac.uk/staff/carter/gnuplot/gnuplot_frequency.htm
clear
reset
set datafile separator ",";
# set term dumb
set key off
set border 3
# Add a vertical dotted line at x=0 to show centre (mean) of distribution.
set yzeroaxis
# Each bar is half the (visual) width of its x-range.
set boxwidth 0.05 absolute
set style fill solid 1.0 noborder
bin_width = 0.1;
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )
# MAKE BINS
# plot dataset_path using (rounded($6)):(6) smooth frequency with boxes
# DO NOT MAKE BINS
plot "data.csv" using 6:6 smooth frequency with boxes
This is the result:
It shows something completely different from the output of the Unix tools.
In gnuplot I've seen various types of histograms: some follow a normal-distribution pattern, others are ordered by frequency (as if I replaced the last sort -k2,2n with sort -n), and others are ordered by the values from which the histogram was created (my case). It would be nice if I could choose.
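For comparison with whatever gnuplot draws, here is a rough awk sketch of the binning that the bin_number and rounded helpers in the script describe, using the same bin width of 0.1. The sample values and the `bin_counts` name are made up for illustration, and the sketch assumes non-negative values, since awk's int() only matches floor() in that case.

```shell
#!/bin/sh
# Hypothetical sample values standing in for column 6 of data.csv.
sample() {
    printf '%s\n' 0.0 0.025 0.05 0.125 1.0
}

# Mirror of the gnuplot helpers from the script:
#   bin_number(x) = floor(x / bin_width)
#   rounded(x)    = bin_width * (bin_number(x) + 0.5)
# int() truncates toward zero, so this equals floor() only for x >= 0.
# Output lines are "<bin centre> <count>", sorted by bin centre.
bin_counts() {
    LC_ALL=C awk -v w=0.1 '
        { centre = w * (int($1 / w) + 0.5); n[centre]++ }
        END { for (c in n) printf "%.2f %d\n", c, n[c] }
    ' | LC_ALL=C sort -n
}

sample | bin_counts
```

Values that sit exactly on a bin edge can land in the neighbouring bin because of floating-point division, so the counts near edges should be treated with some care.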