So far, I haven't found (or I have overlooked) a comprehensive example of how to plot a histogram from a flattened list with multiple (hierarchical) columns and unordered entries. So, this might be a starting point for further adaptations and optimizations.
Of course, alternatively, the data could be reshaped with external tools (or with gnuplot itself) to match the input format that gnuplot requires for its histogram styles. Check `help histograms`.
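For comparison, here is a minimal sketch of what that alternative could look like if the data had already been pivoted into one row per type and one column per version. The datablock name `$Wide` and its header line are assumptions for illustration; the values are the same counts as in the flattened data used in the code further down, rearranged by hand.

```gnuplot
### sketch: same counts, already pivoted for the built-in histogram style
reset session

# hypothetical pre-pivoted table: one row per type, one column per version
$Wide <<EOD
type a b c
bar  3 2 1
baz  0 2 2
foo  1 2 3
EOD

set style data histogram
set style histogram clustered gap 1
set style fill solid 0.3

# one histogram series per version column, labelled from the header line
plot for [col=2:4] $Wide u col:xtic(1) ti columnheader
### end of sketch
```

This only works once the data is in this wide layout, which is exactly the step the code in this answer avoids.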
Some comments about the code:
- The keywords in the input data don't have to be ordered.
- The code creates lists of unique keywords for column 1 and column 2. The order in each unique list (and consequently in the graph) is the order of first occurrence.
- It converts the keywords into numbers, i.e. indices into the corresponding keyword lists.
- The input data may contain multiple entries with the same key and sub-key, e.g. `foo a 2`, `foo a 4`, `foo a 1`. Since the option `smooth freq` is used, they will be summed up to `foo a 7`.
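The summing behaviour from the last point can be checked in isolation. The following is a small sketch, not part of the code below; the datablock names `$Dup` and `$Summed` are made up for this illustration. All three rows are mapped to the same x position, and the result is written to a datablock instead of being drawn.

```gnuplot
### sketch: smooth freq sums y-values that share the same x-value
reset session

$Dup <<EOD
foo a 2
foo a 4
foo a 1
EOD

# write the smoothed result to a datablock instead of plotting it
set table $Summed
    plot $Dup u (1):3 smooth freq
unset table

# the table should contain a single data point at x=1 with the summed value 7
print $Summed
### end of sketch
```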
Code:
Requires gnuplot 5.2.0, because of the use of `keyentry`. The code could be adapted for older versions.
### different ordered histograms
reset session
$Data <<EOD
# type version count
bar a 3
baz b 2
bar c 1
baz a 0
foo a 1
foo b 2
foo c 3
bar b 2
baz c 2
EOD
# create a unique list of strings from a column
addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
set table $Dummy
keysA = keysB = ''
plot $Data u (keysA=addToList(keysA,1), keysB=addToList(keysB,2), '') w table
unset table
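# return the 1-based position of key in the list keys (NaN if not found)
getIndex(keys,key) = (_idx=NaN, sum [_i=1:words(keys)] (word(keys,_i) eq key ? _idx=_i : 0), _idx )
# return the value of column colD if the string in column colF equals valF, otherwise NaN
myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN
# same as myFilter, but two string columns have to match
myFilter2(colD,colF1,valF1,colF2,valF2) = strcol(colF1) eq valF1 && strcol(colF2) eq valF2 ? column(colD) : NaN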
set style fill solid 0.3
set boxwidth 0.8
set offsets 1,1,1,0
set datafile missing NaN
set key top center noautotitle
set grid x,y
gap = 1     # number of empty box positions between groups
set multiplot layout 2,2
# totals per keyword of column 1
plot for [key in keysA] i=getIndex(keysA,key) $Data u (i):(myFilter(3,1,key)):xtic(key) \
    smooth freq w boxes lc i
# totals per keyword of column 2
plot for [key in keysB] i=getIndex(keysB,key) $Data u (i):(myFilter(3,2,key)):xtic(key) \
    smooth freq w boxes lc i
# grouped by column 1, boxes within each group ordered by column 2
plot for [keyA in keysA] for [keyB in keysB] tmp=(i=getIndex(keysA,keyA), j=getIndex(keysB,keyB)) \
    $Data u ((i-1)*(words(keysB)+gap) + j):(myFilter2(3,1,keyA,2,keyB)): \
    xtic(keyB) smooth freq w boxes lc i, \
    for [keyA in keysA] keyentry w boxes lc getIndex(keysA,keyA) ti keyA
# grouped by column 2, boxes within each group ordered by column 1
plot for [keyB in keysB] for [keyA in keysA] tmp=(i=getIndex(keysB,keyB), j=getIndex(keysA,keyA)) \
    $Data u ((i-1)*(words(keysA)+gap)+ j):(myFilter2(3,1,keyA,2,keyB)): \
    xtic(keyA) smooth freq w boxes lc i, \
    for [keyB in keysB] keyentry w boxes lc getIndex(keysB,keyB) ti keyB
unset multiplot
### end of code
Result:
