goal here is to be able to tag the words as I write and having some
computer help to make the listing at the end and render a neat HTML
page with all the links
The `hxindex` utility from the W3C `html-xml-utils` package does a decent job of creating a back-of-the-book index.
Here's an example using `pandoc` to convert Markdown to HTML, and `hxindex` to produce the HTML index. The index is shown here in standard form[1], with links (locators) underlined and, in bold, the links that point to the defining instance of a term.
[1] The index is a list of lists (`<ul class="indexlist"><li>…<ul><li>…`), which can be styled with CSS.
In Markdown files, the HTML markup for the index looks like this; `!!` separates term levels and `|` separates multiple terms:
dolor : <dfn title="dolor">doloremque</dfn>
… yyad : <span class="index" title="dolor!!yad!!yyad">dolor</span>
ipso, -um : <dfn title="ipso!!-um">ipsum</dfn>
nihil a.o. : <span class="index" title="nihil|nulla!!see: nihil|nada!!see: nihil">nihil</span>
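To make the two separators concrete, here is a minimal Python sketch of how such a `title` value breaks down into entries and levels. The function name `split_index_title` is hypothetical, and the code only mirrors the syntax described above; it makes no claim about how `hxindex` is implemented internally.

```python
def split_index_title(title):
    """Split a title value into index entries.

    '|' separates independent entries; '!!' separates the levels
    (term, subterm, ...) inside one entry.
    """
    return [entry.split("!!") for entry in title.split("|")]


print(split_index_title("dolor!!yad!!yyad"))
# [['dolor', 'yad', 'yyad']]
print(split_index_title("nihil|nulla!!see: nihil|nada!!see: nihil"))
# [['nihil'], ['nulla', 'see: nihil'], ['nada', 'see: nihil']]
```

Each inner list is one index entry, read from the top-level term down to the deepest subterm.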
Note that
- the `title` attribute is used (abused) in Markdown files; for the HTML output, `hxindex` replaces it with an `id` attribute
- the `see also: nulla` reference is defined at the target to get the link right (this is the only way to do it directly with `hxindex`, so doing it in a script instead could be tempting)
- there is no limit on the number of index subterm levels
- the `hxindex` man page lists several options not used in this example
- the `hxref` utility generates cross-references inside and between HTML files
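If the cross-references were generated by a script, as the second note hints, the `title` values could be assembled along these lines. This is only a sketch under assumptions: the helper names `see_title` and `see_also_title` are hypothetical, and the `see:` and `see also:` wording simply copies the example markup above.

```python
def see_title(term, synonyms):
    """Title for a span that indexes `term` plus one 'see' entry per synonym."""
    entries = [term] + [f"{syn}!!see: {term}" for syn in synonyms]
    return "|".join(entries)


def see_also_title(term, target):
    """Title to place on the target instance so the index link lands there."""
    return f"{term}!!see also: {target}"


print(see_title("nihil", ["nulla", "nada"]))
# nihil|nulla!!see: nihil|nada!!see: nihil
print(see_also_title("nihil", "nulla"))
# nihil!!see also: nulla
```

The generated strings would still have to be written into the `title` attributes of the Markdown sources before `hxindex` runs.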
Following are the makefile and the Markdown sources for the example,
plus the generated index database file.
File: Makefile
# desc:
# Use `hxindex` to build an HTML index for Markdown files.
# compat:
# GNU make 4.3 pandoc 2.9.2 html-xml-utils 7.7
# ref:
# https://en.wikipedia.org/wiki/Index_(publishing)
# https://www.w3.org/Tools/HTML-XML-utils/man1/hxindex
SHELL := /bin/sh
.NOTPARALLEL : ## must access $(indextsv) database serially
.DELETE_ON_ERROR :
cssfile := doc.css
indextsv := doc-x.tsv
pandocMeta ?= -M lang='la' -M pagetitle='Topics'
pandocFlags ?= --standalone -w html --css='$(cssfile)' $(pandocMeta)
hxindexFlags ?= -x -f -N
hxnormalizeFlags ?= -x -d
# $(call md2html[,out=$@[,in=$<[,pandocExtraFlags[,hxindexExtraFlags]]]])
define md2html =
pandoc $(pandocFlags)$(if $3, $3) -- $(if $2,$2,$<) \
| hxindex $(hxindexFlags)$(if $4, $4) \
| hxnormalize $(hxnormalizeFlags) \
| hxremove 'head>meta[name],head>style' \
> $(if $1,$1,$@)
endef
#:Single HTML file -- $(indextsv) not required
doc-a.html : $(patsubst %,doc-%.md,1 2 3 x) | $(cssfile) ; \
$(call md2html,$@,$^,,)
#:Multiple HTML files -- index in doc-x.html
doc-x.html : $(indextsv) $(patsubst %,doc-%.html,1 2 3)
doc-%.html : doc-%.md $(indextsv) | $(cssfile); \
$(call md2html,$@,$<,,-b $@ -i $(indextsv))
# Truncate to size zero
$(indextsv) : ; : > $@
# Minimal CSS
$(cssfile) : ; printf '%s\n' > $@ \
'body{color:#111; background-color:#fffff8; margin:4em; font-family:serif;}' \
'dfn{font-weight:bold; font-variant:small-caps;}' \
'span[class~="index"]{text-decoration:underline;}'
#:Delete generated files
clean : ; rm -f -- doc-?.html $(indextsv) $(cssfile)
.PHONY : clean
File: doc-1.md
## Section 1
### Topic A
Sed ut <dfn title="perspicio">perspiciatis</dfn>, unde omnis iste
natus error sit
### Topic B
voluptatem accusantium <dfn title="dolor">doloremque</dfn>
laudantium, totam rem aperiam
### Topic C
eaque <span class="index" title="ipso!!-a">ipsa</span>, quae ab
illo <dfn id="inven…tor" title="inventor">inventore</dfn> veritatis
et quasi architecto beatae vitae dicta sunt, explicabo.
File: doc-2.md
## Section 2
### Topic D
Nemo enim <span class="index" title="ipso!!-am">ipsam</span> voluptatem,
quia voluptas sit, aspernatur aut odit aut
fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem
### Topic E
sequi nesciunt, neque porro quisquam est, qui
<span class="index" title="dolor!!yad">dolorem</span>
<dfn title="ipso!!-um">ipsum</dfn>, quia
<span class="index" title="dolor!!yad!!yyad">dolor</span>
sit amet consectetur adipisci velit, sed quia non numquam eius modi
### Topic F
tempora incidunt, ut labore et
<span class="index" title="dolor!!yad">dolore</span>
magnam aliquam quaerat voluptatem.
File: doc-3.md
## Section 3
### Topic G
Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis
suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?
### Topic H
Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse,
quam <span class="index" title="nihil|nulla!!see: nihil|nada!!see: nihil"
>nihil</span> molestiae consequatur,
### Topic I
vel illum, qui <span class="index" title="dolor!!yad">dolorem</span>
eum fugiat, quo voluptas <span class="index" title="nihil!!see also: nulla">nulla</span> pariatur?
File: doc-x.md (index placeholder)
## Index
<!--index-->
File: doc-x.tsv (the `hxindex` database as output by `expand -t 26,31,58`)
dolor!!yad                1    doc-2.html#dolore          # Topic F Topics
dolor                     2    doc-1.html#doloremque      # Topic B Topics
dolor!!yad!!yyad          1    doc-2.html#dolor           # Topic E Topics
dolor!!yad                1    doc-3.html#dolorem         # Topic I Topics
dolor!!yad                1    doc-2.html#dolorem         # Topic E Topics
ipso!!-a                  1    doc-1.html#ipsa            # Topic C Topics
ipso!!-um                 2    doc-2.html#ipsum           # Topic E Topics
nada!!see: nihil          1    doc-3.html#nihil           # Topic H Topics
nihil!!see also: nulla    1    doc-3.html#nulla           # Topic I Topics
perspicio                 2    doc-1.html#perspiciatis    # Topic A Topics
nulla!!see: nihil         1    doc-3.html#nihil           # Topic H Topics
nihil                     1    doc-3.html#nihil           # Topic H Topics
ipso!!-am                 1    doc-2.html#ipsam           # Topic D Topics
inventor                  2    doc-1.html#inven…tor     # Topic C Topics
(The glitch in the last line arises because the ellipsis character U+2026,
which takes 3 bytes in UTF-8, is counted as 3 characters.)
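The byte-versus-character mismatch behind that glitch can be reproduced in a few lines of Python. This is an illustration only: the `pad_to` helper is hypothetical and merely imitates a tool that measures width in bytes, with the start column 31 and stop column 58 taken from the `expand -t 26,31,58` call above.

```python
url = "doc-1.html#inven\u2026tor"

chars = len(url)                     # characters as displayed
nbytes = len(url.encode("utf-8"))    # bytes as stored in UTF-8
print(chars, nbytes)                 # 20 22


def pad_to(text, start, stop, width_of):
    """Pad text starting at column `start` so the next field begins at `stop`."""
    return text + " " * (stop - start - width_of(text))


by_chars = pad_to(url, 31, 58, len)
by_bytes = pad_to(url, 31, 58, lambda s: len(s.encode("utf-8")))
print(len(by_chars) - len(by_bytes))  # 2: byte-based padding is two columns short
```

A width measured in bytes overestimates the field by two columns, so the following field starts two columns early.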