I have some Ruby code that auto-generates tables of contents in GitHub Flavoured Markdown. I would also like to understand other flavours of Markdown, if they differ in ways relevant to this problem.
At the moment, I have this code that works 99% of the time:
def header_to_anchor
  @header
    .downcase
    .gsub(/[^a-z\d\- ]+/, "")
    .gsub(/ /, "-")
end
This is based on a note I found in a GitHub comment here. It reads:
The code that creates the anchors is here: https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/toc_filter.rb
- It downcases the string
- remove anything that is not a letter, number, space or hyphen (see the source for how Unicode is handled)
- changes any space to a hyphen.
- If that is not unique, add "-1", "-2", "-3",... to make it unique
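Read literally, those four steps amount to roughly the following sketch. It uses my ASCII-only reading of "letter, number, space or hyphen", and the helper name `anchors_for` and the counter hash are my own illustration, not taken from the linked source.

```ruby
# Sketch of the four quoted steps, including the "-1", "-2", ... suffixes.
# `anchors_for` is a hypothetical helper name, not part of html-pipeline.
def anchors_for(headers)
  seen = Hash.new(0) # how many times each base anchor has appeared so far

  headers.map do |header|
    base = header
      .downcase                  # step 1: downcase
      .gsub(/[^a-z\d\- ]+/, "")  # step 2: drop everything else (ASCII-only assumption)
      .gsub(/ /, "-")            # step 3: spaces become hyphens

    count = seen[base]
    seen[base] += 1

    # step 4: the first occurrence stays bare, later ones get a numeric suffix
    count.zero? ? base : "#{base}-#{count}"
  end
end

anchors_for(["Setup", "Usage", "Setup", "Setup"])
# => ["setup", "usage", "setup-1", "setup-2"]
```

This does not guard against a real heading that already reduces to `setup-1`; whether GitHub handles that case is not something the quoted note says.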
For my purposes, I don't need to solve the uniqueness problem.
This worked well until I hit another edge case. I have a heading in a Markdown document that reads:
### shunit2/_shared.sh
My code generates this anchor:
* [shunit2/_shared.sh](#shunit2sharedsh)
This results in another broken link, at least as far as GitHub Flavoured Markdown is concerned.
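The failure is easy to reproduce in isolation. In the sketch below, `ascii_anchor` is a standalone copy of my method above, and `word_anchor` is only a guess at a fix: it assumes the underscore should survive because it counts as a word character (`\p{Word}`). Both method names are hypothetical, and the guess is not a documented rule.

```ruby
# Standalone copy of the method above, taking the header as an argument.
def ascii_anchor(header)
  header.downcase.gsub(/[^a-z\d\- ]+/, "").gsub(/ /, "-")
end

# Guessed variant: keep every Unicode word character, which includes "_".
# Assumption: this is closer to what GitHub does; not confirmed by any spec here.
def word_anchor(header)
  header.downcase.gsub(/[^\p{Word}\- ]+/, "").gsub(/ /, "-")
end

ascii_anchor("shunit2/_shared.sh") # => "shunit2sharedsh"
word_anchor("shunit2/_shared.sh")  # => "shunit2_sharedsh"
```

The only difference between the two is the character class, so any heading without underscores or non-ASCII letters produces the same anchor under both.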
I've also seen this answer here, but the rules given there do not appear to be fully robust either.
Does anyone know of authoritative documentation that explains the rules for generating these anchors?