
I want to convert the Java Language Specification (JLS) documentation to an easily editable format (Markdown or AsciiDoc), upload it to GitHub Gist, and customize it by adding my own code experiences and notes. I want to convert it to something like this.

I use a tool called pandoc, which can convert HTML to Markdown.

I tried the following:

Technique 1: I tried to convert the whole table of contents of the Java specification on index.html:

`pandoc -f html -t markdown -o test2.md https://docs.oracle.com/javase/specs/jls/se10/html/index.html`

This produced test2.md (I did not upload it here because the file is too long).

Problem 1: This markdown file does not contain the contents of the Java specification documentation. I expected to get a markdown TOC (table of contents) together with the specification's contents in one markdown file, like this.

Problem 2: When I click the links in this markdown file, I get a 404 error page.
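pandoc converts only the document it is given and does not fetch the pages that index.html links to, which would explain why test2.md holds only the table of contents. The following is a minimal sketch for collecting and downloading the chapter pages with the Python standard library. It assumes the chapters are the local links named like `jls-N.html` (inferred from the `jls-2.html#jls-2.2` link quoted in this question); `chapter_pages` and `download_chapters` are hypothetical helper names.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

INDEX_URL = "https://docs.oracle.com/javase/specs/jls/se10/html/index.html"


class ChapterLinkParser(HTMLParser):
    """Collects local chapter pages (e.g. jls-2.html) linked from an HTML page."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # e.g. <a name="..."> anchors without an href
        page, _fragment = urldefrag(href)  # "jls-2.html#jls-2.2" -> "jls-2.html"
        # Assumed naming scheme: chapter files are local links named jls-*.html.
        if page.startswith("jls-") and page.endswith(".html") and page not in self.pages:
            self.pages.append(page)  # first occurrence only, so TOC order is kept


def chapter_pages(index_html):
    """Return the chapter file names linked from index_html, in document order."""
    parser = ChapterLinkParser()
    parser.feed(index_html)
    parser.close()
    return parser.pages


def download_chapters(out_dir, index_url=INDEX_URL):
    """Save index.html and every linked chapter page into out_dir as raw bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with urlopen(index_url) as resp:
        index_bytes = resp.read()
    (out / "index.html").write_bytes(index_bytes)
    # Decoding is only needed to find the links; the files are saved unchanged.
    for page in chapter_pages(index_bytes.decode("utf-8", errors="replace")):
        with urlopen(urljoin(index_url, page)) as resp:
            (out / page).write_bytes(resp.read())
```

Calling `download_chapters("jls-se10")` would leave a local folder of HTML files that pandoc can then be run on, similar to what HTTrack produces.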

Technique 2 (better than technique 1): I downloaded all the HTML files listed in the TOC with HTTrack and tried to convert each file separately:

`pandoc -f html-native_divs-native_spans -i jls-1.html -t markdown -o test2.md`

Problem 1: I got the following markdown file, test3.md, whose table-of-contents links do not point to other sections of the same document. When I click these links, they go to an external GitHub page such as https://gist.github.com/lostdinar2/jls-1.html#jls-1.1, which does not exist.
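The GitHub URL above is consistent with ordinary relative-URL resolution: the markdown keeps the relative target `jls-1.html#jls-1.1`, and the browser resolves it against the URL of the gist page. This short sketch reproduces that with Python's `urljoin`; the gist id `0123abcd` is hypothetical, since the real one is not shown in the question.

```python
from urllib.parse import urljoin

# Hypothetical gist page URL: the real gist id is not shown in the question.
gist_page = "https://gist.github.com/lostdinar2/0123abcd"

# A relative link to another file is resolved against the gist's own URL...
print(urljoin(gist_page, "jls-1.html#jls-1.1"))
# https://gist.github.com/lostdinar2/jls-1.html#jls-1.1

# ...whereas a fragment-only link stays on the same page.
print(urljoin(gist_page, "#jls-1.1"))
# https://gist.github.com/lostdinar2/0123abcd#jls-1.1
```

This suggests the links need to become fragment-only (`#jls-1.1`) to stay inside the gist.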

A demonstration of problem 1:

1) I want to convert an HTML link that points to an element id (`#`), such as

`<dt><span class="section"><a href="jls-2.html#jls-2.2">2.2. The Lexical Grammar</a></span></dt>`

into a markdown internal link that jumps to another section of the same document, such as

`[link text](#abcd)`

2) But pandoc does not convert these links into markdown internal links. It keeps them as links to the other file, which on Gist become external links like this: https://gist.github.com/lostdinar2/jls-1.html#jls-1.1

Is there a pandoc parameter to fix this? I searched the pandoc documentation but could not find one.
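One possible workaround, sketched here as a post-processing step on pandoc's markdown output rather than as a pandoc option, is to strip the file name from links such as `](jls-2.html#jls-2.2)` so only the fragment `](#jls-2.2)` remains. This assumes the target sections keep their original ids (for example `jls-2.2`) in the converted markdown, which should be checked in the actual output; GitHub may instead generate heading anchors from the heading text. `make_links_internal` and `fix_file` are hypothetical names.

```python
import re
from pathlib import Path

# Inline markdown links whose target is a local .html file plus a fragment,
# e.g. ](jls-2.html#jls-2.2) or ](jls-2.html#jls-2.2 "title").
# Absolute URLs (https://...) do not match because ':' and '/' are excluded.
CROSS_FILE_LINK = re.compile(
    r'\]\((?P<file>[\w.-]+\.html)#(?P<frag>[^)\s]+)(?P<title>\s+"[^"]*")?\)'
)


def make_links_internal(markdown):
    """Rewrite ](file.html#id) to ](#id) so the link targets the same document."""
    def repl(m):
        return "](#" + m.group("frag") + (m.group("title") or "") + ")"
    return CROSS_FILE_LINK.sub(repl, markdown)


def fix_file(path):
    """Apply make_links_internal to a markdown file in place (UTF-8 assumed)."""
    p = Path(path)
    p.write_text(make_links_internal(p.read_text(encoding="utf-8")), encoding="utf-8")
```

For example, `fix_file("output.md")` would turn `[2.2. The Lexical Grammar](jls-2.html#jls-2.2)` into `[2.2. The Lexical Grammar](#jls-2.2)`. Links to a chapter without a fragment (plain `jls-2.html`) are deliberately left unchanged by this sketch.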

my-lord
  • You'd have to write a script to download all the HTML pages you want; pandoc doesn't follow links. – mb21 May 07 '18 at 13:02
  • Sure, I downloaded all the HTML files. What can I do after downloading them? – my-lord May 07 '18 at 13:43
  • `cd folder` and `pandoc -f html-native_divs-native_spans -t markdown *.html -o output.md` – mb21 May 07 '18 at 14:29
  • Thanks. I tried this but got the following output: `pandoc: *.html: openFile: invalid argument (Invalid argument)`. Is there a way to fix this? Note that I am running this command in a directory that contains 20 HTML files. – my-lord May 07 '18 at 19:57
  • The star only works on Mac and Linux. Are you on Windows? https://stackoverflow.com/questions/27852067/batch-convert-files-with-pandoc-in-windows – mb21 May 07 '18 at 21:28
  • Yes. How can I do this on Windows? Also, the `html-native_divs-native_spans` parameters do not fix technique 2, problem 1. I need a pandoc parameter that converts an HTML internal id link (#) into a markdown internal link. – my-lord May 07 '18 at 21:35
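On Windows, cmd.exe passes `*.html` to pandoc literally instead of expanding it, which is consistent with the `openFile: invalid argument` error above. A minimal cross-platform sketch expands the wildcard in Python and calls pandoc with an explicit file list, using the same flags as the command in the comments. It assumes pandoc is on PATH, and it sorts file names numerically so `jls-2.html` comes before `jls-10.html` and the chapters stay in order. `natural_key`, `pandoc_command`, and `convert_folder` are hypothetical names.

```python
import re
import subprocess
from pathlib import Path


def natural_key(path):
    """Sort key that compares embedded numbers numerically (jls-2 before jls-10)."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", path.name)]


def pandoc_command(html_files, output="output.md"):
    """Build pandoc's argument list with every input file named explicitly,
    so no shell has to expand *.html."""
    files = sorted(html_files, key=natural_key)
    return ["pandoc", "-f", "html-native_divs-native_spans", "-t", "markdown",
            *[str(f) for f in files], "-o", output]


def convert_folder(folder=".", output="output.md"):
    """Run pandoc on all .html files in folder (assumes pandoc is on PATH)."""
    html_files = list(Path(folder).glob("*.html"))
    if not html_files:
        raise FileNotFoundError(f"no .html files in {folder}")
    subprocess.run(pandoc_command(html_files, output), check=True)
```

Running `convert_folder()` from the folder with the downloaded files would be the equivalent of `pandoc -f html-native_divs-native_spans -t markdown *.html -o output.md` on Mac or Linux. If index.html is in the folder, it sorts first, so the TOC ends up at the top.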
