How do I convert HTML code to Confluence-style Wiki Markup?

Question

The API documentation for Mylyn Wikitext has functions to convert Wiki Markup to HTML, but I cannot find functions to convert / parse HTML code to Wiki Markup. Class MarkupParser has method parseToHTML, but where can I find the reverse?

Graham Hannington · Answer 1 · 2012-07-16T05:22:29.313

11

Try Wikifier.

It doesn't do exactly what you want, but you might find it does enough, or is a useful starting point.

Wikifier converts snippets of the Confluence 4 XML storage format (that is, as presented by the Confluence Source Editor plugin, without a single document root element) into Confluence 3 wiki markup.

Why is this at all relevant to your question? The Confluence 4 XML storage format includes some elements and attributes that have the same names as XHTML elements and attributes.

For more information, click the Help link on the Wikifier web page.

Note: The XSLT stylesheet used by the Wikifier web page is slightly more recent than the XSLT stylesheet bundled with the related schema package.
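
To give a feel for the XSLT approach, here is a minimal sketch using only the JDK's javax.xml.transform API. It is not the Wikifier stylesheet: the class name XsltWikiSketch, the method toWikiMarkup, and the two-rule stylesheet are hypothetical and cover only bold and emphasis. Because the snippets have no single root element, the sketch wraps the fragment in a synthetic root before transforming it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration; NOT the stylesheet or code used by Wikifier.
final class XsltWikiSketch {

    // Toy stylesheet: bold becomes *...*, emphasis becomes _..._.
    // All other elements fall through to XSLT's built-in rules,
    // which drop the tags and keep the text.
    private static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"strong|b\">*<xsl:apply-templates/>*</xsl:template>"
        + "<xsl:template match=\"em|i\">_<xsl:apply-templates/>_</xsl:template>"
        + "</xsl:stylesheet>";

    static String toWikiMarkup(String fragment) {
        // The fragment has no document root, so add a synthetic one.
        String document = "<root>" + fragment + "</root>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(document)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Could not transform fragment", ex);
        }
    }
}
```

Calling XsltWikiSketch.toWikiMarkup("This is <strong>bold</strong>") should yield wiki markup with the bold text wrapped in asterisks. The input must be well-formed XML, and namespaced storage-format elements would need their prefixes declared on the synthetic root, which this sketch does not do.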

Added later: Wikifier RT is even closer to what you want.

edited Jul 16 '12 at 05:22

answered Jun 25 '12 at 14:46

Graham Hannington


@Christian Koch: the Confluence 4 storage format is not XHTML. Rather, the Confluence 4 XML storage format includes elements that have the same names as some descendants of the XHTML body element (and some of their attributes). Atlassian used to refer to the Confluence 4 storage format as XHTML, but have lately (with some prompting) been referring to it as "XHTML-based". To the question "What subset of XHTML does Confluence support?" (2012-04-12), Atlassian responded "this page is already comprehensive - Any tags you don't already see documented on this page will likely get removed" – Graham Hannington Jun 26 '12 at 03:27

JoshDM · Answer 2 · 2014-07-21T16:56:15.273

Here is how you do it in Mylyn using the WikiText Standalone. Substitute the appropriate DocumentBuilder for your desired Wiki markup (you'll have to check the API to see what's available; TextileDocumentBuilder also exists).

File ConvertToConfluence.java:

package com.stackoverflow.mylyn;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import org.eclipse.mylyn.internal.wikitext.confluence.core.ConfluenceDocumentBuilder;
import org.eclipse.mylyn.wikitext.core.parser.HtmlParser;
import org.xml.sax.InputSource;

public class ConvertToConfluence {

    public static String convertHTML(File htmlFile) {

        StringWriter writer = new StringWriter();

        // try-with-resources closes the stream even if parsing fails,
        // and the parser is never handed a null stream
        try (InputStream in = new FileInputStream(htmlFile)) {

            // InputStreamReader uses the platform default charset here
            InputSource inputSource = new InputSource(new InputStreamReader(in));
            ConfluenceDocumentBuilder builder = new ConfluenceDocumentBuilder(writer);
            HtmlParser parser = new HtmlParser();

            parser.parse(inputSource, builder);

        } catch (Exception ex) {

            // Surface file and parsing failures instead of swallowing them
            throw new RuntimeException("Could not convert " + htmlFile, ex);
        }

        return writer.toString();
    }

    public static void main(String args[]) {

        File file = new File("c:\\filename.html");
        System.out.println(convertHTML(file));
    }
}

File filename.html:

<HTML>
<BODY>
<p>This is <b>bold text</b> and some <i>italic text</i>.<br/><br/>TEST!</p>
</BODY>
</HTML>

Produces Confluence output:

This is *bold text* and some _italic text_.
\\TEST!
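
To make the tag-to-markup mapping in that output concrete, here is a minimal standalone sketch. It does not use Mylyn and is not how HtmlParser works: the class name HtmlToConfluenceSketch and the method convert are hypothetical, and the regex-based approach ignores nesting, attributes, entities, and block structure. It only illustrates that bold maps to asterisks, italics to underscores, and a line break to a double backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the mapping only; use a real parser
// such as Mylyn's HtmlParser for actual documents.
final class HtmlToConfluenceSketch {

    private static final Pattern BOLD =
        Pattern.compile("</?(b|strong)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ITALIC =
        Pattern.compile("</?(i|em)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern BREAK =
        Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern OTHER_TAG = Pattern.compile("<[^>]+>");

    static String convert(String html) {
        String text = html;
        // Opening and closing tags both become the same wiki delimiter.
        text = BOLD.matcher(text).replaceAll("*");
        text = ITALIC.matcher(text).replaceAll("_");
        // quoteReplacement keeps the two backslashes literal.
        text = BREAK.matcher(text).replaceAll(Matcher.quoteReplacement("\\\\"));
        // Drop every remaining tag (p, body, html, ...).
        text = OTHER_TAG.matcher(text).replaceAll("");
        return text.trim();
    }
}
```

For example, HtmlToConfluenceSketch.convert("<p>This is <b>bold text</b></p>") wraps the bold text in asterisks and discards the paragraph tags. Whitespace and consecutive line breaks are not normalized the way the Mylyn builder does it, so the result will not match Mylyn's output byte for byte.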

I downloaded the jar via Maven but could not locate the HtmlParser class. Could you please help? – Tarun, Apr 08 '16 at 13:47

score 4 · Answer 3 · edited Oct 26 '17 at 05:48

I was able to achieve HTML to Confluence-style WikiMarkup using the DefaultWysiwygConverter from Atlassian's own Java libraries. Here's a simplified unit test:

import org.junit.Assert; // assumption: the test uses JUnit 4

import com.atlassian.renderer.wysiwyg.converter.DefaultWysiwygConverter;

String htmlString = "This is <em>emphasized</em> and <b>bold</b>";
DefaultWysiwygConverter converter = new DefaultWysiwygConverter();
String wikiMarkupString = converter.convertXHtmlToWikiMarkup(htmlString);
Assert.assertEquals("This is _emphasized_ and *bold*", wikiMarkupString);

The POM must include the correct dependency and repository:

    <dependency>
        <groupId>com.atlassian.renderer</groupId>
        <artifactId>atlassian-renderer</artifactId>
        <version>8.0.5</version>
        <exclusions>
            <exclusion>
                <!-- This exclusion is needed only if the servlet API conflicts
                     with your environment, e.g. when using spring-boot -->
                <groupId>javax.servlet</groupId>
                <artifactId>servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <repositories>
        <repository>
            <!-- https://developer.atlassian.com/docs/advanced-topics/working-with-maven/atlassian-maven-repositories -->
            <id>atlassian-public</id>
            <url>https://packages.atlassian.com/maven/repository/public</url>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>never</updatePolicy>
                <checksumPolicy>warn</checksumPolicy>
            </snapshots>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>warn</checksumPolicy>
            </releases>
        </repository>
    </repositories>

I think this is the best option, because it uses Atlassian's own parser directly. Some people [on the Atlassian forums](https://community.atlassian.com/t5/Jira-questions/How-to-convert-html-to-Atlassian-markup-in-java-app/qaq-p/657726) are having trouble getting the dependencies right, so I'm providing this example with a working POM on [GitHub](https://github.com/paul-nelson-baker/html-to-jira-markup). – Niko, Oct 26 '17 at 02:00

score -3 · Answer 4 · answered May 08 '12 at 06:01

As far as I know, there is no way to convert HTML to Confluence wiki markup. And since Atlassian stopped using Textile as wiki markup in Confluence 4.x, there is no need for a conversion. The page format is XHTML.

Christian Koch


@Graham Hannington has disproven this answer in the comments below his answer, and both he and I have posted ways to convert HTML to Confluence wiki markup. – JoshDM May 16 '13 at 19:31
