Convert XML document (describing a markdown file) to markdown file

Question

I have been using the Ulysses app (a markdown writing app for Mac) for a while and have created hundreds of notes. Ulysses lets you write in markdown, but internally it saves the notes in XML format.

Now I'd like to move my notes out of it, but it does not offer an easy way to do so, at least not in some edge cases such as notes with embedded images.

Here is an example note in XML format:

<sheet version="5" app_version="19.2">
<string xml:space="preserve">
<p><tags><tag kind="heading1"># </tag></tags>Ulysses App - Example note</p>
<p></p>
<p><tags><tag kind="heading2">## </tag></tags>This is a header 2</p>
<p></p>
<p>And here is some text under headar 2.</p>
<p></p>
<p><tags><tag kind="heading2">## </tag></tags>Now some more</p>
<p></p>
<p><tags><tag kind="orderedList">1. </tag></tags>Item one</p>
<p><tags><tag kind="orderedList">2. </tag></tags>Item two</p>
<p></p>
<p><tags><tag kind="heading2">## </tag></tags>Finally some code</p>
<p></p>
<p><tags><tag kind="codeblock"></tag></tags><attribute identifier="syntax">python</attribute>for j in range(10):</p>
<p><tags><tag kind="codeblock"></tag></tags>    print(j)</p>
<p></p>
<p></p>
<p></p></string>
</sheet>

It will be rendered as:

# Ulysses App - Example note

## This is a header 2

And here is some text under headar 2.

## Now some more

1. Item one
2. Item two

## Finally some code

```python
for j in range(10):
    print(j)
```

While researching how to convert these XML files to markdown files, I came across XSLT. I read up on XSLT, but I am overwhelmed and don't have a clue where to start.

Can someone point me in the right direction? Is an XML transformation via XSLT the right way to do this task? Or is it too heavyweight?

I am fluent in Python. Does it make more sense to parse the XML in Python and try to convert it via Python?

All tips are welcome and appreciated.

Transformation rules

The way I see it, the transformation rules are like so:

A document consists of paragraphs. Even a header is (in) a paragraph.
Each paragraph can have one or more "qualifiers".
So a paragraph can contain only plain text, like <p>And here is some text under headar 2.</p>, or it can have a qualifier before or after the text, as here: <p><tags><tag kind="orderedList">1. </tag></tags>Item one</p>.
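A minimal sketch of these rules in Python, assuming the structure shown in the example note above. The function name split_paragraph and the shortened SAMPLE string are made up for illustration; only the element names (p, tag) and the kind attribute come from the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example note (assumption: real files look like this).
SAMPLE = """<sheet version="5" app_version="19.2">
<string xml:space="preserve">
<p><tags><tag kind="heading2">## </tag></tags>This is a header 2</p>
<p></p>
<p>And here is some text.</p>
<p><tags><tag kind="orderedList">1. </tag></tags>Item one</p>
</string>
</sheet>"""


def split_paragraph(p):
    """Return (kinds, text) for one <p> element.

    kinds: the "kind" attribute of every <tag> inside the paragraph.
    text:  all text in the paragraph, including the markdown prefix
           that Ulysses stores inside the <tag> element.
    """
    kinds = [tag.get("kind") for tag in p.iter("tag")]
    text = "".join(p.itertext())
    return kinds, text


root = ET.fromstring(SAMPLE)
paragraphs = [split_paragraph(p) for p in root.iter("p")]
for kinds, text in paragraphs:
    print(kinds, repr(text))
```

Keeping the qualifiers separate from the text makes it possible to treat some kinds specially later (code blocks, for example) while passing everything else through unchanged.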

In a first attempt I can parse the XML in Python to get the paragraph texts. That would get me the bare bones of my note.

In a second iteration I can focus on parsing the "tags" (or qualifiers).

Will post my results here once done.

Question: Can this "easily" be done in XSLT?

Update: Naive solution in Python

I came up with a solution in Python 3 that gets the inner text of all paragraphs and their children. The result closely resembles my markdown file:

#!/usr/bin/env python3
import argparse
import xml.etree.ElementTree as ET



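def load_xml(filename):
    tree = ET.parse(filename)
    root = tree.getroot()
    # Look up the <string> element (it holds the content of the doc) by tag
    # rather than by position, so this works whether or not another element
    # precedes it inside <sheet>.
    content = root.find("string")
    if content is None:
        raise ValueError(f"no <string> element found in {filename}")
    return content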


def main():
    # get filename from command line
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", "-f", type=str, required=True)
    args = parser.parse_args()
    filename = args.file

    # load xml content
    content = load_xml(filename)

    # get all text from all children
    # https://stackoverflow.com/a/34240893/5115219
    para = list(content.itertext())
    # join text list
    text = "".join(para)
    print(text)


if __name__ == "__main__":
    main()

Test run

➜  (venv:ulysses_xml) ulysses_xml ./process_xml.py -f Content.xml

# Ulysses App - Example note

## This is a header 2

And here is some text under headar 2.

## Now some more

1. Item one
2. Item two

## Finally some code

pythonfor j in range(10):
    print(j)

Of course, some things are missing, for instance the code block markup at the bottom and some other fine details. But it is a start at least.
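One possible way to close the code block gap, sketched under the assumption that code lines are always paragraphs carrying a tag of kind "codeblock" and that the language sits in an attribute element with identifier "syntax", as in the example note. The names paragraph_text, sheet_to_markdown, FENCE and SAMPLE are hypothetical helpers, not part of any Ulysses tooling.

```python
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # markdown code fence

# Shortened copy of the example note (assumption: real files look like this).
SAMPLE = """<sheet version="5" app_version="19.2">
<string xml:space="preserve">
<p><tags><tag kind="heading2">## </tag></tags>Finally some code</p>
<p></p>
<p><tags><tag kind="codeblock"></tag></tags><attribute identifier="syntax">python</attribute>for j in range(10):</p>
<p><tags><tag kind="codeblock"></tag></tags>    print(j)</p>
<p></p>
</string>
</sheet>"""


def paragraph_text(p):
    """Text of a <p>, skipping the content of <attribute> elements."""
    parts = []

    def walk(el):
        if el.tag != "attribute":
            parts.append(el.text or "")
            for child in el:
                walk(child)
        if el is not p:
            # Text following a child element still belongs to the paragraph.
            parts.append(el.tail or "")

    walk(p)
    return "".join(parts)


def sheet_to_markdown(root):
    """Emit one line per <p>, wrapping runs of codeblock paragraphs in fences."""
    lines = []
    in_code = False
    for p in root.iter("p"):
        is_code = any(t.get("kind") == "codeblock" for t in p.iter("tag"))
        if is_code and not in_code:
            # Open a fence, using the syntax attribute if there is one.
            syntax = p.find("attribute[@identifier='syntax']")
            language = (syntax.text or "") if syntax is not None else ""
            lines.append(FENCE + language)
        elif in_code and not is_code:
            lines.append(FENCE)
        in_code = is_code
        lines.append(paragraph_text(p))
    if in_code:
        lines.append(FENCE)
    return "\n".join(lines)


print(sheet_to_markdown(ET.fromstring(SAMPLE)))
```

This only covers code blocks. Other kinds (and embedded images) would need their own handling, which is not attempted here.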

If that is your answer, you should post it as an answer. – MatthewMartin, Jun 08 '23 at 02:00
