1

I'm writing a Python application that needs to fetch a Google document from Google Drive as markdown.

I'm looking for ideas for the design and existing open-source code.

As far as I know, Google doesn't provide a Markdown export. I suppose this means I would have to figure out which of the available download/export formats is best suited for conversion to Markdown.

The document is guaranteed to contain nothing that Markdown doesn't support.

EDIT: I would like to avoid non-Python software to keep the setup as simple as possible.

Rubinous

2 Answers

2

You might want to take a look at Pandoc, which supports many conversions, including docx to Markdown. There are several Python wrappers for Pandoc, such as pypandoc.

After fetching a document from Google Drive in docx format, the conversion is as simple as:

import pypandoc
markdown_output = pypandoc.convert_file('Document.docx', 'markdown')
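
The fetching step itself is not shown above. As a rough sketch only, assuming you already have an authenticated Drive v3 client from google-api-python-client (for example from googleapiclient.discovery.build("drive", "v3", credentials=creds)), the export could look like this. The helper names export_google_doc and save_export are made up for illustration, and the file ID is whatever your application already knows:

```python
# MIME type the Drive API uses when exporting a Google Doc as .docx.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def export_google_doc(service, file_id, mime_type=DOCX_MIME):
    """Return the exported document as bytes.

    `service` is assumed to be an authenticated Drive v3 client object;
    building it (OAuth or service account) is out of scope here.
    """
    request = service.files().export_media(fileId=file_id, mimeType=mime_type)
    # For media requests, execute() returns the raw response body.
    return request.execute()


def save_export(service, file_id, path, mime_type=DOCX_MIME):
    """Export the document and write it to `path` (hypothetical helper)."""
    data = export_google_doc(service, file_id, mime_type)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

After save_export(service, file_id, 'Document.docx'), the pypandoc call above can be applied to the saved file.
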
vaiski
  • Edited the question a little, rendering Pandoc a bad choice as it's not pure Python. I'm thinking of using https://github.com/mwilliamson/python-mammoth instead. – Rubinous Jul 19 '16 at 08:36
1

Google Drive offers a "Zipped HTML" export option.

Use the Python module html2text to convert the HTML into Markdown.

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
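
The "Zipped HTML" export arrives as a zip archive, so the HTML has to be pulled out before html2text can see it. A minimal sketch using only the standard library, assuming the archive bytes are already in memory and contain a single .html file (the function name html_from_zip is made up for illustration):

```python
import io
import zipfile


def html_from_zip(zip_bytes, encoding="utf-8"):
    """Return the decoded text of the first .html member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip images and other assets; only the HTML page is needed.
            if name.lower().endswith(".html"):
                return archive.read(name).decode(encoding)
    raise ValueError("no .html file found in archive")
```

The result can then be passed straight to html2text.html2text(html_from_zip(zip_bytes)). Images in the archive are ignored by this sketch.
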
Li-aung Yip