If you take advantage of two Python packages, `pypandoc` and `panflute`, you can do it quite pythonically in a few lines (sample code below).
Given a text file `example.md`, and assuming you have Python 3.3+ and have already run `pip install pypandoc panflute` (`pypandoc` also needs the pandoc executable itself to be installed), place the sample code in the same folder and run it from the shell or from e.g. IDLE.

    import io

    import pypandoc
    import panflute


    def action(elem, doc):
        # Called once per AST element; collect images and links on the doc.
        if isinstance(elem, panflute.Image):
            doc.images.append(elem)
        elif isinstance(elem, panflute.Link):
            doc.links.append(elem)


    if __name__ == '__main__':
        # Convert the markdown file to pandoc's JSON AST.
        data = pypandoc.convert_file('example.md', 'json')
        # panflute.load() expects a stream, so wrap the string.
        doc = panflute.load(io.StringIO(data))
        doc.images = []
        doc.links = []
        # Walk the AST, calling action() on every element.
        doc = panflute.run_filter(action, doc=doc)

        print("\nList of image URLs:")
        for image in doc.images:
            print(image.url)

The steps are:
- Use `pypandoc` to obtain a JSON string that contains the AST of the markdown document.
- Load it into `panflute` to create a Doc object (panflute requires a stream, so we use StringIO).
- Use the `run_filter` function to iterate over every element and extract the Image and Link objects.
- Then you can print the URLs, alt text, etc.
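
To get a feel for what the steps above operate on, here is a rough standard-library sketch that walks a pandoc-style JSON AST by hand and pulls out the URL and alt/link text of each image and link. It is not how panflute is implemented. `SAMPLE_AST` is a hand-written stand-in for the string `pypandoc` would return, and it assumes the JSON layout used by pandoc 1.16 and later, where Image and Link elements carry `[attr, inlines, [url, title]]`. The names `walk`, `plain_text` and `collect_targets` are made up for this illustration.

```python
import json

# Hand-written stand-in for pandoc's JSON output (assumption: the layout
# used by pandoc >= 1.16, where Image/Link hold [attr, inlines, [url, title]]).
SAMPLE_AST = json.dumps({
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Image", "c": [["", [], []],
                                 [{"t": "Str", "c": "Logo"}],
                                 ["img/logo.png", ""]]},
            {"t": "Space"},
            {"t": "Link", "c": [["", [], []],
                                [{"t": "Str", "c": "Docs"}],
                                ["https://example.org/docs", ""]]},
        ]},
    ],
})


def walk(node):
    """Yield every dict in the tree that has a "t" (element type) key."""
    if isinstance(node, dict):
        if "t" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def plain_text(inlines):
    """Join Str and Space inlines into text; other inline types are skipped."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def collect_targets(ast_json):
    """Return (images, links), each a list of (url, text) tuples."""
    images, links = [], []
    for elem in walk(json.loads(ast_json)):
        if elem["t"] in ("Image", "Link"):
            _attr, inlines, (url, _title) = elem["c"]
            bucket = images if elem["t"] == "Image" else links
            bucket.append((url, plain_text(inlines)))
    return images, links


if __name__ == "__main__":
    images, links = collect_targets(SAMPLE_AST)
    for url, alt in images:
        print("image:", url, "| alt text:", alt)
    for url, text in links:
        print("link:", url, "| text:", text)
```

In practice panflute saves you from handling the raw JSON like this: it hands you typed `Image` and `Link` objects, so the filter in the sample code stays short.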