
I am trying to convert all the .html files under a directory into Markdown. After some Googling, I discovered a PyPI package called html2text.

Then I wrote a script that converts one .html file into .md at a time:

import html2text as ht

text_maker = ht.HTML2Text()

# Read the source HTML as UTF-8.
with open('myHtmlFilePath.html', 'r', encoding='UTF-8') as f:
    htmlpage = f.read()

# Convert the HTML string to Markdown.
text = text_maker.handle(htmlpage)

# Write the result as UTF-8 too, so the platform default encoding is not used.
with open('myMarkdownFileName.md', 'w', encoding='UTF-8') as f:
    f.write(text)

Is there a way to wrap this code in a loop so that it converts every file with the .html extension under a given directory into .md?

  • Does [this](https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory) help? – costaparas Dec 07 '20 at 11:23
  • As a newbie in Python I need to use my noodle to figure out how to integrate your reference into my code. But thanks anyway, this definitely is useful though I haven't figured out how. – ChinaMahjongKing Dec 07 '20 at 11:55

1 Answer


If you use Linux, you can call the find command from Python:

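Linux

import subprocess

root = "."

# Let find list only regular files whose names end in .html.
# Passing the arguments as a list avoids shell quoting problems with unusual paths.
result = subprocess.run(
    ["find", root, "-type", "f", "-name", "*.html"],
    capture_output=True, text=True, check=True,
)
for path in result.stdout.splitlines():
    print(path)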

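Windows (os.walk is in the standard library, so this also works on Linux and macOS)

import os

root = "."

# os.walk yields (directory path, subdirectory names, file names) for every directory under root.
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".html"):
            # os.path.join uses the correct path separator for the platform.
            print(os.path.join(dirpath, name))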
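
To tie the directory walk back to the conversion in the question, here is a minimal sketch that uses pathlib instead of os.walk. The function name convert_html_tree and its convert parameter are made up for illustration; convert can be any callable that turns an HTML string into a Markdown string, such as the handle method of the HTML2Text object from the question. It assumes the files are UTF-8 and that each .md file should be written next to its .html source, overwriting any existing file of that name.

```python
from pathlib import Path


def convert_html_tree(root, convert):
    """Write a .md file next to every .html file found under root.

    convert is any callable mapping an HTML string to a Markdown string
    (hypothetical parameter; see the usage comment at the bottom).
    Returns the list of .md paths that were written.
    """
    written = []
    # rglob searches root and all of its subdirectories.
    for html_path in sorted(Path(root).rglob("*.html")):
        if not html_path.is_file():
            continue  # skip directories that happen to end in .html
        md_path = html_path.with_suffix(".md")
        html = html_path.read_text(encoding="utf-8")
        md_path.write_text(convert(html), encoding="utf-8")
        written.append(md_path)
    return written


# Usage with html2text (third-party, installed separately):
#
#     import html2text as ht
#     text_maker = ht.HTML2Text()
#     for md_path in convert_html_tree(".", text_maker.handle):
#         print(md_path)
```

Passing the converter in as an argument keeps the directory loop independent of html2text, so the loop can be tried out with a trivial stand-in function first.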

eyal
  • Thanks so much. I wrote my script on Win10, though. How do I run your code on Win10? – ChinaMahjongKing Dec 07 '20 at 11:37
  • @eyal best to stick to the more portable (cross-platform) solutions, as suggested in the link I posted above. – costaparas Dec 07 '20 at 11:57
  • Ugh, please don't shell out for this. Python is perfectly capable of iterating over files itself. See [the link provided by costaparas in the comments above](https://stackoverflow.com/q/10377998/354577), for starters. – ChrisGPT was on strike Dec 07 '20 at 12:50