
I want to write a script that walks through a directory and checks whether each HTML file in it is badly formed. Here is my code so far:

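import os

directory = "html"
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.html'):
            path = os.path.join(root, file)
            # Help needed here: how do I test whether the file is badly formed?
            if is_badly_formed(path):  # placeholder for the check I am asking about
                print("Badly Formed")
            else:
                print("Well Formed")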
halfer
Ruth

1 Answer

import xml.etree.ElementTree as ETree
....

    try:
        self.doc = ETree.parse( file )
        # do stuff with it ...
    except ETree.ParseError as err:
        # err is the exception instance; its text includes the parser message and position
        print( "ERROR in {0} : {1}".format( file, err ) )
corn3lius
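
To connect this with the loop in the question, here is a minimal sketch that applies the same ElementTree check to every .html file under a directory. The helper names is_well_formed and check_directory are made up for this illustration, and "well formed" here means only "parses as XML", which is stricter than what browsers accept as HTML.

```python
import os
import xml.etree.ElementTree as ETree


def is_well_formed(path):
    """Return (True, None) if the file parses as XML, else (False, message)."""
    try:
        ETree.parse(path)
        return True, None
    except ETree.ParseError as err:
        # str(err) carries the parser message plus line and column.
        return False, str(err)


def check_directory(directory):
    """Yield (path, ok, error) for every .html file found under directory."""
    for root, dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(".html"):
                path = os.path.join(root, name)
                ok, error = is_well_formed(path)
                yield path, ok, error


if __name__ == "__main__":
    # "html" is the directory name used in the question.
    for path, ok, error in check_directory("html"):
        if ok:
            print("{0}: Well Formed".format(path))
        else:
            print("{0}: Badly Formed ({1})".format(path, error))
```

Returning the error text instead of printing inside the helper keeps the check reusable, for example to collect all failures into a report.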
  • Note that this only works if the HTML is also valid XML, which is not necessarily the case. For a simple example, include any JavaScript that contains quotes or less-than signs. – Steven W. Klassen Feb 21 '20 at 18:33
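
The caveat in the comment can be seen with a few in-memory strings. This is only a sketch: parses_as_xml and the sample markup are invented for the illustration, and the samples were chosen to show XML strictness rather than to cover HTML in general.

```python
import xml.etree.ElementTree as ETree


def parses_as_xml(markup):
    """Return True if the string is well-formed XML, False otherwise."""
    try:
        ETree.fromstring(markup)
        return True
    except ETree.ParseError:
        return False


# Every tag is closed, so this is also well-formed XML.
XHTML_STYLE = "<html><body><p>Hello</p></body></html>"

# A bare less-than sign inside a script is fine in HTML but not in XML.
WITH_SCRIPT = "<html><body><script>if (1 < 2) { go(); }</script></body></html>"

# <br> needs no closing tag in HTML, but XML sees an unclosed element.
VOID_TAG = "<html><body><p>line one<br>line two</p></body></html>"

if __name__ == "__main__":
    for label, markup in [("xhtml style", XHTML_STYLE),
                          ("inline script", WITH_SCRIPT),
                          ("void tag", VOID_TAG)]:
        print("{0}: {1}".format(label, parses_as_xml(markup)))
```

Only the first sample should pass the XML check, even though browsers accept all three, so a ParseError from ElementTree does not by itself mean the page is broken HTML.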