I'm trying to convert an XML file to HTML using Python. We have a .css file that defines the formatting of the output. We have been trying to run the following code:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    headline=[]
    text = infile.readline()
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)               
    while text!="":
        #print(text)
        text = infile.readline()
        text = text.replace("<w>", "")

        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            while text!="</head>":
                headline.append(text)
                print("<h3>headline<\h3>\n",file=outfile)       


main()

but we don't know how to make Python treat "text" and "headline" as our variables (whose values change each time the loop runs) instead of as literal strings. Do you have any idea? Thank you very much.

Nikaido
kokazaki
  • You should read about templating in Python. e.g. Jinja2 would probably make your life much easier: http://jinja.pocoo.org/docs/dev/ – Assaf Lavie Mar 26 '16 at 21:28
  • I think this is one of those situations where you *might* get an answer to your question and it *might* solve your issue, but if you took a different approach the issue probably wouldn't even arise in the first place. One could point out that you're not closing your files after reading/writing, that maybe you should use `with open(filename, "r") as f: for line in f: ...` rather than `open()` and `readline`, that you can add the contents of `headline` to an `h3` element by writing `"<h3>{}</h3>\n".format(" ".join(headline))`, etc. But really, why not just use an actual XML parsing module?
    – jDo Mar 26 '16 at 21:34

2 Answers


You seem already to have worked out how to output a variable along with some string literals:

print("<h1>",text,"</h1>\n",file=outfile)

or alternatively

print("<h1>{content}</h1>\n".format(content=text), file=outfile)

or just

print("<h1>" + text + "</h1>\n", file=outfile)
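These three forms are not quite identical: when print receives several arguments it separates them with a space, while the format and concatenation versions write exactly the string you build. A minimal sketch of the difference, assuming an in-memory io.StringIO buffer in place of outfile and a made-up value for text:

```python
import io

text = "The Winter's Tale"  # made-up sample value standing in for one line of the file


def render(write):
    """Run one print variant against an in-memory file and return what it wrote."""
    buf = io.StringIO()
    write(buf)
    return buf.getvalue()


# Variant 1: several arguments, so print puts a space between each of them.
with_commas = render(lambda out: print("<h1>", text, "</h1>", file=out))
# Variant 2: str.format builds a single string first.
with_format = render(lambda out: print("<h1>{content}</h1>".format(content=text), file=out))
# Variant 3: concatenation also builds a single string first.
with_concat = render(lambda out: print("<h1>" + text + "</h1>", file=out))

print(repr(with_commas))
print(repr(with_format))
print(repr(with_concat))
```

If the extra spaces matter, the first form can also be kept and given sep="" as an additional keyword argument.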

The problem is more with how your loop reads in the headline - you need something like a flag variable (in_headline) to keep track of whether we are currently parsing text that is inside a <head> tag or not.

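def main():
    with open("WTExcerpt.xml", "r", encoding="utf8") as infile, open("DemoWT.html", "w") as outfile:
        print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
        print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
        in_headline = False  # True while we are between <head> and </head>
        headline = ""
        for line in infile:
            # strip() removes the trailing newline (and indentation), so the
            # == comparisons below can actually match
            text = line.replace("<w>", "").strip()
            if "<title>" in text and "</title>" in text:
                print("<h1>",text,"</h1>\n",file=outfile)
            elif text=="<head>":
                # start collecting a new headline
                in_headline = True
                headline = ""
            elif text == "</head>":
                # headline is complete: write it out
                in_headline = False
                print("<h3>", headline.strip(), "</h3>\n", file=outfile)
            elif in_headline:
                # keep a space between lines that were joined together
                headline += text + " "
        # close the tags opened at the top of the file
        print("</body>\n</html>", file=outfile)

main()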

However, it is advisable to use an XML parser instead of, effectively, writing your own. Hand-rolled parsing quickly becomes a complicated exercise: for example, this code will break if a <title> is ever split across multiple lines, or if anything else is ever on the same line as the <head> tag.
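As a rough illustration of the parser route, here is a minimal sketch using the standard library's xml.etree.ElementTree. The sample XML, the <play> root element and the helper names element_text and convert are assumptions made for the example; the real WTExcerpt.xml may well be structured differently.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample; the real WTExcerpt.xml may be structured differently.
SAMPLE_XML = """<play>
  <title>The Winter's Tale</title>
  <head><w>Act</w> <w>1</w>,
    <w>Scene</w> <w>1</w></head>
</play>"""


def element_text(elem):
    """Collect all text inside elem (including nested <w> tags), collapsing whitespace."""
    return " ".join("".join(elem.itertext()).split())


def convert(xml_source):
    """Return an HTML page with <title> rendered as <h1> and <head> as <h3>."""
    root = ET.fromstring(xml_source)
    parts = [
        "<html>",
        "<head>",
        "<title>Winter's Tale</title>",
        "<link rel='stylesheet' type='text/css' href='Shakespeare.css'>",
        "</head>",
        "<body>",
    ]
    for elem in root.iter():
        # quote=False escapes only &, < and >, leaving apostrophes readable
        if elem.tag == "title":
            parts.append("<h1>{}</h1>".format(html.escape(element_text(elem), quote=False)))
        elif elem.tag == "head":
            parts.append("<h3>{}</h3>".format(html.escape(element_text(elem), quote=False)))
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


page = convert(SAMPLE_XML)
print(page)
```

Because the parser works on elements rather than lines, a <head> that spans several lines or shares a line with other tags is handled the same way. For a file on disk, ET.parse("WTExcerpt.xml").getroot() would take the place of ET.fromstring.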

Stuart

A couple of issues I see:

1. Instead of initially creating headline as an empty list, why not just assign it inside the loop?
2. Your inner 'while' loop never changes text, so once it starts it will never complete. Instead of using a while loop, you should use a for loop like so:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
    in_headline = False  # must exist before the loop, or the last elif raises NameError
    for line in infile:
        # strip() drops the trailing newline so the == comparisons can match
        text = line.replace("<w>", "").strip()
        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            in_headline = True
            headline = ""
        elif text == "</head>":
            in_headline = False
            print("<h3>", headline.strip(), "</h3>\n", file=outfile)
        elif in_headline:
            headline += text + " "
    # close the tags opened above, then close both files
    print("</body>\n</html>", file=outfile)
    infile.close()
    outfile.close()
main()

You should iterate over the file object instead of using a while loop: first, because the inner while loop as you structured it would never end once entered, and second, because it's much more "Pythonic" :).
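One detail to watch when iterating over a file object: each line it yields keeps its trailing newline, so an exact comparison such as text == "<head>" only matches once the line has been stripped. A small sketch, assuming an in-memory io.StringIO with made-up content in place of WTExcerpt.xml:

```python
import io

# Made-up stand-in for the XML file, so the sketch runs without WTExcerpt.xml.
fake_file = io.StringIO("<head>\nScene 1\n</head>\n")

raw_matches = []
stripped_matches = []
for line in fake_file:
    # Each line still ends with "\n", so the raw comparison fails for "<head>\n".
    raw_matches.append(line == "<head>")
    # After strip() the newline is gone and the comparison can succeed.
    stripped_matches.append(line.strip() == "<head>")

print(raw_matches)
print(stripped_matches)
```

The raw comparison never matches, while the stripped one matches on the first line only.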

n1c9