print text with underline in python

Question

I tried the code below to find the underlined text in an HTML file, but it is not working.

f=open("jk.html","r")
while True:
    for line in f.read():
        for i in line.split():
            j=i.find("<ul>")
            k=i.find("</ul>")
            for m in range(j, k):
                print(m)

f.close()

Here is my HTML file:

<html>
<body>
   <ul> hill </ul>
   <p> millfhhf </p>
</body>
</html>

For parsing HTML content, it is advisable to use one of Python's XML parsing modules. – venpa, Mar 22 '14 at 06:30
If you're expecting that `while` loop to break, you're going to be waiting a while. – roippi, Mar 22 '14 at 06:37

score 1 · Answer 1 · answered Mar 22 '14 at 06:38

This becomes really simple if you use the BeautifulSoup module, which is far better at parsing HTML than hand-written string searching, especially when the HTML is messy.

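import bs4

# Parse the file with the standard-library parser; the with block closes it.
with open("test.html") as f:
    soup = bs4.BeautifulSoup(f, "html.parser")

# Print the text of every <u> (underline) element.
for underlined in soup.find_all("u"):
    print(underlined.get_text())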

Also, the tag for underlined text in HTML is <u>, not <ul> (which marks an unordered list). Your file should look like this:

<html>
<body>
   <p>
       <u> hill </u>
       <u> millfhhf </u>
   </p>
</body>
</html>

answered Mar 22 '14 at 06:38

mdadm


Yes, you need to install it. It isn't included with Python by default. What operating system are you using? – mdadm Mar 22 '14 at 06:46
Windows 7. – Sumeet ten Doeschate Mar 22 '14 at 06:48
You'll want to install it using either pip or easy_install (which comes with Python's setuptools). See this Stack Overflow [question](http://stackoverflow.com/questions/12228102/how-to-install-beautiful-soup-4-with-python-2-7-on-windows) for instructions. – mdadm Mar 22 '14 at 06:52
Did that help? Were you able to get bs4 installed? – mdadm Mar 24 '14 at 13:56

bereal · Answer 2 · 2014-03-22T06:41:10.527

0

This code does not work because read() returns the whole rest of the file as a single string, so "for line in f.read()" iterates over it character by character rather than line by line. To process lines, use readline() or just iterate over the file object:

with open("jk.html") as fp:
    for line in fp:
        print(line.rstrip("\n"))  # do whatever you need with each line
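To see the difference concretely, here is a small sketch that uses an in-memory io.StringIO in place of the real file, so it runs without jk.html on disk. The sample text and variable names are made up for illustration.

```python
import io

# Stand-in for open("jk.html"); the contents are illustrative only.
sample = "<u> hill </u>\n<p> millfhhf </p>\n"

# Iterating over the string returned by read() yields single characters.
chars = [c for c in io.StringIO(sample).read()]

# Iterating over the file object itself yields whole lines.
lines = [line for line in io.StringIO(sample)]

print(chars[:3])
print(lines)
```

The first list holds one-character strings, which is why calling split() and find() on each item in the question's loop never sees a complete tag.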

That said, for any reliable parsing use html.parser (the HTMLParser module in Python 2), BeautifulSoup, or an XML parser.

Also, the tag for underlining is <u>, not <ul>.
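For anyone who cannot install a third-party package, here is a rough sketch of the standard-library route using html.parser from Python 3. The class name UnderlineExtractor, the helper underlined_text, and the sample string are hypothetical names chosen for this example, not part of any library; treat it as one possible approach rather than the definitive one.

```python
from html.parser import HTMLParser


class UnderlineExtractor(HTMLParser):
    """Collects the text found inside <u>...</u> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <u> tags are currently open
        self.found = []   # stripped text of each underlined run

    def handle_starttag(self, tag, attrs):
        if tag == "u":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "u" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text seen while inside a <u> element.
        if self.depth > 0 and data.strip():
            self.found.append(data.strip())


def underlined_text(html):
    """Return a list of the underlined text runs in an HTML string."""
    parser = UnderlineExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


# Made-up sample input for illustration.
sample = "<html><body><p><u> hill </u> plain <u> millfhhf </u></p></body></html>"
print(underlined_text(sample))
```

To run it on a file, read the contents first, for example underlined_text(open("jk.html").read()). Note that the original jk.html uses <ul>, so this would find nothing there until the tags are changed to <u>.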

edited Mar 22 '14 at 06:41

answered Mar 22 '14 at 06:34

bereal


`read` doesn't return the next character, it returns the whole rest of the file. – Blckknght Mar 22 '14 at 06:35
CAN U TELL ME HOW TO USE HTML PARSER – Sumeet ten Doeschate Mar 22 '14 at 06:36
@SumeettenDoeschate see [@mdadm's answer](http://stackoverflow.com/a/22574349/770830). – bereal Mar 22 '14 at 06:40
@SumeettenDoeschate Don't shout, please. – m.wasowski Mar 22 '14 at 06:47
