I am using the code below, and it gives me the following output:

<p>Ibn Umar reported: I passed by the Messenger of Allah, peace and blessings be upon him, while my garment was trailing. The Prophet said, “<b>O Abdullah, raise your garment</b>.” I lifted it up and he told me to raise it higher and I did so. Some of the people said, “To where should it be raised?” The Prophet said, “<b>In the middle of the shins</b>.”</p>

Could you help me get rid of the <p>, </p>, <b> and </b> tags?

Code:

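import urllib2
from bs4 import BeautifulSoup  # with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

url1 = "http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-the-lower-garment-should-be-hallway-between-the-shins/"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)
english_hadith = soup.findAll('p')[0]  # first <p> element on the page
print english_hadith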
Ossama

3 Answers


You can do this with nltk.

Example:

from nltk import clean_html
html = "..."
clean_html(html)
James Mills
  • nltk is the best library for removing HTML tags. – Nilesh Dec 10 '13 at 03:39
  • But avoid external Python libraries if you can use a built-in module like re for this. Only use an external library if Python itself offers no option. – Nilesh Dec 10 '13 at 03:48
  • Sure, you could write some code by hand or borrow someone else's, but there are nice libraries for this, such as nltk, that just do the job for you :) (*why re-invent the wheel?*) – James Mills Dec 10 '13 at 03:59
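Note that NLTK 3.x no longer implements clean_html: calling it raises NotImplementedError with a message recommending BeautifulSoup's get_text() instead. If you would rather follow the comment about avoiding external libraries, here is a minimal sketch that uses only Python 3's standard-library html.parser module. TextExtractor and strip_html are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Receives text chunks only; tags like <p> and <b> never reach here
        self.parts.append(data)

    def get_text(self):
        return "".join(self.parts)


def strip_html(html):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.get_text()


sample = '<p>The Prophet said, \u201c<b>In the middle of the shins</b>.\u201d</p>'
print(strip_html(sample))
# The Prophet said, “In the middle of the shins.”
```

This sketch keeps the text inside <script> and <style> elements too, so on a full web page those would need filtering; for that kind of input, BeautifulSoup's text extraction is less work.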

I would recommend using regular expressions rather than BeautifulSoup.

>>> import re
>>> a='<p>dhhdhd<p>dhdhd</p>'
>>> re.sub('<p>|</p>','',a)
'dhhdhddhdhd'

A more general regular expression, which also matches <p> tags that carry attributes (for example <p class="x">), would be:

re.sub('<p[^>]*>|</p>','',a)
rjv
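The patterns in the regex answer only strip <p> tags, so the <b> and </b> tags from the question would survive, and <p[^>]*> would also match unrelated tags such as <pre>. As a sketch, assuming simple markup like the question's (no HTML comments and no > inside attribute values, which a regular expression cannot handle reliably), the following removes every tag, plus a variant that removes only <p> tags. TAG_RE, P_TAG_RE and strip_tags are illustrative names.

```python
import re

# Matches any opening or closing tag, e.g. <p>, </b>, <p class="x">.
# Assumes no '>' appears inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")

# Matches only <p ...> and </p>; \b stops it from matching <pre> or <param>.
P_TAG_RE = re.compile(r"</?p\b[^>]*>")


def strip_tags(html):
    """Remove every tag, keeping only the text between them."""
    return TAG_RE.sub("", html)


sample = '<p>The Prophet said, \u201c<b>In the middle of the shins</b>.\u201d</p>'
print(strip_tags(sample))
# The Prophet said, “In the middle of the shins.”

print(P_TAG_RE.sub("", '<p class="x">one</p><pre>two</pre>'))
# one<pre>two</pre>
```

Unlike an HTML parser, this leaves entities such as &amp; undecoded; html.unescape() from the standard library can handle that afterwards.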

You were close.

print english_hadith.text

Displays:

Ibn Umar reported: I passed by the Messenger of Allah, peace and blessings be upon him, while my garment was trailing. The Prophet said, “O Abdullah, raise your garment.” I lifted it up and he told me to raise it higher and I did so. Some of the people said, “To where should it be raised?” The Prophet said, “In the middle of the shins.”

John La Rooy