I have multiple text files, each of which stores the HTML source of one page from a website.
I need to extract the text of a div with a particular class from each file, using the following code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("zing.internet.accelerator.plus.txt"))
txt = soup.find('div' , attrs = { 'class' : 'id-app-orig-desc' }).text
print txt
I have checked the type of my soup object to make sure it is not using the string find method when looking for the div class. Type of the soup object:
print type(soup)
<class 'bs4.BeautifulSoup'>
Following one of the previous posts, I have already put the open() call directly inside the BeautifulSoup() call.
Error:
Traceback (most recent call last):
File "html_desc_cleaning.py", line 13, in <module>
txt2 = soup.find('div' , attrs = { 'class' : 'id-app-orig-desc' }).text
AttributeError: 'NoneType' object has no attribute 'text'
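This error means that find() returned None, i.e. no div with that class was found in the parsed text. Below is a minimal sketch of a guarded lookup that makes this case explicit. It assumes the built-in "html.parser" backend and uses a hypothetical helper name, extract_desc, and a made-up sample string; none of these are in my original script.

```python
from bs4 import BeautifulSoup


def extract_desc(html, css_class="id-app-orig-desc"):
    """Return the text of the first <div> with the given class, or None.

    extract_desc is a hypothetical helper, not part of the original script.
    """
    # Naming the parser explicitly is an assumption; the original call
    # lets BeautifulSoup pick whichever parser is installed.
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"class": css_class})
    if div is None:
        # find() returns None when nothing matches, which is what
        # triggers the AttributeError on .text
        return None
    return div.get_text()


# Made-up sample markup, for illustration only
sample = '<div class="id-app-orig-desc">Hello</div>'
print(extract_desc(sample))
print(extract_desc("<p>no such div</p>"))
```

If the helper returns None for a saved file, the div is either absent from that file or stored in a form the parser does not see as a tag.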
Source from the page: