I am parsing a saved HTML file using Beautiful Soup (sample below). At first I thought Beautiful Soup was truncating long lines, but apparently it is the open function.
<!DOCTYPE html>
<html dir="ltr" lang="en-GB">
<head>
<meta charset="utf-8" />
<title>The Title </title>
<meta property="type" content="website" />
<meta property="description" content="This is the text I want, but if its too long it gets truncated"/>
</head>
I want to get the text in the content attribute of the meta tag where property="description". The code I wrote works fine, but when the text in content is too long it gets truncated. I want to save the text in a variable. Any ideas on how to avoid the truncation and save the whole text?
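from bs4 import BeautifulSoup

def parse_page(file_path):
    # Read the whole file once; the sample page declares utf-8
    with open(file_path, encoding="utf-8") as page:
        html = page.read()
    for line in html.splitlines():  #----> here, when printing, long lines are truncated, thus problems when saving in variable answer
        print(line)
    # Parse the text that was already read (the original referenced an undefined name, fp)
    soup = BeautifulSoup(html, "html.parser")
    answer = soup.find(property="description")  #----> truncated output saved
    print('answer--->', answer['content'], 'type', type(answer))  #----> when printing, it's truncated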
This is the code block that calls the function:
import os

path = '/content/HTMLpages'
# List the directory directly instead of changing into it first
for file in os.listdir(path):
    file_path = os.path.join(path, file)
    parse_page(file_path)
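
To check whether the string itself is cut short or only the printed display, one option might be to compare lengths instead of looking at the printed output. Below is a minimal sketch of such a check. It deliberately uses the standard library's html.parser instead of Beautiful Soup, so it works as an independent cross-check. The names MetaContentParser, extract_description, long_text and sample_html are hypothetical and made up for this illustration, and the test page is generated in memory, not one of my real files.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collects the content attribute of <meta property="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("property") == "description":
            self.description = attr_map.get("content")


def extract_description(html):
    """Return the description content, or None if no such meta tag exists."""
    parser = MetaContentParser()
    parser.feed(html)
    parser.close()
    return parser.description


# Hypothetical test page with a deliberately long content attribute
long_text = " ".join(["word"] * 5000)
sample_html = (
    '<!DOCTYPE html><html><head><meta charset="utf-8" />'
    f'<meta property="description" content="{long_text}"/>'
    "</head></html>"
)

description = extract_description(sample_html)
# Equal lengths mean the variable holds the whole text
print(len(description), len(long_text))
# repr of the tail shows the last characters without relying on line wrapping
print(repr(description[-20:]))
```

If the two lengths match for a real file as well, the variable probably holds the full text and only the console or notebook display is shortening the line.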