I am creating a database and inserting data into it. Our backend engineer said he needs a column that stores whole articles in HTML format. But when I insert the data, I get an error like this:
When I checked exactly where the error comes from, I found:
It looks like this part has a quote or punctuation issue, and the same line occurs multiple times. I used the str() function to convert the formatted HTML (type() shows that its datatype is bs4.element.Tag) to a string, but the problem still exists.
My table description is:
('id', 'mediumint(9)', 'NO', 'PRI', None, 'auto_increment')
('weburl', 'varchar(200)', 'YES', '', None, '')
('picurl', 'varchar(200)', 'YES', '', None, '')
('headline', 'varchar(200)', 'YES', '', None, '')
('abstract', 'varchar(200)', 'YES', '', None, '')
('body', 'longtext', 'YES', '', None, '')
('formed', 'longtext', 'YES', '', None, '')
('term', 'varchar(50)', 'YES', '', None, '')
The function I use to collect the full text is:
def GetBody(url, plain=False):
    # Fetch the html file
    response = urllib.request.urlopen(url)
    html_doc = response.read()
    # Parse the html file
    soup = BeautifulSoup(html_doc, 'html.parser')
    # Find the article body
    body = soup.find("section", {"name": "articleBody"})
    if not plain:
        return body
    else:
        text = ""
        for p_tag in body.find_all('p'):
            text = ' '.join([text, p_tag.text])
        return text
I collect the data with this function:
def InsertDatabase(section):
    s = TopStoriesSearch(section)
    count1 = 0
    formed = []
    while count1 < len(s):
        # tr = GetBody(s[count1]['url'])
        # formed.append(str(tr))
        # count1 = count1 + 1
        # (I either use the commented lines above to convert the HTML to a string, or use the code below)
        formed.append(GetBody(s[count1]['url']))
        count1 = count1 + 1
And this is my insert code:
for each in overall:  # I save everything in this list named overall
    cur.execute('insert into topstories(formed) values("%s")' % (each["formed"]))
Any tips to solve the problem?
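For reference, here is a minimal sketch of the approach usually suggested for this kind of quoting error: pass the HTML as a bound parameter instead of formatting it into the SQL string, so the driver does the escaping. It rests on several assumptions. It uses Python's built-in sqlite3 module as a stand-in so that it runs on its own, and sqlite3 uses ? as its placeholder. MySQL drivers that follow the DB-API typically use %s instead, passed as a separate argument and without surrounding quotes. The helper name insert_formed and the sample HTML are hypothetical, and the table here only mimics the formed column from the description above.

```python
import sqlite3


def insert_formed(conn, html_fragments):
    """Insert each HTML fragment using a bound parameter (hypothetical helper)."""
    cur = conn.cursor()
    for html in html_fragments:
        # str() turns a bs4 Tag (or any object) into its markup string.
        # The value is passed separately from the SQL text, so quotes
        # inside the HTML cannot break the statement.
        cur.execute("INSERT INTO topstories (formed) VALUES (?)", (str(html),))
    conn.commit()


# Stand-in database: an in-memory SQLite table with a 'formed' column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topstories (id INTEGER PRIMARY KEY AUTOINCREMENT, formed TEXT)"
)

# Sample markup containing both double and single quotes (made up for illustration).
sample = ['<section name="articleBody"><p>It\'s a "quoted" line</p></section>']
insert_formed(conn, sample)

stored = conn.execute("SELECT formed FROM topstories").fetchone()[0]
print(stored == sample[0])
```

With a MySQL driver the equivalent call would probably look like cur.execute('insert into topstories(formed) values(%s)', (str(each["formed"]),)), but that depends on which driver is in use.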