0

I am creating a database and inserting data. our backend engineer said he need a column to save whole articles with HTML format. But when I am inserting data it gives me an error like this:

enter image description here

and I check the exact where the error comes from, I found:

enter image description here

looks like this part has some quote or punctuation issues, and the same line occurs multiple times. And I use str() function to convert the formatted HTML text(use type() to see the datatype is bs4.element.Tag) to string, but the problem still exists.

My database description is:

('id', 'mediumint(9)', 'NO', 'PRI', None, 'auto_increment')
('weburl', 'varchar(200)', 'YES', '', None, '')
('picurl', 'varchar(200)', 'YES', '', None, '')
('headline', 'varchar(200)', 'YES', '', None, '')
('abstract', 'varchar(200)', 'YES', '', None, '')
('body', 'longtext', 'YES', '', None, '')
('formed', 'longtext', 'YES', '', None, '')
('term', 'varchar(50)', 'YES', '', None, '')

And the function I used to collect full text is:

def GetBody(url,plain=False):
    # Fetch the html file
    response = urllib.request.urlopen(url)
    html_doc = response.read()

    # Parse the html file
    soup = BeautifulSoup(html_doc, 'html.parser')

    #find the article body
    body = soup.find("section", {"name":"articleBody"})

    if not plain:
        return body
    else:
        text = ""
        for p_tag in body.find_all('p'):
            text = ' '.join([text,p_tag.text])
        return text

And I import the data by this function:

 def InsertDatabase(section):
        s = TopStoriesSearch(section)
            count1 = 0
        formed = []
        while count1 < len(s):
    #         tr = GetBody(s[count1]['url'])
    #         formed.append(str(tr))
    #         count1 = count1 + 1
(I use this to convert HTML to string, or use the code below)
              formed.append(GetBody(s[count1]['url']))
              count1 = count1 + 1

and this is my insert function:

for each in overall(I save everything in this list named overall):
          cur.execute('insert into topstories(formed) values("%s")' % (each["formed"]))

Any tips to solve the problem?

seki
  • 89
  • 1
  • 7
  • Can you try `cur.execute('insert into topstories(formed) values(%(formed)s)', each)`? – Mehmed Jun 22 '19 at 03:09
  • The observed behavior is a symptom of SQL Injection vulnerability. A value incorporated into the text of a SQL statement is *not* safe for inclusion. Best practice is to use prepared statements with bind placeholders. If that's not possible, values must be properly escaped. MySQL real_escape_string function is purpose built to perform the proper escaping. https://dev.mysql.com/doc/refman/5.7/en/mysql-real-escape-string.html mysql-python provides equivalent function https://stackoverflow.com/questions/2561178/python-equivalent-of-mysql-real-escape-string-for-getting-strings-safely-into-m – spencer7593 Jun 22 '19 at 04:34
  • The link to the stackoverflow question in the previous comment... please refer to the answer from Alex Martelli (and not just the accepted answer). – spencer7593 Jun 22 '19 at 04:40
  • @spencer7593 Hey, I use `MySQLdb.escape_string(each['formed'])` , but it shows `TypeError: a bytes-like object is required, not 'Tag'`. – seki Jun 22 '19 at 12:23
  • @Mehmed Looks like it still does not work... – seki Jun 22 '19 at 13:15

1 Answers1

0

The syntax of execute() function is as follows (link):

cursor.execute(operation, params=None, multi=False)

Therefore, you can provide values to be used in the query as an argument to the execute() function. In that case, it will handle values automatically, eliminating your problem:

import mysql.connector

cnx = mysql.connector.connect(...)
cur = cnx.cursor()
...
for each in overall:
    # If 'each' is a dictionary containing 'formed' as key,
    # i.e. each = {..., 'formed': ..., ...}, you can do as follows
    cur.execute('INSERT INTO topstories(formed) VALUES (%s)', (each['formed']))
    # You can also use dictionary directly if you use named placeholder in the query
    cur.execute('INSERT INTO topstories(formed) VALUES (%(formed)s)', each)
...
cnx.commit()
cnx.close()
Mehmed
  • 2,880
  • 4
  • 41
  • 62