I am scraping a website and storing the result in a nested dictionary. The dictionary has the same structure as my database. My aim is to write a function that takes one parameter, the table name, and inserts the data from the dictionary into that table.

I have the following code:

import requests
import pymysql
from bs4 import BeautifulSoup

response = requests.get("http://www.randomurl.com")
data = response.text
soup = BeautifulSoup(data, "html5lib")

cnx = pymysql.connect(host='localhost',
                  user='root',
                  password='',
                  database='mydb')

cursor = cnx.cursor()

band = {    
            "band_info":    {
                            "band_name" : soup.find('h1', {'class': 'band_name'}).get_text(),
                            "band_logo" : soup.find('a', {'id': 'logo'})['href'],
                            "band_img" : soup.find('a', {'id': 'photo'})['href'],
                            "band_comment" : soup2.find('body').get_text().replace('\r', '').replace('\n', '').replace('\t', '').strip()
                            },
            "countries":    {
                            "country" : "value",
                            },
            "locations":    {
                            "location" : "value",
                            },
            "status":       {
                            "status_name" : "value",
                            },
            "formedin":     {
                            "formed_year" : "value",
                            },
            "genres":       {
                            "genre_name" : ["value","value","value"]
                            },
            "lyricalthemes":{
                            "theme_name" : ["value","value","value"]
                            },
            "labels":       {
                            "label_name" : ["value","value","value"]
                            },
            "activeyears":  {
                            "active_year" : "value"
                            },
            "discography":  {
                            "album_name" : ["value","value","value"]
                            },
            "artists":      {
                            "artist_name" : ["value","value","value"]
                            }
        }

def insertData(table):
    placeholders = ', '.join(['%s'] * len(band[table]))
    columns = ', '.join(band[table].keys())
    values = band[table].values()
    sql = "INSERT INTO %s ( %s ) VALUES ( %s )" % (table, columns, placeholders)
    print(sql)
    cursor.execute(sql, values)


insertData("band_info")

cursor.close()
cnx.close()

The top-level keys of the dictionary "band" are named after the tables in my database, and the nested keys are the columns of each table. The function I wrote should insert the correct values depending on the parameter it gets.

I get this error:

Traceback (most recent call last):
  File "parser.py", line 144, in <module>
    insertData("band_info")
  File "parser.py", line 141, in insertData
    cursor.execute(sql, values)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 164, in execute
    query = self.mogrify(query, args)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 143, in mogrify
    query = query % self._escape_args(args, conn)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 129, in _escape_args
    return conn.escape(args)
  File "\Python\Python36-32\lib\site-packages\pymysql\connections.py", line 814, in escape
    return escape_item(obj, self.charset, mapping=mapping)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 27, in escape_item
    val = encoder(val, mapping)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 110, in escape_unicode
    return u"'%s'" % _escape_unicode(value)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 73, in _escape_unicode
    return value.translate(_escape_table)
AttributeError: 'dict_values' object has no attribute 'translate'

and I am a bit lost with this. I took this as a reference for my code.

My question is: do I need some kind of text encoding on the BeautifulSoup result to store it in the database? And if not, how can I insert the data into my MySQL database correctly?

I have a further question on the same topic.

My next step is to insert relations into other tables. I simply tried to execute this code:

for i in band["artists"]["artist_name"]:
    cursor.execute("""INSERT INTO `band_artists` ( `id_aband` , `id_aartist` ) VALUES (
                (SELECT  `id_band`  from `band_info` WHERE `band_name` = ? AND WHERE band_logo = ? ),
                (SELECT  `id_art`  from `artists` WHERE `artist_name` = ? ) )""",(band["band_info"]["band_name"], band["band_info"]["band_logo"], i))
    cnx.commit()

I get a very similar error, but I can't figure out what is wrong with the data type:

query = query % self._escape_args(args, conn)
TypeError: not all arguments converted during string formatting

I tried writing list(values, values) as mentioned before, but I get the same error.

kratze
  • You're passing the wrong data type to `execute()`. Can you provide the correct URL so I can check your code logic? Thanks! – chad Sep 04 '17 at 15:39
  • Hello chad, thank you for your reply. I am trying to scrape the following url: https://www.metal-archives.com/bands/A_B_I_S_M_O/3540410023 – kratze Sep 04 '17 at 15:55

1 Answer

The issue is that you're passing a dict_values object as the second argument of execute(), which only accepts a tuple, list, or dict. You can try this:

def insertData(table):
    placeholders = ', '.join(['%s'] * len(band[table]))
    columns = ', '.join(band[table].keys())
    values = list(band[table].values())  # changed: convert dict_values to a list so pymysql can escape each item
    sql = "INSERT INTO %s ( %s ) VALUES ( %s )" % (table, columns, placeholders)
    print(sql)
    cursor.execute(sql, values)
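
As a minimal sketch of the same idea, kept separate from the database so it can be run on its own: the helper name build_insert and the sample values below are hypothetical, not from the question. It assumes the table and column names come from trusted dictionary keys, since identifiers cannot be passed as query parameters, and it only covers tables whose values are single items, not lists.

```python
def build_insert(table, row):
    """Return (sql, params) for a parameterized single-row INSERT.

    table and the keys of row are interpolated into the SQL text, so they
    must come from trusted code, never from scraped input.
    """
    columns = list(row.keys())
    placeholders = ", ".join(["%s"] * len(columns))
    column_list = ", ".join("`{}`".format(c) for c in columns)
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(table, column_list, placeholders)
    # A plain list, not dict_values, so the driver can escape each item.
    params = [row[c] for c in columns]
    return sql, params


# Hypothetical sample data standing in for band["band_info"].
band_info = {
    "band_name": "Example Band",
    "band_logo": "http://example.com/logo.png",
}

sql, params = build_insert("band_info", band_info)
print(sql)
print(params)

# With the open pymysql cursor from the question, this would then be:
# cursor.execute(sql, params)
# cnx.commit()
```

The values stay out of the SQL string and are handed to execute() separately, which is what lets the driver quote them.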
chad
  • Great catch! Thanks! – chad Sep 05 '17 at 09:24
  • I updated my question because I get the same error when I continue with my next step, inserting relations into another table. Can you assist further? – kratze Sep 06 '17 at 11:46
  • `?` should be `%s` – chad Sep 06 '17 at 11:55
  • great, and now it says "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'WHERE band_logo = 'https://www.metal-archives.com/images/3/5/4/0/3540410023_logo' at line 2" – kratze Sep 06 '17 at 12:25
  • Because of `WHERE band_name = ? AND WHERE band_logo = ?`. You have two **WHERE** keywords; it should be `WHERE band_name = ? AND band_logo = ?` – chad Sep 06 '17 at 12:31
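
Putting the last two comments together, a sketch of the relation insert might look like the following. The table and column names are taken from the question; the helper relation_params and the sample values are made up for illustration. The database calls are left as comments because they assume the open pymysql cursor and cnx from the question.

```python
# pymysql uses %s placeholders (not ?), and each subquery gets a single WHERE.
RELATION_SQL = """
INSERT INTO `band_artists` (`id_aband`, `id_aartist`)
VALUES (
    (SELECT `id_band` FROM `band_info` WHERE `band_name` = %s AND `band_logo` = %s),
    (SELECT `id_art` FROM `artists` WHERE `artist_name` = %s)
)
"""


def relation_params(band, artist_name):
    """Build the parameter tuple for one band/artist relation row."""
    info = band["band_info"]
    return (info["band_name"], info["band_logo"], artist_name)


# Hypothetical sample data with the same shape as the question's dictionary.
band = {
    "band_info": {
        "band_name": "Example Band",
        "band_logo": "http://example.com/logo.png",
    },
    "artists": {"artist_name": ["Artist One", "Artist Two"]},
}

rows = [relation_params(band, name) for name in band["artists"]["artist_name"]]
print(rows)

# With the open connection from the question:
# for params in rows:
#     cursor.execute(RELATION_SQL, params)
# cnx.commit()
```

The number of %s placeholders has to match the length of each parameter tuple, which is what the "not all arguments converted" error was complaining about when the query used ?.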