I am scraping a website and storing the result in a nested dictionary. The dictionary has the same structure as my database. My aim is to write a function that takes one parameter, the table name, and inserts the data for that table from the dictionary into the database.
I have the following code:
import pymysql
import requests
from bs4 import BeautifulSoup

url = requests.get("http://www.randomurl.com")
data = url.text
soup = BeautifulSoup(data, "html5lib")

cnx = pymysql.connect(host='localhost',
                      user='root',
                      password='',
                      database='mydb')
cursor = cnx.cursor()
band = {
    "band_info": {
        "band_name" : soup.find('h1', {'class': 'band_name'}).get_text(),
        "band_logo" : soup.find('a', {'id': 'logo'})['href'],
        "band_img" : soup.find('a', {'id': 'photo'})['href'],
        "band_comment" : soup2.find('body').get_text().replace('\r', '').replace('\n', '').replace('\t', '').strip()
    },
    "countries": {
        "country" : "value",
    },
    "locations": {
        "location" : "value",
    },
    "status": {
        "status_name" : "value",
    },
    "formedin": {
        "formed_year" : "value",
    },
    "genres": {
        "genre_name" : ["value","value","value"]
    },
    "lyricalthemes":{
        "theme_name" : ["value","value","value"]
    },
    "labels": {
        "label_name" : ["value","value","value"]
    },
    "activeyears": {
        "active_year" : "value"
    },
    "discography": {
        "album_name" : ["value","value","value"]
    },
    "artists": {
        "artist_name" : ["value","value","value"]
    }
}
def insertData(table):
    placeholders = ', '.join(['%s'] * len(band[table]))
    columns = ', '.join(band[table].keys())
    values = band[table].values()
    sql = "INSERT INTO %s ( %s ) VALUES ( %s )" % (table, columns, placeholders)
    print(sql)
    cursor.execute(sql, values)
insertData("band_info")
cursor.close()
cnx.close()
The top-level keys of the dictionary "band" are named after the tables in my database. The nested keys are the columns of that table. The function I wrote should insert the matching values into whichever table it receives as its parameter.
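To make the intended mapping concrete, here is a minimal sketch that turns one table of the dictionary into an INSERT statement plus its parameter rows, without touching the database. The helper name `build_insert` and the sample data are hypothetical. It also assumes that a list value means "one row per item" and that all list columns of a table have the same length, which is a guess about the schema and not something the code above confirms.

```python
def build_insert(data, table):
    """Return (sql, rows) for one table of the nested dictionary.

    Hypothetical helper. Assumes the keys of data[table] are column names
    and that a list value should produce one row per list item.
    """
    columns = list(data[table].keys())

    # Normalise every column to a list so scalars and lists are handled alike.
    per_column = [v if isinstance(v, list) else [v] for v in data[table].values()]
    row_count = max(len(col) for col in per_column)

    # Repeat single values on every row, then transpose columns into row tuples.
    per_column = [col * row_count if len(col) == 1 else col for col in per_column]
    rows = list(zip(*per_column))

    # Identifiers cannot be passed as query parameters, so they are formatted
    # into the statement; they must come from trusted keys, never user input.
    column_sql = ", ".join("`{}`".format(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(table, column_sql, placeholders)
    return sql, rows


# Sample data standing in for the scraped values (made up for illustration).
example = {
    "band_info": {"band_name": "Some Band", "band_logo": "logo.png"},
    "genres": {"genre_name": ["doom", "sludge"]},
}

sql, rows = build_insert(example, "genres")
print(sql)
print(rows)
```

With a real connection, the result could be passed on as `cursor.executemany(sql, rows)` followed by `cnx.commit()`. PyMySQL does not autocommit by default, so without the commit the inserted rows would not be persisted.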
I get this error:
Traceback (most recent call last):
  File "parser.py", line 144, in <module>
    insertData("band_info")
  File "parser.py", line 141, in insertData
    cursor.execute(sql, values)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 164, in execute
    query = self.mogrify(query, args)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 143, in mogrify
    query = query % self._escape_args(args, conn)
  File "\Python\Python36-32\lib\site-packages\pymysql\cursors.py", line 129, in _escape_args
    return conn.escape(args)
  File "\Python\Python36-32\lib\site-packages\pymysql\connections.py", line 814, in escape
    return escape_item(obj, self.charset, mapping=mapping)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 27, in escape_item
    val = encoder(val, mapping)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 110, in escape_unicode
    return u"'%s'" % _escape_unicode(value)
  File "\Python\Python36-32\lib\site-packages\pymysql\converters.py", line 73, in _escape_unicode
    return value.translate(_escape_table)
AttributeError: 'dict_values' object has no attribute 'translate'
I am a bit lost with this. I used this as a reference for my code.
My question is: do I need to apply some kind of text encoding to the BeautifulSoup result before storing it in the database? And if not, how can I insert the data into my MySQL database correctly?
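Judging by the last line of the traceback, the problem looks like a type issue and not an encoding issue. In Python 3, `dict.values()` returns a `dict_values` view, which is neither a list nor a tuple, so the driver appears to treat the whole view as a single value and tries to escape it like a string. The sketch below, using a made-up row, shows the type and the conversion that should avoid the error.

```python
# Made-up row standing in for band["band_info"].
row = {"band_name": "Some Band", "band_logo": "logo.png"}

values = row.values()
print(type(values).__name__)              # dict_values
print(isinstance(values, (list, tuple)))  # False
print(hasattr(values, "translate"))       # False, matching the AttributeError

# Converting the view gives the driver an ordinary sequence of parameters.
params = list(values)
print(params)                             # ['Some Band', 'logo.png']
```

In the function above, that would mean calling `cursor.execute(sql, list(band[table].values()))`. As for encoding, `get_text()` returns an ordinary Python `str`, so no separate encoding step should be needed. If non-Latin characters get mangled, passing `charset='utf8mb4'` to `pymysql.connect` may help, depending on how the server and tables are configured.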
I have a follow-up question on the same topic.
My next step is to insert the relations into other tables. I simply tried to execute this code:
for i in band["artists"]["artist_name"]:
    cursor.execute("""INSERT INTO `band_artists` ( `id_aband` , `id_aartist` ) VALUES (
        (SELECT `id_band` from `band_info` WHERE `band_name` = ? AND WHERE band_logo = ? ),
        (SELECT `id_art` from `artists` WHERE `artist_name` = ? ) )""",(band["band_info"]["band_name"], band["band_info"]["band_logo"], i))
cnx.commit()
I get a very similar error, but I can't figure out what is wrong with the data type:
query = query % self._escape_args(args, conn)
TypeError: not all arguments converted during string formatting
I tried writing list(values, values), as mentioned before, but I get the same error.
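A likely cause, sketched here without a database: PyMySQL uses the `pyformat` parameter style, so it expects `%s` placeholders and not `?`. The traceback line `query = query % self._escape_args(args, conn)` shows that the query goes through Python's `%` operator. A query with no `%s` in it cannot consume the three arguments, and plain Python raises the same error. The corrected statement below also uses a single `WHERE` with `AND`, since `AND WHERE` is not valid SQL. The artist name is made up.

```python
# A query with "?" has no format specifiers, so "%" cannot consume the argument.
BROKEN_SQL = "SELECT `id_art` FROM `artists` WHERE `artist_name` = ?"

try:
    BROKEN_SQL % ("'Some Artist'",)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # not all arguments converted during string formatting

# Same statement as in the question, with %s placeholders and one WHERE clause.
FIXED_SQL = """
INSERT INTO `band_artists` (`id_aband`, `id_aartist`) VALUES (
    (SELECT `id_band` FROM `band_info` WHERE `band_name` = %s AND `band_logo` = %s),
    (SELECT `id_art` FROM `artists` WHERE `artist_name` = %s)
)
"""
print(FIXED_SQL.count("%s"))  # 3, one per argument
```

With a live cursor this would be run as `cursor.execute(FIXED_SQL, (band["band_info"]["band_name"], band["band_info"]["band_logo"], i))`. It assumes each subquery returns exactly one row; otherwise MySQL rejects the statement.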