
I cannot get my head around it. I want to insert the values of a dictionary into a SQLite database.

url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url)) #soup it up
for data in soup.find_all('photo'): #parsing the data
    dict = { #filter the data into a dictionary of KEY:VALUE pairs
        "id_p": data.get('id'),
        "title_p": data.get('title'),
        "tags_p": data.get('tags'),
        "latitude_p": data.get('latitude'),
        "longitude_p": data.get('longitude'),
    }
    #print (dict)
    connector.execute("insert into DATAGERMANY values (?,?,?,?,?)", );
    connector.commit()

connector.close

My keys are `id_p`, `title_p`, etc., and the values are what I retrieve through `data.get()`.

However, I cannot insert them. When I try to write `id, title, tags, latitude, longitude` after `...DATAGERMANY values (?,?,?,?,?)", )`, I get `NameError: name 'title' is not defined`.

I tried it with `dict.values` and `dict`, but then it says `table DATAGERMANY has 6 columns but 5 values were supplied`.

Adding another `?` gives me this error (with `dict.values`): `ValueError: parameters are of unsupported type`.

This is how I created the db and table.

#creating SQLite Database and Table
connector = sqlite3.connect("GERMANY.db") #create Database and Table, check if NOT NULL is a good idea
connector.execute('''CREATE TABLE DATAGERMANY
        (id_db INTEGER PRIMARY KEY AUTOINCREMENT,
        id_photo INTEGER NOT NULL,
        title TEXT,
        tags TEXT,
        latitude NUMERIC NOT NULL, 
        longitude NUMERIC NOT NULL);''')

The method should work even if there is no value to fill into the database... That can happen as well.

four-eyes

3 Answers


You can use named parameters and insert all rows at once using executemany().

As a bonus, you get a clean separation of the HTML-parsing and data-pipelining logic:

data = [{"id_p": photo.get('id'),
         "title_p": photo.get('title'),
         "tags_p": photo.get('tags'),
         "latitude_p": photo.get('latitude'),
         "longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]
connector.executemany("""
    INSERT INTO
        DATAGERMANY
        (id_photo, title, tags, latitude, longitude)
    VALUES
        (:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)

Also, don't forget to actually call the close() method:

connector.close()

FYI, the complete code:

import sqlite3
from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5f...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url))

connector = sqlite3.connect(":memory:")
cursor = connector.cursor()

cursor.execute('''CREATE TABLE DATAGERMANY
        (id_db INTEGER PRIMARY KEY AUTOINCREMENT,
        id_photo INTEGER NOT NULL,
        title TEXT,
        tags TEXT,
        latitude NUMERIC NOT NULL,
        longitude NUMERIC NOT NULL);''')

data = [{"id_p": photo.get('id'),
         "title_p": photo.get('title'),
         "tags_p": photo.get('tags'),
         "latitude_p": photo.get('latitude'),
         "longitude_p": photo.get('longitude')} for photo in soup.find_all('photo')]

cursor.executemany("""
    INSERT INTO
        DATAGERMANY
        (id_photo, title, tags, latitude, longitude)
    VALUES
        (:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", data)

connector.commit()

cursor.close()
connector.close()
alecxe
  • Thanks a lot. Would inserting all rows at once be faster than going through a list and filling the db row by row? – four-eyes Jul 31 '14 at 01:18
  • @Christoph there is definitely an overhead of multiple execute calls and hitting the database multiple times. – alecxe Jul 31 '14 at 01:23
  • sorry, I did not understand that... The overhead of multiple execute calls and hitting the database multiple times means that's not good, I guess? Another question regarding your code: why does the script need to loop again through `soup.find_all`? – four-eyes Jul 31 '14 at 01:27
  • @Christoph yup, correct. There is only one loop over `find_all()`. The list of dictionaries is constructed which is passed to `executemany()` after. – alecxe Jul 31 '14 at 01:29
  • thanks for your reply. I accepted the other answer since I will loop through a list containing maybe 1k or 2k URLs, and filling the database all at once with the extracted information might be a little bit too much. I learned a lot through your answer though! – four-eyes Jul 31 '14 at 16:19
  • @Christoph sure, glad to hear you solved the problem and also learned new things. This is what we are all here for. Thank you. The answer you've accepted is pretty good and detailed. – alecxe Jul 31 '14 at 16:22
  • @alecxe how can I then save the database to a local path? – Nemra Khalil Nov 08 '22 at 13:02
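For anyone weighing the two options discussed in the comments, here is a minimal sketch of the row-by-row alternative (assuming the same data list, cursor, and DATAGERMANY table as in the answer above):

# Row-by-row insertion: one execute() call per photo. Functionally
# equivalent to executemany(), but every row incurs a separate call
# into the sqlite3 driver; that per-call cost is the overhead
# mentioned in the comments.
for row in data:
    cursor.execute("""
        INSERT INTO
            DATAGERMANY
            (id_photo, title, tags, latitude, longitude)
        VALUES
            (:id_p, :title_p, :tags_p, :latitude_p, :longitude_p)""", row)

connector.commit()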

As written, your connector.execute() statement is missing the parameters argument.

It should be used like this:

connector.execute("insert into some_time values (?, ?)", ["question_mark_1", "question_mark_2"])

Unless you need the dictionary for later, I would actually use a list or tuple instead:

row = [
  data.get('id'),
  data.get('title'),
  data.get('tags'),
  data.get('latitude'),
  data.get('longitude'),
]

Then your insert statement becomes:

connector.execute("insert into DATAGERMANY values (NULL,?,?,?,?,?)", *row)

Why these changes?

  • The NULL in the values (NULL, ...) is so the auto-incrementing primary key will work
  • The list instead of the dictionary because order is important, and dictionaries don't preserve order (prior to Python 3.7)
  • Passing row as the second argument supplies all five values as the parameters sequence, one per ? placeholder.
  • Lastly, you shouldn't use dict as a variable name, since that shadows Python's built-in dict type.
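Putting these changes together, a minimal end-to-end sketch (assuming the soup object and the DATAGERMANY table from the question):

import sqlite3

connector = sqlite3.connect("GERMANY.db")

for data in soup.find_all('photo'):  # soup is the BeautifulSoup object from the question
    # Build the row as a list so the value order matches the column order.
    row = [
        data.get('id'),
        data.get('title'),
        data.get('tags'),
        data.get('latitude'),
        data.get('longitude'),
    ]
    # NULL lets the auto-incrementing id_db column fill itself in;
    # the row list binds the remaining five ? placeholders.
    connector.execute("insert into DATAGERMANY values (NULL,?,?,?,?,?)", row)

connector.commit()
connector.close()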
bbayles

If you're using Python 3.6 or above (where dictionaries preserve their insertion order), you can do this for dicts:

dict_data = {
    'filename': 'test.txt',
    'size': '200',
}
table_name = 'test_table'
# Build the column list and a matching number of "?" placeholders from the keys.
attrib_names = ", ".join(dict_data.keys())
attrib_values = ", ".join("?" * len(dict_data))
sql = f"INSERT INTO {table_name} ({attrib_names}) VALUES ({attrib_values})"
cursor.execute(sql, list(dict_data.values()))
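For instance, a runnable version of the above against an in-memory database (test_table and its columns are hypothetical names for this sketch):

import sqlite3

connector = sqlite3.connect(":memory:")
cursor = connector.cursor()
cursor.execute("CREATE TABLE test_table (filename TEXT, size TEXT)")

dict_data = {
    'filename': 'test.txt',
    'size': '200',
}
table_name = 'test_table'
attrib_names = ", ".join(dict_data.keys())
attrib_values = ", ".join("?" * len(dict_data))
sql = f"INSERT INTO {table_name} ({attrib_names}) VALUES ({attrib_values})"
cursor.execute(sql, list(dict_data.values()))
connector.commit()

print(cursor.execute("SELECT * FROM test_table").fetchall())
# prints: [('test.txt', '200')]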
nejinx