1

I have a database, which I store as a .db file on my disk. I implemented all the function neccessary for managing this database using sqlite3. However, I noticed that updating the rows in the table takes a large amount of time. My database has currently 608042 rows. The database has one table - let's call it Table1. This table consists of the following columns:

id | name | age | address | job | phone | income

(id value is generated automaticaly while a row is inserted to the database). After reading-in all the rows I perform some operations (ML algorithms for predicting the income) on the values from the rows, and next I have to update (for each row) the value of income (thus, for each one from 608042 rows I perform the SQL update operation). In order to update, I'm using the following function (copied from my class):

def update_row(self, new_value, idkey):
    update_query = "UPDATE Table1 SET income = ? WHERE name = ?" % 
    self.cursor.execute(update_query, (new_value, idkey))
    self.db.commit()

And I call this function for each person registered in the database.

for each i out of 608042 rows:
  update_row(new_income_i, i.name)

(values of new_income_i are different for each i). This takes a huge amount of time, even though the dataset is not giant. Is there any way to speed up the updating of the database? Should I use something else than sqlite3? Or should I instead of storing the database as a .db file store it in memory (using sqlite3.connect(":memory:"))?

Ziva
  • 3,181
  • 15
  • 48
  • 80
  • 1
    Have you tried to create index on column `name`, or even use `id` to update those records? If `name` is unique to each row, you can even try to create an unique index on column `name`, but I believe using `id` should be fast, since it is numeric other than string. – Joey Zhang Jul 29 '17 at 00:03
  • @JoeyZhang Id is unique, names are not necessarily. Could you please tell a little bit more what this index is suppose to do and how it can help? – Ziva Jul 29 '17 at 00:05
  • 1
    see "How does database indexing work?" https://stackoverflow.com/questions/1108/how-does-database-indexing-work – Joey Zhang Jul 29 '17 at 00:14
  • Consider using a database that supports stored procedures so that you can implement your ML algorithm as a stored procedure that can be applied in a single UPDATE statement, rather than having to move all the data to Python and then the results back one row at a time. – rd_nielsen Jul 29 '17 at 01:01
  • @rd_nielsen Can you maybe suggest any databases that support stored procedures or any tutorial/blog post where I can read about this? – Ziva Jul 29 '17 at 01:31
  • @rd_nielsen SQLite is an embedded database; it does not have client/server communication overhead. – CL. Jul 29 '17 at 07:24

1 Answers1

2

Each UPDATE statement must scan the entire table to find any row(s) that match the name.

An index on the name column would prevent this and make the search much faster. (See Query Planning and How does database indexing work?)

However, if the name column is not unique, then that value is not even suitable to find individual rows: each update with a duplicate name would modify all rows with the same name. So you should use the id column to identify the row to be updated; and as the primary key, this column already has an implicit index.

CL.
  • 173,858
  • 17
  • 217
  • 259