
I have an SQLite database of the following format:

| id  | Type0 | Type1 | ... |
|-----|-------|-------|-----|
| id0 | 0-0   | 0-1   | ... |
| id1 | 1-0   | 1-1   | ... |
  • Currently I have O(10^8) rows.
  • Hence, each time I create a new column, I need to update O(10^8) values.
  • I do the following inside a long for loop:
# one UPDATE per row, executed ~10^8 times inside the loop
value = f'{i}-{j}'
id_column = 'id'
id_value = f'id{i}'
update_sql = f"UPDATE {table_name} SET {column_name} = '{value}' WHERE {id_column} = '{id_value}'"
cur.execute(update_sql)
  • This takes around 30 min.
  • How can I take advantage of the generic nature of the entries to initialize new columns faster?
  • I can generate an np.array of shape (10,000,000, 1) in under 1s, and it comes with an ordering. Why can't I generate a column whose entries are placed in the table in a similar amount of time?
  • I read Improve INSERT-per-second performance of SQLite? and transactions seem like a good way to speed up updates, but I don't know how to structure them for sqlite3 (see the sketch after this list).
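
My rough guess at what a batched, single-transaction version would look like is below. The database path, table name, column name and j are placeholders, not my real schema, and I'm not sure this is the right structure. Note that SQLite identifiers can't be bound as parameters, so the table and column names still go through string formatting.

import sqlite3

con = sqlite3.connect('data.db')                     # placeholder path
table_name, column_name = 'my_table', 'Type1'        # placeholder identifiers
j = 1                                                # placeholder column index

# build all (new value, row id) pairs first, then send them as one batch
# inside a single transaction instead of one auto-committed UPDATE per row
params = ((f'{i}-{j}', f'id{i}') for i in range(10_000_000))

with con:  # the connection context manager commits once at the end
    con.executemany(
        f'UPDATE {table_name} SET {column_name} = ? WHERE id = ?',
        params,
    )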

UPDATE:

  1. The following only takes 7s:

a = range(0,10000000)
b = [0]
c = [f'{x}-{y}' for x in a for y in b]
  2. The following only takes 2.84s, where jit is the numba decorator:
from numba import jit
import numpy as np

@jit(nopython=True)
def go_fast(j):
    a = np.linspace(0, 10000000, 10000001)
    c = []
    for i in range(0, len(a)):
        c.append((a[i], j))
    return c
  3. If creating such objects at runtime is so fast, why can't I achieve similar performance with SQL?

Looking for something like this now:

listing = fast_list.go_fast(j)
update_sql = "..."  # some statement that updates the column to the entries of listing
cur.execute(update_sql)
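
My best guess at filling in that placeholder is to stage listing in a temporary table and run one set-based UPDATE against it. Everything below (path, table and column names, the shape of listing) is a placeholder sketch, not my actual code:

import sqlite3

con = sqlite3.connect('data.db')                        # placeholder path
j = 1                                                   # placeholder column index

# listing could come from go_fast(j) or any other generator; here it is a
# plain comprehension with the row key as text to match the 'id0', 'id1', ... ids
listing = [(f'id{i}', f'{i}-{j}') for i in range(10_000_000)]

with con:  # one transaction for the whole column
    con.execute('CREATE TEMP TABLE new_vals (id TEXT PRIMARY KEY, val TEXT)')
    con.executemany('INSERT INTO new_vals VALUES (?, ?)', listing)
    # single set-based UPDATE; the PRIMARY KEY on new_vals.id makes the
    # correlated subquery an index lookup rather than a scan
    con.execute(
        'UPDATE my_table SET Type1 = '
        '(SELECT val FROM new_vals WHERE new_vals.id = my_table.id)'
    )
    con.execute('DROP TABLE new_vals')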
  4. The following method takes only 185s, an order of magnitude better (id2 is the row number):
UPDATE table_name SET column_name = id2 + j
  5. This is much better but still not very fast; it would take ~50h to do this 1000 times. I will probably also have to look into pairing functions more closely (see the sketch below).
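
Since the real entries are the 'i-j' strings rather than a numeric sum, I assume the same set-based statement could build the string inside SQLite with the || operator (placeholder names again):

import sqlite3

con = sqlite3.connect('data.db')                       # placeholder path
table_name, column_name, j = 'my_table', 'Type1', 1    # placeholder names

# || converts its operands to text, so the integer row number id2 and the
# bound integer j produce the same 'i-j' strings as the per-row loop
with con:
    con.execute(f"UPDATE {table_name} SET {column_name} = id2 || '-' || ?", (j,))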

  6. Fiddling with PRAGMA modes gets me down to 88.32s:

cur.execute('PRAGMA journal_mode = WAL')
  7. This brings it down to 32.33s (the settings so far are collected in the sketch after this list):
cur.execute('PRAGMA cache_size = 3000')
cur.execute('PRAGMA mmap_size = 256000000')
  8. Is there any reasonable way I can make this even faster?
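
For reference, here is everything from the updates above collected in one place; the names are placeholders and nothing goes beyond the settings already mentioned:

import sqlite3

con = sqlite3.connect('data.db')                      # placeholder path
cur = con.cursor()

# connection-level settings from the experiments above
cur.execute('PRAGMA journal_mode = WAL')
cur.execute('PRAGMA cache_size = 3000')
cur.execute('PRAGMA mmap_size = 256000000')

table_name, column_name, j = 'my_table', 'Type1', 1   # placeholders

with con:  # one transaction per new column, single set-based UPDATE
    cur.execute(f'UPDATE {table_name} SET {column_name} = id2 + ?', (j,))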
    "Hence each time I create a new column I need to update". This suggests a problem with your data model. In SQL databases, columns are not typically added in an ad-hoc fashion. – Gordon Linoff Mar 04 '20 at 13:22
  • Yes you are correct. I'm transposing all my operations. You can take the above question as being about creating/updating rows. – sovann Mar 04 '20 at 13:35

0 Answers