-1

I know (from the answers of this question) that Sqlite by default doesn't enable compression. Is it possible to enable it, or would this require another tool? Here is the situation:

I need to add millions of rows in a Sqlite database. The table contains a description column (~500 char on average), and on average, each description is shared by, say, 40 rows, like this:

id    name    othercolumn    description 
1     azefds  ...            This description will be the same for probably 40 rows
2     tsdyug  ...            This description will be the same for probably 40 rows
...
40    wxcqds  ...            This description will be the same for probably 40 rows
41    azeyui  ...            This one is unique
42    uiuotr  ...            This one will be shared by 60 rows
43    poipud  ...            This one will be shared by 60 rows
...
101   iuotyp  ...            This one will be shared by 60 rows
102   blaxwx  ...            Same description for the next 10 rows
103   sdhfjk  ...            Same description for the next 10 rows
...

Question:

  • Would you just insert rows like this, and enable a compression algorithm of the DB? Pro: you don't have to deal with 2 tables, it's easier when querying.

or

  • Would you use 2 tables?

    id    name    othercolumn    descriptionid
    1     azefds  ...            1
    2     tsdyug  ...            1    
    ...
    40    wxcqds  ...            1
    41    azeyui  ...            2
    ...
    
    id    description
    1     This description will be the same for probably 40 rows
    2     This one is unique
    

    Con: instead of the simple select id, name, description from mytable from solution #1, we have to use a complex way to retrieve this, involving 2 tables, and probably multiple queries? Or maybe is it possible to do it without a complex query, but with a clever query with union or merge or anything like this?

Basj
  • 41,386
  • 99
  • 383
  • 673

2 Answers2

2

Using multiple tables will not only prevent inconsistency, and take less space, but may also be faster, even if multiple/more complex queries are involved (precisely because it involves moving less data around). Which you should use depends on which of those characteristics are most important to you.

A query to retrieve the results when you have 2 tables would look something like this (which is really just a join between the two tables):

select table1.id, table1.name, table1.othercolumn, table2.description
from table1, table2
where table1.descriptionid=table2.id
Scott Hunter
  • 48,888
  • 12
  • 60
  • 101
  • Thank you @ScottHunter. How do you think it's possible to get the equivalent of `select id, name, description from mytable` with solution #2? Is there a way with a `union` or `merge` or something like that, preventing many queries? I edited end of question to show this. – Basj Mar 20 '18 at 11:55
  • @Basj What about creating a view? – Dan Mašek Mar 20 '18 at 12:40
  • @DanMašek can you post an answer with a view? I'm not familiar with it. – Basj Mar 20 '18 at 12:50
  • @Basj [Here](https://pastebin.com/N8JYp61u) is a modified version of your script that uses a view. It's basically a virtual table, that hides the 2-table representation and let's you use a simpler query. – Dan Mašek Mar 21 '18 at 15:54
  • Could you post it as an answer @DanMašek? It would be useful for future reference to have it in SO. – Basj Mar 21 '18 at 17:09
0

Here is some illustration code in Python for ScottHunter's answer:

import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("CREATE TABLE mytable (id integer, name text, descriptionid integer)")
c.execute("CREATE TABLE descriptiontable (id integer, description text)")

c.execute('INSERT INTO mytable VALUES(1, "abcdef", 1)');
c.execute('INSERT INTO mytable VALUES(2, "ghijkl", 1)');
c.execute('INSERT INTO mytable VALUES(3, "iovxcd", 2)');
c.execute('INSERT INTO mytable VALUES(4, "zuirur", 1)');
c.execute('INSERT INTO descriptiontable VALUES(1, "Description1")');
c.execute('INSERT INTO descriptiontable VALUES(2, "Description2")');

c.execute('SELECT mytable.id, mytable.name, descriptiontable.description FROM mytable, descriptiontable WHERE mytable.descriptionid=descriptiontable.id');

print c.fetchall()

#[(1, u'abcdef', u'Description1'),
# (2, u'ghijkl', u'Description1'), 
# (3, u'iovxcd', u'Description2'), 
# (4, u'zuirur', u'Description1')]
Basj
  • 41,386
  • 99
  • 383
  • 673