Looking for a surogate for arrays in SQLLite , thru SQLAlchemy

Question

I'm using SQLlite wit SQLAlchemy and I have values in database like:

id name    value
 1  ionic   "alpha, beta , gama, teta"

I need to do a select query by name and values, using as params:

name : ionic
value: "beta, gama"

So if the name corresponds, any of the value(s) pass as params are found in the value, return the id. But if the values is "alpha, ronda" to fail.

I'm forced to use SQLLite, and I know there are no arrays in SQLlite, so what are the options.

Schwern · Answer 1 · 2020-01-24T21:03:03.820

In traditional SQL, lists are done as join tables. In this case a one-to-many relationship. One thing can have many values.

create table things (
  id serial primary key,
  name text not null
);

create table thing_values (
  thing_id references things(id),
  value text not null,
  unique(value, thing_id)
);

You then join them together to get the values for each thing.

select *
from things t
join thing_values tv on t.id = tv.thing_id;

To find which things match all values, you need to search for all rows which match any of the values, and then count how many rows matched per thing.

select thing_id as id
from thing_values
where value in ('beta', 'gamma')
group by thing_id
having count(*) = 2

In your case you looked for 2 values, so you only want things which match twice. Since we have a unique clause on thing_values there can be no duplicates. If there were, you'd have to use having count(distinct value) = 2.

See this answer for more detail.

Finally, this can be joined together with a CTE to get the names of the matching things.

with thing_matches as (
  select thing_id as id
  from thing_values
  where value in ('beta', 'gamma')
  group by thing_id
  having count(*) = 2
)
select t.name
from things t
join thing_matches tm on tm.id = t.id

dbfiddle to demonstrate.

This might seem like a lot more work than just storing, say, a comma separated field. And it is, but only in the short run. In the long run as you come to understand SQL's relational view of the world it will become more natural to organize things this way, and also much more efficient.

Searching through a comma-separated list requires the database to scan every row for every query which wants to find a thing with a given value. This is a full table scan and they are the bane of database performance. (Other databases have more advanced indexes which can index this sort of search, but AFAIK SQLite does not).

Instead, where value in ('beta', 'gamma') is a simple equality check and can be done on the index we set up with unique(value, thing_id). Unique constraints are done via an index, and by putting value first SQLite can use this unique index to search by value.

So the above query will perform very quickly on a large table. Whereas a like query must scan every row of the table.

Finally, if you do choose to do the comma delimited thing, you have to be careful not to accidentally match partially. Consider what happens when values have overlap.

The naive approach would be something like...

1   ionic   alpha,beta,gamma,theta
2   doric   al,bet,gams,the

where value like '%beta%'  # it matches 1

where value like '%bet%'   # whoopsie, it matches both

Instead, clearly delineate your values with leading and trailing commas. This eliminates the edge cases around the start and end so the matches can be curtailed by including commas.

1   ionic   ,alpha,beta,gamma,theta,
2   doric   ,al,bet,gams,the,

where value like '%,beta,%'  # it matches 1

where value like '%,bet,%'     # it matches 2

These sorts of edge cases are another reason why a join table is worth the trouble.

I didn't used join because I have like 100 attributes, each attribute with its own values, so having 100 separated possible combination of joins didn't look that good. — user3541631, Jan 26 '20 at 11:54
@user3541631 100 columns in a table will cause more problems. In that case you can have a single key/value table. Store and index the key as well as the value in `thing_values`. Then you can query `where key = 'foo' and value = 'bar'`. — Schwern, Jan 26 '20 at 17:38

Looking for a surogate for arrays in SQLLite , thru SQLAlchemy

1 Answers1