5

Suppose I have the following table structure:

create table PEOPLE (
    ID integer not null primary key,
    NAME varchar(100) not null
);

create table CHILDREN (
    ID integer not null primary key,
    PARENT_ID_1 integer not null references PERSON (id),
    PARENT_ID_2 integer not null references PERSON (id)
);

and that I want to generate a list of the names of each person who is a parent. In slick I can write something like:

for {
  parent <- people
  child  <- children if {
    parent.id === child.parent_id_1 ||
    parent.id === child.parent_id_2
  }
} yield {
  parent.name
}

and this generates the expected SQL:

select p.name
from people p, children c
where p.id = c.parent_id_1 or p.id = c.parent_id_2

However, this is not optimal: the OR part of the expression can cause horrendously slow performance in some DBMSes, which end up doing full-table scans to join on p.id even though there's an index there (see for example this bug report for H2). The general problem is that the query planner can't know if it's faster to execute each side of the OR separately and join the results back together, or simply do a full table scan [2].

I'd like to generate SQL that looks something like this, which then can use the (primary key) index as expected:

select p.name
from people p, children c
where p.id in (c.parent_id_1, c.parent_id_2)

My question is: how can I do this in slick? The existing methods don't seem to offer a way:

What I'd like to be able to write is something like this:

for {
  parent <- people
  child  <- children
  if parent.id in (child.parent_id_1, child.parent_id_2)
} yield {
  p.name
}

but that's not possible right now.

[1] My actual design is a little more complex than this, but it boils down to the same problem.

[2] Some DBMSes do have this optimisation for simple cases, e.g. OR-expansion in Oracle.

Ashley Mercer
  • 2,058
  • 1
  • 16
  • 16
  • 1
    If you perform `explain` on the two queries (`OR` versus `IN`), say, in PostgreSQL, you'll see they are essentially the same (both executed with condition `id = c.parent_id_1 OR id = c.parent_id_2`). – Leo C Mar 02 '19 at 18:42
  • @LeoC it will depend massively on the DBMS of course, but also on the relative sizes of the tables, indexing strategies and so on - so without more information on what test data you were using it's hard to tell if your statement is accurate. My point is that, in general, I would expect `p.id in (..., ...)` to be *at least* as fast as `or`, but in many case it can be much faster. Therefore I'd prefer to use `in` where possible. – Ashley Mercer Mar 03 '19 at 15:02
  • This relevant [SO Q&A](https://stackoverflow.com/questions/3074713/in-vs-or-in-the-sql-where-clause) might be of interest. One of the answers suggests that if the column is indexed (as in your case), `IN` and `OR` would perform the same in MySQL. Given that RDBMSes may implement their query optimizer differently, I suppose it'd be best to do some profiling on the actual dataset, schema and RDBMS platform. – Leo C Mar 03 '19 at 17:01

1 Answers1

0

Turns out this isn't currently (as at slick 3.2.3) possible, so I've raised an issue on github and submitted a pull request to add this functionality.

Ashley Mercer
  • 2,058
  • 1
  • 16
  • 16