1

Edit: Added resolution following the original question. The problem turned out to be related to aggregate functions in temporary tables rather than parameter markers with inner joins. A more detailed explanation is in the resolution.

This question is not a duplicate. The problem is unrelated to the presence of an IN clause, which I have illustrated by eliminating the IN clause from the queries to illustrate that the defective behavior remains independent of the IN clause.

There appears to be some confusion about where exactly the problem is in my question below. The problem is not in the IN clause. The problem is that the parameter specified with pheno_temp1.pheno_count=? is not being replaced.

To address this misreading of the question, I rewrote the two rawQuery calls to illustrate that the behavior is in the parameter replacement for the inner join. There is no IN clause in the following calls, and they exhibit exactly the same behavior as specified in the original two calls (which I have retained for reference).

// More restrictive example. This call gives the correct result;
// note there are NO parameter markers.
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=1",null);

// This call returns NO results. pheno_count=1 has been replaced with
// pheno_count=?
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=?",new String[]{"1"});

// This query still gives the correct result; It includes a parameter
// marker for a column from the organism table.
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=1 AND organism.bbch_stage=?",new String[]{"9"});

// This query still returns no rows. It includes 2 parameter markers;
// one for a column from organism (see above call which worked) and one
// for pheno_count from the inner table.
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=? AND organism.bbch_stage=?",new String[]{"1","9"});

I am observing behavior when executing rawQuery that suggests it is bugged when using selectionArgs to pass values for a where clause on a table that is specified in an inner join.

I have the following tables in a SQLite database:

table organism
org_id  bbch_stage  org_name                            (other data)
-----------------------------------------------------------------------
     0           9  homozygous dominant starter plant   (miscellaneous)
     1           9  homozygous recessive starter plant  (miscellaneous)

table pheno_temp1
org_id   pheno_count
--------------------
     0             1

The following calls return different result sets:

// This call returns the correct result set
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=1 AND organism.bbch_stage IN (?)",new String[]{"9"});

// This call returns no rows
cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=? AND organism.bbch_stage IN (?)",new String[]{"1","9"});

This appears to be aberrant behavior to me. However, before I try to file a bug report I'd like to know if there is some subtle behavior of rawQuery with inner joins and selectionArgs that is not apparent in the documentation.

I am observing this behavior on an HTC One running Android 4.1.1. Development environment is Android Studio 2.1.1. Compile SDK 23, build tools 23.0.3, min SDK 14, target SDK 23. Java 1.8.0_91.

Background:

The queries above are the simplest queries illustrating the problem that might occur in the app. Real queries might arbitrarily include organism restrictions on maternal and paternal lineage, organism "birth" date (before, after, on, or between two dates), or multiple bbch stages; specification of multiple phenotypes, or specification of multiple alleles from a genetic table (omitted in the example above). The arbitrary nature of the restrictions make a view unsuitable. Instead I build work tables for the phenotype and allele restrictions (if needed), and then use them in an inner join to restrict the rows selected from the organism table (again, if needed).

The restrictions on the phenotypes and alleles are set by building temporary tables which are then joined to the organism table. If the user has set restrictions on phenotype, the pheno_temp1 table will specify (in pheno_count) how many of those restrictions were met for each organism. So if the user set three phenotype restrictions, only entries in pheno_temp1 with pheno_count=3 will satisfy the user's criteria. In the inner join, the where clause on pheno_count must accommodate a varying value. A similar (but significantly more complex) mechanism is used for restricting results by alleles.

There are ways for me to code around the rawQuery behavior without concatenating the count value into the where clause (that involve more steps with the temp tables), so I am not looking for a solution to that problem. What I am looking for is confirmation the behavior is bugged before I open a bug report.

Discussions for more general "How do I do an inner join" questions on elsewhere on this site suggest that the second query (the one that returns no rows) should work, but all the answers I have seen say something like "This is an example ...," not "This is code I tested and know for certain works." (... except for maybe this question SQLite rawquery selectionArgs not working).

Community
  • 1
  • 1
Eliodorus
  • 109
  • 8
  • @CL, Please link to the exact duplicate since I believe you are wrong. I spent several hours examining dozens of questions here and none of them are asking if rawQuery is bugged in its behavior with selectionArgs used with inner join parameters. – Eliodorus May 29 '16 at 15:40
  • @CL, I eliminated the IN clause from the example calls to illustrate that the problem has nothing to do with the IN clause. The rawQuery call exhibits the same behavior independent of the IN clause. – Eliodorus May 29 '16 at 16:18
  • @CL, this works `cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=1",null);` (no parameter markers). This fails: `cursor = _mendelDatabase.rawQuery("SELECT organism.org_id FROM organism INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id WHERE pheno_temp1.pheno_count=?",new String[]{"1"});` (one parameter marker on inner join). Not related to IN clause. Can't get much clearer than that. – Eliodorus May 29 '16 at 19:24
  • It's also a duplicate of [Android Sqlite selection args with int values](http://stackoverflow.com/q/18746149/11654). – CL. May 29 '16 at 20:33
  • @CL, How is this question a duplicate of [Android Sqlite selection args with int values](http://stackoverflow.com/q/18746149/11654)? I see no INNER JOIN in that question. This question is entirely about parameter markers on a column from an inner join table. I have added examples (from further testing about an hour ago) that further restrict the queries, to emphasize where the error is. I have no problem making a query work with parameters under other circumstances (and that explicitly includes the case where I use an IN clause with multiple parameter markers - that works fine). – Eliodorus May 29 '16 at 21:00
  • Your problem has nothing to do with the join; it's the wrong parameter type. – CL. May 29 '16 at 21:11
  • @CL, suggestion about parameter type seems to point in right direction, but it is not whole story. A pair of tests show this fails: `Cursor trace = _mendelDatabase.query("pheno_temp1",new String[]{"org_id","pheno_count"},"pheno_count=?",new String[]{"1"},null,null,null,null);` while this succeeds: `Cursor trace = _mendelDatabase.query("organism",new String[]{"org_id","org_name"},"org_id=?",new String[]{"0"},null,null,null,null);`. org_id is defined thus: `org_id INT NOT NULL,`; suggesting a difference between computed column (count,sum) vs a defined column. Running a further test now. – Eliodorus May 29 '16 at 21:50

1 Answers1

1

Resolution

In the hope of helping others who land on this question when working on a similar problem, here is what I determined. Every query that was failing (returning 0 rows) had two things in common.

  1. All the queries involved an inner join in which a column in the joined table was to be used to restrict the returned rows (via inner_table.column=? with selectionArgs used to provide the replacement value);
  2. Every inner table column that was to be used to restrict the returned rows was the result of an aggregate function of some kind (count, sum).

The fact that all failing queries had an inner join blinded me to the fact that all the columns for which parameter replacement was failing were aggregate functions.

These are the SQL commands used to define and populate the tables involved in the queries:

CREATE TABLE IF NOT EXISTS `organism` (
  `org_id` INT NOT NULL,
  `org_name` INT NULL,
  `mat_id` INT NULL,
  `pat_id` INT NULL,
  `map_id` INT NOT NULL,
  `bbch_stage` INT NULL,
  -- other columns not relevant to problem
  PRIMARY KEY (`org_id`),
  -- constraints not relevant to problem
);
-- indexes not relevant to problem

CREATE TABLE IF NOT EXISTS `gene_sequence` (
  `org_id` INT NOT NULL,
  `gene_id` INT NOT NULL,
  `allele_id` INT NOT NULL,
  -- constraints not relevant to problem
);
-- indexes not relevant to problem

CREATE TABLE IF NOT EXISTS `organism_phenotype` (
  `org_id` INT NOT NULL,
  `map_id` INT NOT NULL,
  `phenogroup_id` INT NOT NULL,
  `phenotype_id` INT NOT NULL,
  -- constraints not relevant to problem
);
-- indexes not relevant to problem

INSERT INTO `organism`
   (`org_id`, `org_name`, `mat_id`, `pat_id`,  `map_id`, `bbch_stage` /* other columns omitted for brevity */)
   VALUES (0, "homozygous dominant starter plant", null, null, 0, 9 /* other columns omitted for brevity */);
-- Yes, gene_sequence contains duplicate records; this is valid for the
-- problem domain space 
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 0, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 0, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 1, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 1, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 2, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 2, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 3, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 3, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 4, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 4, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 5, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 5, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 6, 0);
INSERT INTO gene_sequence (`org_id`,`gene_id`,`allele_id`) VALUES (0, 6, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 0, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 1, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 2, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 3, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 4, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 5, 0);
INSERT INTO `organism_phenotype` (`org_id`,`map_id`,`phenogroup_id`,`phenotype_id`) VALUES (0, 0, 6, 0);

When searching for an organism from the database, the user may specify one or more phenotype values or alleles the organism must exhibit/possess. These restrictions are applied by building temporary tables based on the specified conditions. For phenotype, the generated query string looks like:

CREATE TEMP TABLE pheno_temp1 AS
  SELECT org_id,count(org_id) AS 'pheno_count'
    FROM organism_phenotype
    WHERE (phenogroup_id=? AND phenotype_id=?)
      -- the OR clause appears optionally for each phenotype specified
      -- beyond the first
      OR (phenogroup_id=? AND phenotype_id=?)
    GROUP BY org_id

The string is passed in to rawQuery with a String[] specifying the replacement values for the parameters.

So, if the user desires a list of all pea plants exhibiting purple flowers (phenogroup=0,phenotype=0) and axial flower placement (phenogroup=1,phenotype=0) the software will define the query string and selectionArgs as follows:

String queryString =
  "CREATE TEMP TABLE pheno_temp1 AS" +
    " SELECT org_id,count(org_id) AS 'pheno_count'" +
      " FROM organism_phenotype" +
      " WHERE (phenogroup_id=? AND phenotype_id=?)" +
        " OR (phenogroup_id=? AND phenotype_id=?)" +
      " GROUP BY org_id";
String[] selectionArgs = {"0","0","1","0"};

This works (mostly) as expected and creates a table with the expected contents:

table pheno_temp1
org_id   pheno_count
--------------------
     0             2

The process for filtering on alleles is similar but involves two levels of aggregation and a DISTINCT operation (and since the underlying problem is the same, I omit those details here).

After building the temporary tables, the next step of the process is to select org_id values meeting all the user's criteria by joining organism to pheno_temp1 and geno_temp2 (the final work table from allele filtering). The query for this step would typically look as follows:

SELECT organism.org_id FROM organism
  INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id
  INNER JOIN geno_temp2 ON organism.org_id=geno_temp2.org_id
    WHERE
      -- the query will include one or more of these clauses depending
      -- on the user's selections; the software takes care of adding the
      -- appropriate clauses and necessary glue (AND) between clauses
      pheno_temp1.pheno_count=?
      AND geno_temp2.allele_sum=?
      -- the query will include at most one of the birth_date clauses
      AND organism.birth_date<?
      --AND organism.birth_date=?
      --AND organism.birth_date>?
      --AND organism.birth_date BETWEEN ? AND ?
      -- the bbch_stage clause includes one parameter marker for each
      -- bbch stage selected by the user
      AND organism.bbch_stage IN (?,?,?)
      -- maternal and paternal lineage
      AND organism.mat_id=?
      AND organism.pat_id=?
    ORDER BY organism.org_id

If the user has requested a single phenotype (so phenocount = 1) and a single bbch stage (9), the sql query string and selection arguments would be defined as follows:

String queryString =
  "SELECT organism.org_id FROM organism" +
    " INNER JOIN pheno_temp1 ON organism.org_id=pheno_temp1.org_id" +
      " WHERE pheno_temp1.pheno_count=?" +
        " AND organism.bbch_stage IN (?)" +
      " ORDER BY organism.org_id";
String[] selectionArgs = {"1","9"};

The SQL query and the selection arguments are both correct; the data in organism and pheno_temp1 suggest that the rawQuery call should return a single row with org_id=0 (well strictly speaking a cursor on a result set containing a single row for the organism record with org_id=0, but I'm going to assume from this point forward that everyone reading this knows what I mean).

Here's the problem: the query returns no rows. For some reason the aggregate columns from the CREATE TABLE AS SELECT queries cannot be used with parameter markers in rawQuery() (and in fact my testing showed they cannot be used in query() either).

The solution turns out to be changing the way the temporary tables are built for the phenotype and allele filtering. Instead of using a CREATE TABLE AS SELECT statement, the table is explicitly created in an earlier step:

_mendelDatabase.execSQL("CREATE TEMP TABLE pheno_temp1 ('org_id' INT NOT NULL,'pheno_count' INT NOT NULL");

The query string to populate this table is then changed to:

String queryString =
  "INSERT INTO pheno_temp1 (org_id,pheno_count)" +
    " SELECT org_id,count(org_id) AS 'pheno_count'" +
      " FROM organism_phenotype" +
      " WHERE (phenogroup_id=? AND phenotype_id=?)" +
        " OR (phenogroup_id=? AND phenotype_id=?)" +
      " GROUP BY org_id";

(No change is necessary to selectionArgs.)

One final observation: many (not all) of the questions about parameter markers and selection arguments imply that they may only be used when the underlying table columns are string types. This is demonstrably not true; all the code above is pulled directly from the app where I uncovered this problem and most of the columns used with parameter replacement are integer types; I attest that with the change made to how the temporary table is created, the code works properly.

I'd like to acknowledge CL's time and effort. Although CL did not provide the actual resolution to the problem, the back-and-forth did eventually lead to a test case that pointed in direction of the answer (and in fairness to CL the original question did not state that the inner join columns were from aggregate functions).

Eliodorus
  • 109
  • 8