6

I recently had to wrote a query to filter some specific data that looked like the following:

Let's suppose that I have 3 distinct values that I want to search in 3 different fields of one of my tables on my database, they must be searched in all possible orders without repetition.

Here is an example (to make it easy to understand, I will use named queries notation to show where the values must be placed):

val1 = "a", val2 = "b", val3 = "c"

This is the query I've generated:

SELECT * FROM table WHERE
(fieldA = :val1 AND fieldB = :val2 AND fieldC = :val3) OR
(fieldA = :val1 AND fieldB = :val3 AND fieldC = :val2) OR
(fieldA = :val2 AND fieldB = :val1 AND fieldC = :val3) OR
(fieldA = :val2 AND fieldB = :val3 AND fieldC = :val1) OR
(fieldA = :val3 AND fieldB = :val1 AND fieldC = :val2) OR
(fieldA = :val3 AND fieldB = :val2 AND fieldC = :val1)

What I had to do is generate a query that simulates a permutation without repetition. Is there a better way to do this type of query?

This is OK for 3x3 but if I need to do the same with something bigger like 9x9 then generating the query will be a huge mess.

I'm using MariaDB, but I'm okay accepting answers that can run on PostgreSQL. (I want to learn if there is a smart way of writing this type of queries without "brute force")

Gabriel Mazetto
  • 1,090
  • 1
  • 12
  • 23

5 Answers5

8

There isn't a much better way, but you can use in:

SELECT *
FROM table
WHERE :val1 in (fieldA, fieldB, fieldC) and
      :val2 in (fieldA, fieldB, fieldC) and
      :val3 in (fieldA, fieldB, fieldC)

It is shorter at least. And, this is standard SQL, so it should work in any database.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • this would allow fieldA and fieldB to get satisfied by :val1 when only 1 val is allowed to satisfy 1 field at a time. – T McKeown Mar 05 '14 at 20:16
  • @TMcKeown . . . I just changed it back to the way I first thought about it (checking for the values in the variables). I was thinking that switching the order was merely cosmetic, but as you point out, clearly it is not. – Gordon Linoff Mar 05 '14 at 20:29
  • 1
    i think my way would work too... hint hint... there is no pretty way – T McKeown Mar 05 '14 at 20:36
  • Thank you very much... your alternative is cleaner. – Gabriel Mazetto Mar 05 '14 at 20:39
3

... I'm okay accepting answers that can run on PostgreSQL. (I want to learn if there is a smart way of writing this type of queries without "brute force")

There is a "smart way" in Postgres, with sorted arrays.

Integer

For integer values use sort_asc() of the additional module intarray.

SELECT * FROM tbl
WHERE  sort_asc(ARRAY[id1, id2, id3]) = '{1,2,3}' -- compare sorted arrays

Works for any number of elements.

Other types

As clarified in a comment, we are dealing with strings.
Create a variant of sort_asc() that works for any type that can be sorted:

CREATE OR REPLACE FUNCTION sort_asc(anyarray)
  RETURNS anyarray LANGUAGE sql IMMUTABLE AS
'SELECT array_agg(x ORDER BY x COLLATE "C") FROM unnest($1) AS x';

Not as fast as the sibling from intarray, but fast enough.

  • Make it IMMUTABLE to allow its use in indexes.
  • Use COLLATE "C" to ignore sorting rules of the current locale: faster, immutable.
  • To make the function work for any type that can be sorted, use a polymorphic parameter.

Query is the same:

SELECT * FROM tbl
WHERE  sort_asc(ARRAY[val1, val2, val3]) = '{bar,baz,foo}';

Or, if you are not sure about the sort order in "C" locale ...

SELECT * FROM tbl
WHERE  sort_asc(ARRAY[val1, val2, val3]) = sort_asc('{bar,baz,foo}'::text[]);

Index

For best read performance create a functional index (at some cost to write performance):

CREATE INDEX tbl_arr_idx ON tbl (sort_asc(ARRAY[val1, val2, val3]));

SQL Fiddle demonstrating all.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I was looking into the cube module if that could be of any help. But missed the sort functions in the `intarray` module. Very nice. –  Mar 06 '14 at 06:21
2

My answer assumes there is a Key column that we can single out. The output should be all the keys that meet all 3 values and each field and value being used:

This "should" get you a list of Keys that meet the criteria

SELECT F.KEY
FROM (
SELECT DISTINCT L.Key, L.POS
FROM (
  SELECT Key, 'A' AS POS, FieldA AS FIELD FROM table AS A
  UNION ALL
  SELECT Key, 'B' AS POS, FieldB AS FIELD FROM table AS A
  UNION ALL
  SELECT Key, 'C' AS POS, FieldC AS FIELD FROM table AS A ) AS L
WHERE L.FIELD IN(:VAL1, :VAL2, :VAL3)
) AS F
GROUP BY F.KEY
HAVING COUNT(*) = 3
T McKeown
  • 12,971
  • 1
  • 25
  • 32
1

Although Gordon's answer is definitely shorter and almost certainly faster as well, I was toying with the idea on how to minimize the code change when the number of combinations increase.

And I can come up with is something for Postgres which is by no means shorter, but more "change-friendly":

with recursive params (val) as (
   values (1),(2),(3) -- these are the input values
), all_combinations as (
   select array[val] as elements
   from params
   union all
   select ac.elements||p.val
   from params p
     join all_combinations ac 
       on array_length(ac.elements,1) < (select count(*) from params)
)
select *
from the_table
where array[id1,id2,id3] = any (select elements from all_combinations);

What does it do?

First we create a CTE holding the values we are looking for, the recursive CTE then builds a list of all possible permutations from those values. This list will include too many elements because it will also hold arrays with 1 or two elements.

The final select that puts the columns that should be compared into an array and compares that with the permutations generated by the CTE.

Here is a SQLFiddle example: http://sqlfiddle.com/#!15/43066/1

When the number of values (and columns) increase you only need to add the new value to the values row constructor and add the additional column to the array of columns in the where condition.

  • I was reading about recursive in PostgreSQL and wondering how to use it to do the job. thanks for sharing. – Gabriel Mazetto Mar 05 '14 at 20:43
  • @GabrielMazetto It's not only the CTE but also the efficiency of arrays in Postgres that really helps here. I would probably put the permutation generation (the two CTEs) into a function if that was something that is used very often. You then can simply write: `select * from the_table where array[id1,id2,id3] = any (select elements from generate_permutation(1,2,3)` –  Mar 05 '14 at 20:50
  • If you sort the array, you can reduce to a single equality check. – Erwin Brandstetter Mar 05 '14 at 23:05
1

Using a naive approach, I would use the in clause for this job, and since there should not be any repetition, exclude when the fields repeat.

There is also some optimisations you could do.

First you can exclude the last field, since:

A <> B, A <> C
A <> B, B <> C, 

Also means that:

C <> B,  C <> A

And also, the following queries doesn't need a previously queried field, since:

A <> B == B <> A

The query would be written as:

SELECT * FROM table
WHERE :val1 in (fieldA, fieldB, fieldC) and
  :val2 in (fieldA, fieldB, fieldC) and
  :val3 in (fieldA, fieldB, fieldC) and
  fieldA not in (fieldB, fieldC) and
  fieldB <> fieldC

This is a naive approach, there are probably others which use the MySQL API, but this one does the job.