1

I try to write a query where I select rows based one a LIKE pattern that has to match none of the entries of an array. What I have is the current one

SELECT * FROM mytable WHERE 
NOT (EXISTS (SELECT * from unnest(a) WHERE  unnest LIKE '%XYZ%' ))) 

Actually what I want is pairs - so it is more like

SELECT * FROM mytable WHERE 
NOT (
  (EXISTS (select * from unnest(a) as A, unnest(b) as B  WHERE  A||B LIKE '%xyz%abc%' )) OR
  (EXISTS (select * from unnest(a) as A, unnest(b) as B  WHERE  A||B LIKE '%abc%xyz%' ))
)

This works for me but looks rather unwieldy. Is there prettier/more efficient solution to this? What I am find the most irksome is the SELECT as part of the WHERE clause to unroll arrays.

some points:

  • In my case xyz and abc will not appear together as substrings in a single array entry (e.g. nothing like xyzblaabcexists)
  • Both arrays have exactly the same number of elements (they are actually derived from another query)
  • no null elements (but even if that would still work as then I don't have a pair of abc and xyz)

Edit:

Clarification: a row contains A as {rxyz,foo, bar} and B as {other, abc, this} should not be returned as it contains the rxyz in one array and abc in the other

Example:

SELECT * FROM(
 SELECT *
    FROM   (select 1 as ID, '{rxyz,foo,bar}'::varchar[] as a, '{abc,other,this}'::varchar[] as b) row1  UNION
 SELECT *
    FROM   (select 2 as ID, '{rxyz,foo,bar}'::varchar[] as a, '{other,rabc,this}'::varchar[] as b) row1  UNION
 SELECT *
    FROM   (select 3 as ID, '{else,foo,bar}'::varchar[] as a, '{abc,other,this}'::varchar[] as b) row2

    ) mytable
WHERE   
NOT (
  (EXISTS (select * from unnest(a) as A, unnest(b) as B  WHERE  A||'-|-'||B LIKE '%xyz%-|-%abc%' OR B||'-|-'||A LIKE '%xyz%-|-%abc%' )))

returns only row3. In my use case I can guaranty that -|- is not part of either list making it possible to separate these.

Lutz
  • 612
  • 3
  • 8
  • *"This works ... " - or does it? See answer. – Erwin Brandstetter Feb 17 '20 at 13:48
  • So in [this fiddle](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ea8f59ed2b97e07ee7933660ace36173), only 1-3 would qualify? – Erwin Brandstetter Feb 17 '20 at 14:04
  • @ErwinBrandstetter in that one, everything but 4 should be returned. – Lutz Feb 17 '20 at 14:12
  • Sorry, but that doesn't add up. Your query applied to the fiddle returns a different set: *db<>fiddle [here](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=bbf30602190d27ceb9ddd16cd66ee0b8)* Please provide an ***exact*** definition of what you want to achieve. – Erwin Brandstetter Feb 17 '20 at 14:25
  • @ErwinBrandstetter As mentioned `abc` and `xyz`, in my case, are not going to be part of the same string. In your example they are in 7 and 8 - I missed that sorry. So 4, 7 and 8 are filtered out "correctly". The arrays are actually result of an `array_agg()` on other data. So I know that they will always have the same size. – Lutz Feb 17 '20 at 14:33
  • Then your last query looks good, mostly. But *A)* the regexp pattern is broken (typo, I guess). And *B)* Still breaks with different number of elements (cannot apply, you say, but be sure to document the trap) or NULL elements like I explained in my answer. Please clean up your question with the exact properties of the setup and the exact definition of what you want. And trim the noise. :) I am out of time now. – Erwin Brandstetter Feb 17 '20 at 14:39
  • Actually you are right the last query is not what I want. So need to make it more complex – Lutz Feb 25 '20 at 15:21

2 Answers2

1

You can unnest both into a single derived table:

select *
from mytable
where not exists (select *
                  from unnest(a,b) as x(a,b)
                  where x.a like '%xyz%'
                     or x.b like '%xyz%'
                     or x.a like '%abc%'
                     or x.b like '%abc%')
1

The updated task description is:

Eliminate rows where any element of text array a contains one of ('abc', 'xyz') and any element of text array b contains the other.

Your second query achieves that, but at terrible cost. unnest(a) as A, unnest(b) combines every element of array a with every element of array b - a Cartesian product, O(N²), makes performance deteriorate quickly with longer arrays.

Try one of these instead:

Test against concatenated strings with CROSS JOIN:

SELECT t.*
FROM   tbl t, array_to_string(a, '|') AS a_string, array_to_string(b, '|') AS b_string
WHERE (a_string LIKE '%abc%' AND b_string LIKE '%xyz%'
   OR  a_string LIKE '%xyz%' AND b_string LIKE '%abc%') IS NOT TRUE;

Make sure to choose a separator that cannot produce false matches. '|' in the example.

Test against concatenated strings with NOT EXISTS:

SELECT *
FROM   tbl t
WHERE  NOT EXISTS (
   SELECT FROM array_to_string(a, '|') a_string
             , array_to_string(b, '|') b_string
   WHERE  a_string LIKE '%abc%' AND b_string LIKE '%xyz%'
      OR  a_string LIKE '%xyz%' AND b_string LIKE '%abc%'
   );

Test unnested elements with LATERAL subquery;

SELECT t.*
FROM   tbl t
JOIN   LATERAL (
   SELECT bool_or(a_elem LIKE '%abc%') AND bool_or(b_elem LIKE '%xyz%') 
       OR bool_or(a_elem LIKE '%xyz%') AND bool_or(b_elem LIKE '%abc%') AS exclude
   FROM   unnest(t.a,t.b) e(a_elem, b_elem)
   ) x ON exclude IS NOT TRUE;

Test unnested elements with NOT EXISTS

SELECT *
FROM   tbl t
WHERE  NOT EXISTS (
   SELECT FROM unnest(t.a,t.b) AS e(a_elem, b_elem)
   HAVING bool_or(a_elem LIKE '%abc%') AND bool_or(b_elem LIKE '%xyz%') 
       OR bool_or(a_elem LIKE '%xyz%') AND bool_or(b_elem LIKE '%abc%')
   );

db<>fiddle here

The last one is my favorite. NNot sure which one performs best. All of them should perform substantially faster than what you have now.

About unnest(a,b):

Some comments on your points

In my case xyz and abc will not appear together as substrings in a single array entry (e.g. nothing like xyzblaabcexists).

That's a big assumption. There may be good reasons for it, but are you sure there are no corner cases, not now, not ever? Else your code will break silently later and it will be extremely hard to determine the cause.

Both arrays have exactly the same number of elements (they are actually derived from another query)

The most efficient query would then typically be to apply the filter at the source, in that other query - if at all possible. Might even use an appropriate index. Like, a trigram index:

no null elements (but even if that would still work as then I don't have a pair of abc and xyz)

Fair enough. But be wary of NULL values doing what you want if there are some.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I added a bit of clarification to what I want - sorry – Lutz Feb 17 '20 at 13:56
  • @Lutz: I wrote a new answer to address that. – Erwin Brandstetter Feb 18 '20 at 00:04
  • Actually what I was hoping for was something to replace the whole `NOT EXISTS ( SELECT FROM unnest WHERE )` with something more compact and readable (like the inverse of `x like any (array of conditions)`). Guess I have to stick with that. As for applying at the source - does not work here. – Lutz Feb 25 '20 at 12:29
  • So somthing along the lines of `not ((any a like '%xyz%' and any b like '%abc%') or (any b like '%xyz%' and any a like '%abc%')) `But that all I could come up with was the `not exists in sub-querry` – Lutz Feb 25 '20 at 13:04
  • @Lutz: Please respond to my request in https://stackoverflow.com/questions/60262472/select-rows-where-elements-of-two-arrays-dont-match-a-pattern#comment106598321_60262472. And is the assumption at the top of my answer correct? It's hard to discuss details while the question isn't completely clear. – Erwin Brandstetter Feb 25 '20 at 13:22
  • I had an error in one of the queries that it only excluded if they where in the same place. Positions with in teach array are not important. It's about the pure existence of the elements in separate arrays. So exclude anything that has in a abc as a substring in one array and xyz in the other. having abc and xyz in one array but neither of the two in the second should not be filtered out. – Lutz Feb 25 '20 at 15:41
  • @Lutz: Douglas Adams tells a story about looking for an answer before getting the question right. Got them *42* and they had to start over ... – Erwin Brandstetter Feb 25 '20 at 23:57