0

I have the following nested data in two columns:

Categories_A    Categories_B
{"A"}           {"B","F","C"}
{"B","A"}       {"Z","B","F"}
{"B","F"}       {"A","E","R"}

I would like to return in a new column the missing category/categories in categories_B when compared to categories_A - what is in Categories_A that is not in categories_B. So ideally:

Categories_A    Categories_B      Missing_Category
{"A"}           {"B","F","C"}     {"A"}
{"B","A"}       {"Z","B","F"}     {"A"}
{"B","F"}       {"A","E","R"}     {"B","F"}

So far what I managed is to return if there is a complete match between the columns:

select Categories_A, Categories_B,
    case
        when Categories_A = Categories_B then 'TRUE'
        else 'FALSE'
    end is_a_match
from facts_themes
Categories_A    Categories_B      is_a_match
{"A"}           {"B","F","C"}     FALSE
{"B","A"}       {"Z","B","F"}     FALSE
{"B","F"}       {"A","E","R"}     FALSE
Joehat
  • 979
  • 1
  • 9
  • 36
  • 2
    Unrelated, but: your `CASE` expression can be simplified to `categories_a = categories_b as is_a_match` –  Jan 24 '22 at 11:15
  • Please clarify: Your version of Postgres. Exact table definition? Are all arrays 1-dimensional? What makes your arrays "nested"? Can there be duplicate array elements? (How to deal with those?) Can the column be NULL? Can there be NULL array elements? (How to deal with NULL?) Roughly how many rows, and how many elements per array? – Erwin Brandstetter Jan 24 '22 at 11:44

2 Answers2

1

There is no built-in function for that, but it's easy to write your own:

create function array_except(p_one anyarray, p_two anyarray)
  returns anyarray
as
$$
  select array_agg(e)
  from (
    select e
    from unnest(p_one) as p1(e)
    except
    select e
    from unnest(p_two) as p2(e)
  ) x
$$
language sql
immutable
;

Then you can use it like this:

select categories_a, categories_b, 
       array_except(categories_a, categories_b) as missing_categories
from facts_themes

Online example

  • Thanks for your answer. However, I'm getting a syntax error near create function. I have cte above it. A comma at the end of it didn't solve the error. The error: ERROR: syntax error at or near "create" Position: 1213 – Joehat Jan 24 '22 at 11:21
  • The `CREATE FUNCTION` is perfectly valid. Maybe you forgot to end the previous statement with `;` or something else. –  Jan 24 '22 at 11:23
  • @a_horse_with_no_name would you mind explaining what x is? Can't find it in other create function examples. – Joehat Jan 24 '22 at 12:01
  • 1
    @Joehat: It's remarkable how you ask for clarification while you are giving none. – Erwin Brandstetter Jan 24 '22 at 12:04
  • 1
    @Joehat: it's a [table alias](https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-TABLE-ALIASES) –  Jan 24 '22 at 12:04
0

While no duplicates can be involved, use the faster EXCEPT ALL:

SELECT *, ARRAY(SELECT unnest(categories_a) EXCEPT ALL
                SELECT unnest(categories_b)) AS missing
FROM   facts_themes;

db<>fiddle here

If dupes can be involved, you'll first have to define desired behavior. About EXCEPT ALL:

Also note that EXCEPT treats NULL as just another value (while most array operators do not).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228