
I'm trying to write a two-part query that returns a text array of semicolon-delimited column values.

First, select only the rows that are unique with respect to three columns (i.e., if the same 3-tuple of values occurs more than once, none of those rows is unique). Then filter the remaining rows on a fourth column.

This is how I think about it, but maybe there's a better solution.
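To pin down the intended semantics, here is a minimal Python sketch of the two steps on in-memory data. It assumes each row is a plain tuple (col1, col2, col3, upper bound of dateRange) with values mirroring the sample table below. `two_step_keys` is a hypothetical helper for illustration only; the real solution belongs in SQL.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory rows: (col1, col2, col3, upper bound of dateRange).
# Values mirror the sample table; every row shares the same range end.
UTC_MINUS_8 = timezone(timedelta(hours=-8))
END = datetime(2018, 12, 31, 16, 3, 20, tzinfo=UTC_MINUS_8)
ROWS = [
    (1, 1, "A", END), (1, 1, "A", END), (1, 1, "B", END), (1, 2, "A", END),
    (2, 1, "A", END), (2, 1, "A", END), (1, 2, "B", END), (1, 2, "B", END),
]


def two_step_keys(rows, now):
    """Step 1: keep rows whose (col1, col2, col3) tuple occurs exactly once.
    Step 2: of those, keep rows whose range ended at least one day before `now`.
    Returns the sorted 'col1;col2;col3' keys."""
    counts = Counter(row[:3] for row in rows)
    unique = [row for row in rows if counts[row[:3]] == 1]
    cutoff = now - timedelta(days=1)
    return sorted(
        f"{c1};{c2};{c3}" for c1, c2, c3, upper in unique if upper <= cutoff
    )


now = datetime(2019, 2, 12, tzinfo=timezone.utc)
print(two_step_keys(ROWS, now))  # ['1;1;B', '1;2;A']
```

The order of the two steps matters: applying the date filter first could drop one copy of a duplicated tuple and make the remaining copy look unique.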

I've tried this a few different ways. My current attempt uses a CTE:

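with uniqe as (
    -- DISTINCT ON keeps one row from EACH (col1, col2, col3) group,
    -- including groups that contain duplicates
    select distinct on (
        col1,
        col2,
        col3
    ) *
    from MyTable
)
select concat(col1::text, ';', col2::text, ';', col3)
    as key
    from uniqe
    where upper(dateRange) <= (now() - interval '1 days')
    order by key;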

The issue I'm running into is that SELECT DISTINCT ON (col1, col2, col3) ... still returns one row from each group of duplicates, i.e. rows that I don't consider "unique".

Just to be clear, here's a sample table:

 id  | col1 | col2 | col3 |                       dateRange
-----+------+------+------+-------------------------------------------------------
  1  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  2  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  3  |   1  |   1  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  4  |   1  |   2  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  5  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  6  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  7  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  8  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")

I consider rows 3 & 4 to be the only unique rows.
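The gap between DISTINCT ON and "unique" in this sense can be reproduced on the sample rows with a short Python sketch (dateRange omitted). `distinct_on` is a hypothetical emulation that keeps the first row seen per key; PostgreSQL's DISTINCT ON without a matching ORDER BY picks an unspecified row from each group, so the surviving ids may differ, but exactly one row per group survives either way.

```python
from collections import Counter

# Sample rows from the table above as (id, col1, col2, col3); dateRange omitted.
ROWS = [
    (1, 1, 1, "A"),
    (2, 1, 1, "A"),
    (3, 1, 1, "B"),
    (4, 1, 2, "A"),
    (5, 2, 1, "A"),
    (6, 2, 1, "A"),
    (7, 1, 2, "B"),
    (8, 1, 2, "B"),
]


def key(row):
    """The (col1, col2, col3) tuple that defines uniqueness."""
    return row[1:]


def distinct_on(rows):
    """Emulate DISTINCT ON: keep the first row seen for each key."""
    seen = {}
    for row in rows:
        seen.setdefault(key(row), row)
    return sorted(r[0] for r in seen.values())


def truly_unique(rows):
    """Keep only rows whose key occurs exactly once."""
    counts = Counter(key(r) for r in rows)
    return sorted(r[0] for r in rows if counts[key(r)] == 1)


print(distinct_on(ROWS))   # [1, 3, 4, 5, 7]
print(truly_unique(ROWS))  # [3, 4]
```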


Essentially, the problem boils down to selecting rows based on the value of an aggregate or window function. The usual solutions to that problem apply, except that in our case we want count(*) to equal 1.

Thus, we could use the WHERE IN method:

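WITH uniqe AS (
    SELECT *
    FROM MyTable
    -- keep rows whose (col1, col2, col3) tuple occurs exactly once
    WHERE (col1, col2, col3) IN (
        SELECT col1, col2, col3
        FROM MyTable
        GROUP BY col1, col2, col3
        HAVING count(*) = 1
    )
)
-- ...followed by the final SELECT from the question, reading FROM uniqe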

or the PARTITION BY method:

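WITH uniqe AS (
    SELECT *
    FROM (
        -- count(*) OVER (...) attaches each tuple's group size to every row;
        -- dateRange must be selected here so the outer query can filter on it
        SELECT col1, col2, col3, dateRange
            , count(*) OVER (PARTITION BY col1, col2, col3) AS cnt
        FROM MyTable
    ) AS t
    WHERE cnt = 1
)
-- ...followed by the final SELECT from the question, reading FROM uniqe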

As Andomar explains, PARTITION BY is similar to GROUP BY in that it determines which rows the window function's result is calculated over, but unlike GROUP BY, it does not change the number of rows returned.
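Both queries can be tried end to end with Python's built-in `sqlite3` module, used here as a stand-in for PostgreSQL. This is an assumption: the question targets Postgres, and dateRange is dropped because SQLite has no range types. Row values in IN need SQLite 3.15 or newer, and window functions need 3.25 or newer. The last two queries illustrate the point above: GROUP BY returns one row per tuple, while the window function keeps every row.

```python
import sqlite3

# In-memory SQLite stand-in for MyTable (the question uses PostgreSQL).
# dateRange is omitted: SQLite has no range types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "A"), (2, 1, 1, "A"), (3, 1, 1, "B"), (4, 1, 2, "A"),
     (5, 2, 1, "A"), (6, 2, 1, "A"), (7, 1, 2, "B"), (8, 1, 2, "B")],
)

# WHERE IN method: keep rows whose key is in the set of keys seen exactly once.
where_in_ids = [r[0] for r in conn.execute("""
    SELECT id FROM MyTable
    WHERE (col1, col2, col3) IN (
        SELECT col1, col2, col3 FROM MyTable
        GROUP BY col1, col2, col3
        HAVING count(*) = 1)
    ORDER BY id""")]

# PARTITION BY method: attach each key's count to every row, then filter.
window_ids = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT id, count(*) OVER (PARTITION BY col1, col2, col3) AS cnt
        FROM MyTable) AS t
    WHERE cnt = 1
    ORDER BY id""")]

# GROUP BY collapses each key to one row; the window function keeps all 8 rows.
grouped = conn.execute(
    "SELECT col1, col2, col3, count(*) FROM MyTable GROUP BY col1, col2, col3"
).fetchall()
windowed = conn.execute(
    "SELECT id, count(*) OVER (PARTITION BY col1, col2, col3) FROM MyTable"
).fetchall()

print(where_in_ids, window_ids)     # [3, 4] [3, 4]
print(len(grouped), len(windowed))  # 5 8
```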

I had to add the `dateRange` col to your select statement, but that is it. If you update the answer, I'll mark it as the accepted answer. Thanks so much. (I'll have to read up on table partitioning since I'm not 100% sure what is going on here) – theG Feb 12 '19 at 04:19