0

Does a relational database exist that has a GROUP BY aggregate function such as DISTINCT EXISTS that returns TRUE if there is more than one distinct value for the group and FALSE otherwise? I am looking for something that would iterate through the values in the group until the current value is not the same as the previous value, instead of counting ALL of the distinct values.

Example:
pv_name | time_stamp | value
A       | 1          | 1
B       | 2          | 1
C       | 3          | 1
A       | 4          | 2
C       | 5          | 2
B       | 6          | 3

SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING DISTINCT_EXISTS(value);

Result: A, C
Patrick
  • 147
  • 1
  • 15
  • Some sample data and your expected results will help in understanding your question better. – Amit Sukralia Mar 24 '16 at 04:43
  • This question follows from this question: http://stackoverflow.com/questions/36191659/mysql-check-for-existence-of-value-changes-in-time-range – Patrick Mar 24 '16 at 04:57
  • I guess `HAS_CHANGES` or `MULTIPLE_DISTINCT_EXISTS` might be a better name. – Patrick Mar 24 '16 at 05:07
  • What database and which version are you using? Please add the corresponding tags to the question. Do you have a separate table with the list of all `pv_names`? It is possible to get the results you need without reading all rows of the `example` table. You need to read only two rows for each `pv_name`. – Vladimir Baranov Mar 24 '16 at 06:19
  • The example I gave is hugely simplified, just to show how the proposed aggregate function would work. – Patrick Mar 24 '16 at 19:24

3 Answers3

1
SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING MIN(value)<>MAX(value);

Might get you there quicker depending on indexes. I don't think you'll do much better than this or COUNT(DISTINCT value) though.

Have you tried joining to example twice? Psuedo-code example:

with
(
    SELECT pv_name
    FROM example
    WHERE time_stamp > 0 AND time_stamp < 6
) as Q
select distinct Q1.pv_name
from Q as Q1 inner join Q as Q2 on
Q1.pv_name=Q2.pv_name and
Q1.value<>q2.value
LoztInSpace
  • 5,584
  • 1
  • 15
  • 27
  • What indexes would you use for the first query? `value` is a text column, so I can't index that. – Patrick Mar 24 '16 at 19:21
  • An index on Pv_name and value would probably help. If you can't index value then try an index on a hash instead. It's getting out of hand now – LoztInSpace Mar 24 '16 at 22:56
0

You probably know about the COUNT(DISTINCT) function and you want to avoid it to prevent unnecessary computations.

It is hard to know why you are looking for this but I assume that it takes long time to find these groups using the most obvious query:

SELECT type, COUNT(DISTINCT product)
FROM aTable
GROUP BY type
HAVING COUNT(DISTINCT product) > 1

I can recommend you try the window functions. Try for example the new T-SQL's LAST_VALUE and FIRST_VALUE functions:

with c as (
SELECT type
 ,LAST_VALUE(product) OVER (PARTITION BY type ORDER BY product) lv
 ,FIRST_VALUE(product) OVER (PARTITION BY type ORDER BY product) pv
FROM aTable
)
SELECT * from c where lv <> pv

If the DB engine is smart enough it will find the first/last value for the group and will not try to count all the values, and therefore perform better.

For MySQL you can use helper variables to get the row_number per group based on the distinct values, something like this:

SELECT type, product
FROM (
SELECT  @row_num := IF(@prev_type=type and @prev_prod=product,@row_num+1,1) AS RowNumber
       ,type
       ,product
       ,@prev_type := type
       ,@prev_prod := product
  FROM Person,
      (SELECT @row_num := 1) x,
      (SELECT @prev_type := '') y,
      (SELECT @prev_prod := '') z
  ORDER BY type, product
) as a
WHERE RowNumber > 1
cha
  • 10,301
  • 1
  • 18
  • 26
  • Yes, I'm trying to avoid counting all of the distinct values with COUNT(DISTINCT). There may be a LOT of rows in each group, most of which are distinct. Also, the distinct values are in a text column, so I'm guessing each comparison is expensive. – Patrick Mar 24 '16 at 04:54
  • I added an example for MySQL. I do not know if it is going to be faster then COUNT(DISTINCT) though – cha Mar 24 '16 at 05:08
  • I'm not very familiar with helper variables. So this query finds each type that has more than one distinct product? – Patrick Mar 24 '16 at 05:15
  • Yes, it should. I did not try it, as I do not have a schema. If you have a fiddle I can try it there – cha Mar 24 '16 at 05:19
  • The table I have is a bit more complicated and may have too much data for a fiddle. I need to leave, but I'll try to investigate further tomorrow. I may also look at MySQL user defined functions. Thank you for your help. – Patrick Mar 24 '16 at 05:31
  • I tried using the helper variables query, but it took a really long time. The frustrating thing is that I know exactly how I would do it in an imperative language like C. I looked at user defined functions, but it seems that the UDF would still be called for every value in the group, even if I only needed to calculate on a small set of them. Is there a way to somehow use a cursor with a group by to only iterate through the values in each group until a change is found and then move on the the next group? Or is every value in the group still going to be retrieved? – Patrick Mar 25 '16 at 18:21
0

I think the having min (value) <> max (value) will be most efficient here. An alternative is:

 Select distinct pv_name
 From example e
 Left join (
     Select value
     From example
     Where ...
     Group by value
     Having count (*) = 1
     ) s on e.value = s.value
 Where s.value is null

Or you could use NOT EXISTS against that subquery instead.

Include the relevant where clause in the sub query.

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51