0

I have read some StackOverflow which similar to my questions but I can't find the exact same of my problem. I have read: select max value of each group including other column and Select max value of each group and Select max value of each group

So here is my problem.

I have a table which looks like

+---------+---------------+-----------------------+
|column_1 |   column_2    |      column_3         | 
+---------+---------------+-----------------------+
|    A    |      200      | 1618558797853684118   |     
|    A    |      198.7    | 1618558797854783205   | 
|    A    |      201.3    | 1618558797855282263   |    
|    B    |      350.5    | 1618558775580928115   |  
|    B    |      349.9    | 1618558775581128138   |  
|    B    |      350.1    | 1618558775580856107   |
|    C    |      532      | 1618558797852667035   |
|    C    |      531      | 1618558775580345051   |
|    A    |      300      | 1618558797855492289   |
|    A    |      302      | 1618558797852512023   |   
|   ...   |  ........     |        ...            | 
+---------+---------------+-----------------------+

So as you can see the three of each row given each alphabet on column_1 almost have the same value, right? I need to get one of each of them, but only in the sequences. Let's take a look at the desired output for more clarity:

Desired output
+---------+---------------------------------------------------------------+-------------------------+
|column_1 |                         column_2                              |      column_3           | 
+---------+---------------------------------------------------------------+-------------------------+
|    A    | it can be (200 or 198.7 or 201.3) does not matter which one   | (depends on column_2)   |     
|    B    | it can be (350.5 or 349.9 or 350.1) does not matter which one | (depends on column_2)   | 
|    C    | it can be (532 or 531) does not matter which one              | (depends on column_2)   |    
|    A    | it can be (300 or 302) does not matter which one              | (depends on column_2)   |     
|   ...   |                        ........                               |          ...            | 
+---------+---------------------------------------------------------------+-----------------------+

So what I'm thinking is to group by each column and take the max or min value of column_3 (does not matter which one), but I failed to do that.

I'm sorry for the complex question, but can you help me? Thanks

ebuzz168
  • 1,134
  • 2
  • 17
  • 39

2 Answers2

1

Consider below

#standardSQL
with `project.dataset.table` as (
  select 1 id, 'A' column_1, 200 column_2, 1618558797853684118 column_3 union all
  select 2, 'A', 198.7, 1618558797854783205 union all
  select 3, 'A', 201.3, 1618558797855282263 union all
  select 4, 'B', 350.5, 1618558775580928115 union all
  select 5, 'B', 349.9, 1618558775581128138 union all
  select 6, 'B', 350.1, 1618558775580856107 union all
  select 7, 'C', 532, 1618558797852667035 union all
  select 8, 'C', 531, 1618558775580345051 union all
  select 9, 'A', 300, 1618558797855492289 union all
  select 10, 'A', 302, 1618558797852512023 union all
  select 12, 'C', 709, 1618558797852562325 union all
  select 13, 'C', 803, 1618558797851315651
)
select as value array_agg(struct(column_1, column_2, column_3) order by column_2 limit 1)[offset(0)]
from (
  select *, countif(flag) over(order by id) grp
  from (
    select *, column_1 != lag(column_1) over(order by id) flag
    from `project.dataset.table`
  )
) 
group by column_1, grp   

with output

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

You seem to have a form of gaps-and-islands problem. You want one row when adjacent rows have the same value for column_1.

I would suggest lag() (for the first row in each group) or lead() (for the last row):

select t.*
from (select t.*,
             lag(column_1) over (order by column_3) as prev_column_1
      from t
     ) t
where prev_column_1 is null or prev_column_1 <> column_1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hi, could you look at this http://sqlfiddle.com/#!17/9fd558/3 ? I tried your answer but it return A, B, A, C, C which it should return A,B,C,A,C on the first column – ebuzz168 Apr 16 '21 at 19:56
  • @ebuzz168 . . . The dates in the fiddle are all mixed up. It is. hard to see which rows should be returned. – Gordon Linoff Apr 16 '21 at 20:03