2

I have a table with the following columns:

items ARRAY<STRUCT<label STRING, counter INTEGER>>
explore BOOLEAN

For each record I would like to choose the label with the highest counter, and then count explore on each unique label. Ideally I would like to run something like:

SELECT FIRST_VALUE(items.label) OVER (ORDER BY items.counter DESC) as label,
       COUNT(explore) as explore
FROM my_table
GROUP BY 1

If this is the data in my table:

explore       items
   1      [['A',1],['B',3]]
   1      [['B',1]]
   0.     [['C',2],['D',1]]

Then I would like to get:

label  explore
 'B'      2
 'C'      1
Iza Pziza
  • 77
  • 5
  • 1
    Can you provide a complete sample data and expected results? Just so the community can easily reproduce the issue. – Ricco D Mar 16 '22 at 06:14

2 Answers2

1

Consider below approach

select ( select label from t.items
    order by counter desc limit 1
  ) label, 
  count(*) explore
from your_table t
group by label           

if applied to sample data in your question

with your_table as (
    select 1 explore, [struct('A' as label, 1 as counter), struct('B' as label, 3 as counter) ] items union all 
    select 1, [struct('B', 1)] union all 
    select 0, [struct('C', 2), struct('D', 1) ] 
)

output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

Using your sample data, consider below approach.

with data as (
    select 1 as explore, [STRUCT( 'A' as label, 1 as counter), STRUCT( 'B' as label, 3 as counter) ] as items,
    union all select 1 as explore, [STRUCT( 'B' as label, 1 as counter)] as items,
    union all select 0 as explore, [STRUCT( 'C' as label, 2 as counter), STRUCT( 'D' as label, 1 as counter) ] as items
),

add_row_num as (
SELECT 
        explore,
        items,
        row_number() over (order by explore desc) as row_number
FROM data
),

get_highest_label as (
select 
    explore,
    row_number,
    label,
    counter,
    first_value(label) over (partition by row_number order by counter desc) as highest_label_per_row 
from add_row_num, unnest(items)
),

-- https://stackoverflow.com/questions/36675521/delete-duplicate-rows-from-a-bigquery-table (REMOVE DUPLICATE)
remove_dups as (
  SELECT
      *,
      ROW_NUMBER()
          OVER (PARTITION BY row_number) as new_row_number
  FROM get_highest_label
)

select 
    highest_label_per_row,
    count(highest_label_per_row) as explore,
from remove_dups 

where new_row_number = 1
group by highest_label_per_row


Output:

enter image description here

Ricco D
  • 6,873
  • 1
  • 8
  • 18