1

I recently moved from using apache hive to GCP big query as my SQL database.

I have a set of data as follows,

ID GROUP GROUP_TYPE OFFSET OFFSET_IDENTIFIER
1 100 A 120 1
2 100 A 140 2
3 500 B 120 1
4 500 B 140 2
5 5000 C 300 1
6 5000 C 250 2
7 10000 D 24 1
8 10000 D 46 2
9 10000 D 99 3
10 10000 D 11 4

In hive I used the following query to get the min of offset by group_type and returned the min offset and corresponding offset identifier,

select 
  ID,
  GROUP,
  GROUP_TYPE,
  min(struct(OFFSET,OFFSET_IDENTIFIER)).col1 as min_offset,
  min(struct(OFFSET,OFFSET_IDENTIFIER)).col2 as offset_identifier from eng.offsets 
group by ID,GROUP,GROUP_TYPE;

The output I get from this query is as follows,

ID GROUP GROUP_TYPE OFFSET OFFSET_IDENTIFIER
1 100 A 120 1
3 500 B 120 1
6 5000 C 250 2
10 10000 D 11 4

But when I execute the same query in BQ,

select 
  ID,
  GROUP,
  GROUP_TYPE,
  min(struct(OFFSET,OFFSET_IDENTIFIER)).OFFSET as min_offset,
  min(struct(OFFSET,OFFSET_IDENTIFIER)).OFFSET_IDENTIFIER as offset_identifier 
from eng.offsets 
  group by ID,GROUP,GROUP_TYPE;

I get the following error: MIN is not defined for arguments of type STRUCT

May I check if there is any solution to this issue/a way to get the min of struct?

Thank you for your inputs!

Ricco D
  • 6,873
  • 1
  • 8
  • 18

1 Answers1

0

Consider query below:

with sample_data as (
  select 1 as id, 100 as groupp, 'A' as group_type, 120 as offsett, 1 as offset_identifier
  union all select 2 as id, 100 as groupp, 'A' as group_type, 140 as offsett, 2 as offset_identifier
  union all select 3 as id, 500 as groupp, 'B' as group_type, 120 as offsett, 1 as offset_identifier
  union all select 4 as id, 500 as groupp, 'B' as group_type, 140 as offsett, 2 as offset_identifier
  union all select 5 as id, 5000 as groupp, 'C' as group_type, 300 as offsett, 1 as offset_identifier
  union all select 6 as id, 5000 as groupp, 'C' as group_type, 250 as offsett, 2 as offset_identifier
  union all select 7 as id, 10000 as groupp, 'D' as group_type, 24 as offsett, 1 as offset_identifier
  union all select 8 as id, 10000 as groupp, 'D' as group_type, 46 as offsett, 2 as offset_identifier
  union all select 9 as id, 10000 as groupp, 'D' as group_type, 99 as offsett, 3 as offset_identifier
  union all select 10 as id, 10000 as groupp, 'D' as group_type, 11 as offsett, 4 as offset_identifier

),
add_min_offset as(
select 
  id,
  groupp,
  group_type,
  offsett,
  min(offsett) over (partition by group_type) as min_offset,
  offset_identifier
from sample_data

)
select * except(offsett) from add_min_offset
where offsett=min_offset

Output:

enter image description here

Ricco D
  • 6,873
  • 1
  • 8
  • 18