1

My query is supposedly return the COLUMN NAME based on the maximum values of the columns. I tried using CASE WHEN to solve this case. but somehow this error occur, probably because I return the name of column instead of the value of the column itself:

No matching signature for operator CASE; all THEN/ELSE arguments must be coercible to a common type but found: INT64, STRING; actual argument types (WHEN THEN) ELSE: (BOOL STRING) (BOOL INT64) INT64 at [3:5]

My code is:

SELECT
ID,
    CASE
        WHEN col1 >= col2 AND col1 >= col3 AND col1 >= col4 AND col1 >= col5 THEN 'col1 '
        WHEN col2 >= col1 AND col2 >= col3 AND col2 >= col4 AND col2 >= col5 THEN 'col2 '
        ELSE 'col1'                                
    END AS Max_Column_Name
FROM table

Sample input is:

enter image description here

Is there any way to make this query more simple? So no need to repeat the when..case if there are lots of column. Other initiative is to use GREATEST(col1,col2,col3) like in this post but, Im not sure how to use it in standard sql, bigquery.

Expected output:

Return the name of column that has maximum value for each segment (country,product,language) and rename the name of column in the table result.

enter image description here

swm
  • 519
  • 1
  • 4
  • 20
  • are those segments predefined - country,product,language? and pattern for column name is _XXX ? pelase clarify Also is 6 columns a fixed number or can be any – Mikhail Berlyant Jun 23 '20 at 23:14
  • the pattern of name is exactly like the sample input mentioned. not necessary 6 column ,can be any , for example in my case there would be like more than 12 columns (e,g 4 columns for Country_XX, another 4 for Product_XX ..etc) – swm Jun 23 '20 at 23:33

3 Answers3

8

Assuming you have no NULL values, you can use greatest():

select (case greatest(country_uk, country_us)
            when country_uk then 'uk' when country_us then 'us'
        end), 
       . . .

You can also use arrays:

select (select el.what
        from unnest(array['uk' as what, country_uk as val), ('us', country_us)]) el
        order by el.val desc
        limit 1
       ) as country
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • @swm . . . What 'segment'? I don't know what you are referring to. – Gordon Linoff Jun 24 '20 at 01:25
  • does this query support if the there are multiples column with the same highest value. (e.g highest column names are UK and US) – swm Jun 24 '20 at 08:59
  • @swm . . . It does the same thing that the sample code in your question does. There is no other explanation in your question, so that would appear to be the right approach. – Gordon Linoff Jun 24 '20 at 10:39
2

Below is for BigQuery Standard SQL
It is quite generic up to some extend - if you have more or different segments and their names - you just need to reflect it in most outer SELECT - in the rows like (MAX(IF(segment = 'SegmentName', winner, NULL)) AS SegmentName), the rest is taken care of in inner query

#standardSQL
SELECT id,
  MAX(IF(segment = 'Country', winner, NULL)) AS Country,
  MAX(IF(segment = 'Product', winner, NULL)) AS Product,
  MAX(IF(segment = 'Lang', winner, NULL)) AS Lang
FROM (
  SELECT id, segment, ARRAY_AGG(item ORDER BY col_value DESC LIMIT 1)[OFFSET(0)] winner
  FROM `project.dataset.table` t, 
  UNNEST(ARRAY(
      SELECT AS STRUCT segment, item, SAFE_CAST(col_value AS INT64) AS col_value
      FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"\w+":[^,}]+')) kv,
      UNNEST([STRUCT(SPLIT(kv,':')[OFFSET(0)] AS col_name, SPLIT(kv,':')[OFFSET(1)] AS col_value)]) nv,
      UNNEST([STRUCT(SPLIT(TRIM(col_name, '"'), '_')[OFFSET(0)] AS segment, SPLIT(TRIM(col_name, '"'), '_')[OFFSET(1)] AS item)])
      WHERE TRIM(col_name, '"') != 'id'
    )) kv
  GROUP BY id, segment
)
GROUP BY id    

If to apply to sample data like in your question - output is

Row id  Country Product Lang     
1   abc US      A       EN   
2   def UK      B       PH      

Note: recently introduced EXECUTE IMMEDIATE will allow you to further generalize above solution, so you will not even need to worry about lines in most outer SELECT - I am leaving this to you as it is definitelly out of main question scope

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • is the 'segment' in outer SELECT is predefined or from the table? – swm Jun 24 '20 at 01:26
  • I used whatever I saw in your example. if in your real case you have different - just use them - replace `MAX(IF(segment = 'SegmentName', winner, NULL)) AS SegmentName` with your real segment names. you will end up with as many such lines in your outer select as many different segments you have . so in your example from question - you had three segment - thus you got three lines - hope it makes sense. sounds simple to me and does exactly what you asked :o) – Mikhail Berlyant Jun 24 '20 at 01:27
0
SELECT TOP 1 T.Col
FROM 
(
   SELECT Country_UK AS colMax,'Country_UK' Col FROM table 
    UNION ALL              
   SELECT Country_US AS colMax,'Country_US' Col FROM table 
    UNION ALL              
   SELECT Product_A AS colMax,'Product_A' Col FROM table 
    UNION ALL             
   SELECT Product_B AS colMax,'Product_B' Col FROM table 
    UNION ALL             
   SELECT Lang_EN AS colMax,'Lang_EN' Col FROM table 
) AS T 
GROUP BY T.Col 
Order by MAX(T.colMax) DESC
Vishal
  • 1