0

I have a table like as shown below

enter image description here

As shown, I have two rows for the same subject. each row indicating a day

However, I wish to convert them into a single row like as shown below

enter image description here

Can you help? I did check this post but unable to translate it?

The Great
  • 7,215
  • 7
  • 40
  • 128

1 Answers1

2

I did check this post but unable to translate it?

Let's first transform your original data into form that we then can pivot

Below does this:

#standardSQL
SELECT subject_id, hm_id, icu_id, balance, 
  DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table` 
-- ORDER BY subject_id, hm_id, icu_id, delta

If to apply to sample data from your question - result is

Row subject_id  hm_id   icu_id  balance delta    
1   124         ab      cd      2       1    
2   124         ab      cd      5       2    
3   321         xy      pq      -6      1    
4   321         xy      pq      1       2     

So, now we need to pivot this based on delta column - balance for delta = 1 will go to day_1_balance, balance for delta = 2 will go to day_2_balance and so on

Let's for now assume that there are just two deltas (as in your sample data). In this simplified case - below will make a trick

#standardSQL
SELECT subject_id, hm_id, icu_id,
  MAX(IF(delta = 1, balance, NULL)) day_1_balance,
  MAX(IF(delta = 2, balance, NULL)) day_2_balance  
FROM (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id

with result

Row subject_id  hm_id   icu_id  day_1_balance   day_2_balance    
1   124         ab      cd      2               5    
2   321         xy      pq      -6              1      

Obviously, in real case you don't know how many delta columns you have so you need to build above query dynamically - and that is exactly where post you referenced - will help you

You can try again by yourself - or see below for final solution

Step 1 - generating query

#standardSQL
WITH temp AS (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
SELECT CONCAT('SELECT subject_id, hm_id, icu_id,', 
   STRING_AGG(
      CONCAT(' MAX(IF(delta = ',CAST(delta AS STRING),', balance, NULL)) as day_',CAST(delta AS STRING),'_balance')
   ) 
   ,' FROM temp GROUP BY subject_id, hm_id, icu_id ORDER BY subject_id, hm_id, icu_id')
FROM (
  SELECT delta 
  FROM temp
  GROUP BY delta
  ORDER BY delta
) 

Result of step 1 is the text that represent final query that you need to run as step 2

Step 2 - run generated query

#standardSQL
WITH temp AS (
  SELECT subject_id, hm_id, icu_id, balance, 
    DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
  FROM `project.dataset.table` 
)
SELECT subject_id, hm_id, icu_id, 
  MAX(IF(delta = 1, balance, NULL)) AS day_1_balance, 
  MAX(IF(delta = 2, balance, NULL)) AS day_2_balance 
FROM temp 
GROUP BY subject_id, hm_id, icu_id 
-- ORDER BY subject_id, hm_id, icu_id
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230