I have a table like as shown below
As shown, I have two rows for the same subject. each row indicating a day
However, I wish to convert them into a single row like as shown below
Can you help? I did check this post but unable to translate it?
I did check this post but unable to translate it?
Let's first transform your original data into form that we then can pivot
Below does this:
#standardSQL
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
-- ORDER BY subject_id, hm_id, icu_id, delta
If to apply to sample data from your question - result is
Row subject_id hm_id icu_id balance delta
1 124 ab cd 2 1
2 124 ab cd 5 2
3 321 xy pq -6 1
4 321 xy pq 1 2
So, now we need to pivot this based on delta column - balance for delta = 1 will go to day_1_balance, balance for delta = 2 will go to day_2_balance and so on
Let's for now assume that there are just two deltas (as in your sample data). In this simplified case - below will make a trick
#standardSQL
SELECT subject_id, hm_id, icu_id,
MAX(IF(delta = 1, balance, NULL)) day_1_balance,
MAX(IF(delta = 2, balance, NULL)) day_2_balance
FROM (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id
with result
Row subject_id hm_id icu_id day_1_balance day_2_balance
1 124 ab cd 2 5
2 321 xy pq -6 1
Obviously, in real case you don't know how many delta columns you have so you need to build above query dynamically - and that is exactly where post you referenced - will help you
You can try again by yourself - or see below for final solution
Step 1 - generating query
#standardSQL
WITH temp AS (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
SELECT CONCAT('SELECT subject_id, hm_id, icu_id,',
STRING_AGG(
CONCAT(' MAX(IF(delta = ',CAST(delta AS STRING),', balance, NULL)) as day_',CAST(delta AS STRING),'_balance')
)
,' FROM temp GROUP BY subject_id, hm_id, icu_id ORDER BY subject_id, hm_id, icu_id')
FROM (
SELECT delta
FROM temp
GROUP BY delta
ORDER BY delta
)
Result of step 1 is the text that represent final query that you need to run as step 2
Step 2 - run generated query
#standardSQL
WITH temp AS (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
SELECT subject_id, hm_id, icu_id,
MAX(IF(delta = 1, balance, NULL)) AS day_1_balance,
MAX(IF(delta = 2, balance, NULL)) AS day_2_balance
FROM temp
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id