Stack / Append the table column-wise in BigQuery

Question

I have a table in BigQuery which gets new data daily and appends it to the current table. This table is called : score_FULL. Additionally, I keep the individual scores on a daily basis, which can be accessed by score_20180125 if we choose today's date. Daily scores are appended to score_FULL on a daily basis.

score_FULL contains:

visitorID         score
#Older scores first
1                 0.15
2                 0.78
3                 0.12
6                 0.90
------------------------
2                 0.22
6                 0.65
7                 0.61
10                0.24
------------------------
1                 0.31
2                 0.41
10                0.12
-------------------------
#Newest scores appended

I would like to see user score changes. Each time a user gets a new score, we append it horizontally. Every-time we get a new user, we append it vertically. Assuming a user can only get 1 score per day, the ideal solution is:

visitorID     score1      score2       score3
1             0.15        0.31
2             0.78        0.22         0.41
3             0.12

E.g. a table that grows horizontally (new scores) AND vertically (new users)

I can do something similar using a sequence of Left Joins on individual tables, but this would only give me the visitor data from 1st table we run left joins from.

Note: I can add another column Date which will simply repeat the same date for all values, if it makes things easier.

No I don't but I do have the date from `score_DATE` table, which gets appended to the `FULL` table. Although, if there is an easy solution, I can add the date to the table too. So it will append the same date for all rows appended that day — GRS, Jan 25 '18 at 10:08
Have you ever worked with Window Functions or table wildcard queries? What's the maximum day span your data goes back? — Pentium10, Jan 25 '18 at 12:40
@Pentium10 No, I haven't. Only goes back 6 days at the moment, but this will grow daily. I could probably make it so that only last `n` days are reported. — GRS, Jan 25 '18 at 14:18
@Pentium10 What do you mean by partitioning? Each day has it's own table `score_DATE` and `score_FULL` is just a UNION ALL of all rows — GRS, Jan 30 '18 at 15:45
Instead of having a day and full table you can have on single Partitioned table. https://cloud.google.com/bigquery/docs/partitioned-tables — Pentium10, Jan 30 '18 at 18:29
I believe the term you're looking for is `pivot`. However this function doesn't seem to be supported in *bigquery* yet. But after a quick google it seems like there are ways around it. Just [look at this example](https://stackoverflow.com/questions/40761453/transpose-rows-into-columns-in-bigquery-pivot-implementation) — Oceans, Jan 31 '18 at 13:53
@GRS - Could you maybe add the current query to your question, the one you used to union the different tables into a single view? Because I don't really get the sample data you provided because I see *VisitorID 1* has 2 different values for day 1: *0.15* and *0.44* and in your desired result you simply ignore the second value. Is that just a typo? Just add the query for *score_FULL* — Oceans, Feb 02 '18 at 08:43
@Oceans My scripts appends daily data from `score_date` to the `score_FULL` so the query is just to SELECT * FROM `score_date` with the destination table set on append. The reason there are 2 scores for 1 user is to highlight that a single user can make multiple visits during the same day. Any value is okay to save a generally they are similar. — GRS, Feb 02 '18 at 10:00
@Oceans The more I think about it, the less likely I think I will arrive at the conclusion that I want with the current structure. — GRS, Feb 02 '18 at 10:09
@Oceans I think its easier to just keep all scores, so for user 1, the sequence is: `[0.15,0.44,0.31]`. This way I just need to create some sort of pivot e.g. for each visitor Id, list the scores horizontally. — GRS, Feb 02 '18 at 12:00
@GRS - Can you edit your question accordingly? Because you're making it really confusing. Just say what you actual want. Do you want a the columns to represent a day or just scores? Do you want it to be dynamic?(keep in mind you'll probably have to limit the amount of columns at one point. You'll have to decide how you want to limit this, last X amount of scores/days) Just start over, say what data you have available and the desired result, don't think about limitations just describe what you want first. — Oceans, Feb 02 '18 at 12:25

score 2 · Accepted Answer · answered Feb 06 '18 at 13:33

Instead of adding columns dynamically (which is quite a challenge here) - I would recommend aggregating respective visitor's scores in one column as either array or string of individual scores

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.score_FULL` AS ( 
  SELECT 1 visitorID, 0.15 score UNION ALL
  SELECT 2, 0.78 UNION ALL
  SELECT 3, 0.12 UNION ALL
  SELECT 6, 0.90 UNION ALL
  SELECT 2, 0.22 UNION ALL
  SELECT 6, 0.65 UNION ALL
  SELECT 7, 0.61 UNION ALL
  SELECT 10, 0.24 UNION ALL
  SELECT 1, 0.31 UNION ALL
  SELECT 2, 0.41 UNION ALL
  SELECT 10, 0.12 
)
SELECT 
  visitorID,
  ARRAY_AGG(score) scores_as_array,
  STRING_AGG(CAST(score AS STRING)) scores_as_list
FROM `project.dataset.score_FULL`
GROUP BY visitorID

with output as

Row visitorID   scores_as_array scores_as_list   
1   1           0.15            0.15,0.31    
                0.31         
2   2           0.78            0.78,0.22,0.41   
                0.22         
                0.41         
3   3           0.12            0.12     
4   6           0.9             0.9,0.65     
                0.65         
5   7           0.61            0.61     
6   10          0.24            0.24,0.12    
                0.12

Stack / Append the table column-wise in BigQuery

1 Answers1