0

We have two tables like below:

Table A

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | art_and_theatre      
 Sue  | Interest  | finances_and_investments
 Sue  | Interest  | art_and_theatre
 Joe  | Interest  | cooking_and_nutrition
 Joe  | Interest  | nutrition_and_drinks
 Joe  | Interest  | eco_life
 Joe  | Interest  | beauty
 Bob  | Interest  | nutrition_and_drinks

Table B (Static)

           Interest                         |   Segment
--------------------------------------------+------------------
art_and_theatre                             |   S1
cooking_and_nutrition, nutrition_and_drinks |   S2 
finances_and_investments                    |   S3
finances_and_investments                    |   S4
technology                                  |   S5
telecommunications                          |   S6
art_and_theatre                             |   S7
art_and_theatre                             |   S8
eco_life, cooking_and_nutrition, beauty     |   S9

Expected table

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | art_and_theatre      
 Sue  | Interest  | finances_and_investments
 Sue  | Interest  | art_and_theatre
 Joe  | Interest  | cooking_and_nutrition
 Joe  | Interest  | nutrition_and_drinks 
 Bob  | Interest  | nutrition_and_drinks
          (+)
 Bob  | Segment   | S1
 Bob  | Segment   | S7
 Bob  | Segment   | S8
 Sue  | Segment   | S3
 Sue  | Segment   | S4
 Sue  | Segment   | S1
 Sue  | Segment   | S7
 Sue  | Segment   | S8
 Joe  | Segment   | S2
 Joe  | Segment   | S9

As you can see, one user can have multiple interests and multiple interests can belong to one segment. Is this kind of JOIN is possible in Big Query?

Note: The Interest column will have one or more values. Segments need to be joined only if all the values are matched.

  • Why are the interests stored as a comma-delimited string? Is this something to do with Google BigQuery or just a bad data model? – Thorsten Kettner Nov 02 '20 at 06:29
  • That's how the raw data is stored. Can we convert it into an array type and do the join? –  Nov 02 '20 at 06:30
  • I understand that this is your table. But this is an anti-pattern; why is the table designed like that? In a relational database management system I would expect a separate table. You may want to change your data model. (But I don't know Google BigQuery well enough. Maybe there is something special about it to suggest such pattern?) – Thorsten Kettner Nov 02 '20 at 06:35

3 Answers3

1

Below is for BigQuery Standard SQL

#standardSQL
select name, question, answer from `project.dataset.tableA`
union all
select distinct name, 'segment' as question, segment as answer
from (
  select answer, segment 
  from `project.dataset.tableB`, 
  unnest(split(interest, ', ')) answer
)
join `project.dataset.tableA`
using(answer)
-- order by question, name, answer    

if to apply to sample data from your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

Hmmm . . . I'm thinking union all with a join:

select a.name, a.question, a.answer
from a
union all
select a.name, 'Segment', b.segment
from a join
     b
     on a.answer = b.interest;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Yes this is possible, you should be able to do it with the following SQL

with temp as (
   SELECT a.*, b.*
   FROM TABLEA a
   JOIN TABLEB b
    on a.answer = b.interest
)
SELECT t.Name, t.Question, t.Answer from temp
UNION ALL
SELECT t.Name, 'Segment' as Question, t.Segment as Answer from temp
Daniel Zagales
  • 2,948
  • 2
  • 5
  • 18