BigQuery - JOIN on two tables using string and array

Question

We have two tables in BigQuery like below:

Table A

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d

Table B (Static)

           Interests                        |   Segment
--------------------------------------------+------------------
["a"]                                       |   S1
["a","b"]                                   |   S2 
["a", "b", "c", "d"]                        |   S3

Expected table

 User | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d
          (+)
 Bob  | Segment   | S1
 Bob  | Segment   | S2
 Sue  | Segment   | S1
 Joe  | Segment   | S1
 Joe  | Segment   | S2
 Joe  | Segment   | S3

In the above tables, the Answer field is of string type and Interests is of array type.

Pointers:

One user can have one or more interests.
One or more interests can belong to one segment.
A user will be assigned to a segment only when all of his interests are matched.

Any inputs/thoughts in the right direction would be greatly appreciated.

please take a look at https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad — nbk, Nov 02 '20 at 10:13

score 1 · Accepted Answer · answered Nov 02 '20 at 18:48

1

Below is for BigQuery Standard SQL

#standardSQL
select name, question, answer from `project.dataset.tableA`
union all
select name, 'Segment', segment
from (
  select 
    name, 'Segment', segment,
    ( select countif(y is null)
      from b.interest x
      left join a.answers y
      on x = y
    ) = 0 qualified
  from (
    select name, array_agg(answer) answers
    from `project.dataset.tableA`
    group by name
  ) a, `project.dataset.tableB` b
)
where qualified

if to apply to sample data from your question - output is

answered Nov 02 '20 at 18:48

Mikhail Berlyant

165,386
8
154
230

That worked like a charm! However, I didn't quite understand the logic here. A short description would be of great help. – Nov 03 '20 at 03:11
sure. will add comments tomorrow :o) – Mikhail Berlyant Nov 03 '20 at 05:37
brief explanation: 1) most inner select - simply generates array of answers for each user; 2) then result is cross joined with segments (tableB) and `qualified` attribute is calculated (see details below in #4); 3) finally output is filtered to only those which qualified and union with tableA; 4) logic for `qualified` is to check rule #3 in your question by counting how many user's interests are above what is in segments - if it is zero - it is qualified. Hope this helps to understand better the solution in my answer – Mikhail Berlyant Nov 03 '20 at 17:57

Gordon Linoff · Answer 2 · 2020-11-02T14:03:20.077

0

This looks like a union all -- where the second query unnests the interests and joins:

select a.name, a.question, a.answer
from a
union all
select a.name, 'segment', min(b.segment)
from a join
     (b cross join
      unnest(interests) b_interest
     )
     on a.interest = b_interest
group by name
having min(b.segment) = max(b.segment);

You might need a left join, if some interests don't have segments.

edited Nov 02 '20 at 14:03

answered Nov 02 '20 at 11:46

Gordon Linoff

1,242,037
58
646
786

Would it also fulfill point #3? Only if all the values are in the array, it should get matched. – Nov 02 '20 at 13:36
Getting this error `SELECT list expression references b.segment which is neither grouped nor aggregated at [11:29]` while executing the above query – Nov 02 '20 at 13:51
Any thoughts? Looks like the segment field has to be an aggregatable value. How that can be achieved? – Nov 02 '20 at 14:14
@Gopal . . . I fixed the answer a while ago. – Gordon Linoff Nov 02 '20 at 14:15
Tried this.. but not getting any results. Just the initial select statement is giving the results. – Nov 02 '20 at 14:18

BigQuery - JOIN on two tables using string and array

2 Answers2

Linked