0

Let's take the following query:

with main as (
    select "David" as name, ["Joan", "Rick"] as parents union all
    select "Kyle" as name, ["Cathy", "Rick"] as parents
)
select
    name, parent
from
    main 
    CROSS JOIN UNNEST(parents) as parent

enter image description here

My question is why the <table> CROSS JOIN UNNEST(field) only joins with the 'table-from-array' within that row, and not in the entire table. For example, why doesn't the query do this instead:

main
    `name`
    - David
    - Kyle

<derived_from_array>
   `parent`
    - Rick
    - Joan
    - Cathy
    - Rick

Or, put another way, why doesn't the end-query do this:

with main as (
    select "David" as name union all
    select "Kyle" as name
),
derived_from_array as (
    select "Rick" as parent union all
    select "Joan" as parent union all
    select "Cathy" as parent union all
    select "Rick" as parent
)
select
    name, parent
from
    main 
    CROSS JOIN derived_from_array

enter image description here

In other words, how does it know to build the table-from-array only within that row ?

David542
  • 104,438
  • 178
  • 489
  • 842

2 Answers2

2

The cross join unnest() is implementing a lateral join. Unfortunately, BigQuery does not make this explicit. However, if you add aliases, I think it becomes clearer:

select m.name, parent
from main m cross join
     unnest(m.parents) as parent;

So, for each row in main, the cross join is joining together a set. What set? The set that is generated by unnesting parents in that row. Hence, the data from different rows is not mixed together.

Using more standard syntax, this lateral keyword would make this more obvious:

select m.name, parent
from main m cross join lateral
     unnest(m.parents) as parent;

But as I say, BigQuery does not support this Standard SQL syntax.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • thanks, here's a good reference too for the `LATERAL`: https://stackoverflow.com/a/28557803/651174. – David542 Aug 25 '21 at 21:40
0

to make it even more friendly and better understood - you can use below version

with main as (
    select "David" as name, ["Joan", "Rick"] as parents union all
    select "Kyle" as name, ["Cathy", "Rick"] as parents
)
select
    name, parent
from
    main m
    CROSS JOIN m.parents as parent

As you can see parents array is treated here as a table within the respective row, so you join each of the rows in this sub-table with the respective row of parent table

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230