12

I am trying to join two tables that each have an array column like the following

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array

or

SELECT a.id, b.value
FROM a INNER JOIN b
ON UNNEST(a.array) IN UNNEST(b.array)

According to this SO question, postgres has operators like <@ and >@ that compares if either is a subset of the other array ( postgres doc page) but BigQuery only allows an element of the array to be compared with the other array like the following

a.arrayelement IN UNNEST(b.array)

Can it be done in BigQuery?

edit

This is the schema I am working with

WITH b AS (
    {  "ip": "192.168.1.1",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "peach",
          "value: "pink"
        },
        { "key": "orange",
          "value: "orange"
        }
      ]
    }
    ,{  "ip": "192.168.1.2",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "orange",
          "value: "orange"
        }
      ]
    }
   ),
WITH a AS (
    {  "id": "12345",
      "cookie": [
        { "key": "peach",
          "value: "pink"
        }
      ]
    }
    ,{  "id": "67890",
      "cookie": [
        { "key": "apple",
          "value: "red"
        },
        { "key": "orange",
          "value: "orange"
        },

      ]
     }
)

I am expecting an output like the following

ip, id
192.168.1.1, 67890 
192.168.1.2, 67890 
192.168.1.2, 12345

It is a continuation of the following SO, How do I find elements in an array in BigQuery . I tried using subqueries to compare a single element of one of the array, but BigQuery returns an error saying that I have "too many subqueries"

Community
  • 1
  • 1
dorachan2010
  • 981
  • 3
  • 12
  • 21

2 Answers2

13

Here is an alternative solution, which avoids running JOIN in correlated subquery, and instead relies on IN UNNEST() expression - this should give better performance:

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a , b
WHERE (SELECT LOGICAL_AND(a_i IN UNNEST(b.b_arr)) FROM UNNEST(a.a_arr) a_i)
Mosha Pasumansky
  • 13,206
  • 5
  • 32
  • 55
9

Try below example (BigQuery Standard SQL)

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a , b , UNNEST([(SELECT ARRAY_LENGTH(a.a_arr) - COUNT(1) 
                      FROM UNNEST(a.a_arr) AS x 
                      JOIN UNNEST(b.b_arr)  AS y 
                      ON x = y)]) AS z
WHERE z = 0

it mimics below pseudo-code:

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array  

Let me know if you want me to apply this to your example - or you will try by yourself first :o)

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • how it goes? do you need me to jump in? – Mikhail Berlyant Apr 27 '17 at 21:25
  • I applied your logic to my actual schema and its working :D Now I am checking to see if it could be scaled to 1000+ rows in the WITH clause.. I hope BigQuery doesn't yell at me – dorachan2010 Apr 27 '17 at 22:15
  • great. glad we did it :o) – Mikhail Berlyant Apr 27 '17 at 22:16
  • I do have a question. What is the difference between surrounding the **z** table with _UNNEST_ and not surrounding it? I understand the logic but I wanted to dissect the SQL and see the actual behavior... but I need the _UNNEST_ to get the example code working – dorachan2010 Apr 27 '17 at 22:18
  • introducing z is a trick which allows you to pre-create column for each join combination such that you can use it directly in WHERE clause. Otherwise you would need to explicitly put respective subselect as a extra column in the output and than wrap it with another select where you will filter result by matches. the idea is to leave only those combinations where number of common elements equal to number of elements in array from table a. try to play with this more and let m eknow if more questions – Mikhail Berlyant Apr 27 '17 at 22:58