24

I am trying to search for a row that has certain key value pairs in an array. A row in my BigQuery table would look something like this.

{
  "ip": "192.168.1.200",
  "cookie" [
    {
      "key": "apple",
      "value": "red"
    },
    {
      "key": "orange",
      "value": "orange"
    },
    {
      "key": "grape",
      "value": "purple"
    }
  ]
}

I thought about using implicit UNNEST or CROSS JOIN like the following, but it didn't work because unnesting it would just create multiple different rows.

SELECT ip
FROM table t, t.cookie c
WHERE (c.key = "grape" AND c.value ="purple") AND (c.key = "orange" AND c.value ="orange")

This link is really close to what I want to do, except they are using legacy SQL and not standardSQL

Swazimodo
  • 1,147
  • 1
  • 15
  • 34
dorachan2010
  • 981
  • 3
  • 12
  • 21

2 Answers2

20
#standardSQL
SELECT ip
FROM yourTable 
WHERE (
  SELECT COUNT(1) 
  FROM UNNEST(cookie) AS pair 
  WHERE pair IN (('grape', 'purple'),  ('orange', 'orange'))
) >= 2

you can test it with below dummy data

#standardSQL
WITH yourTable AS (
  SELECT '192.168.1.1' AS ip, [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
  SELECT '192.168.1.2', [('abc', 'xyz')]
)
SELECT ip
FROM yourTable 
WHERE (
  SELECT COUNT(1) 
  FROM UNNEST(cookie) AS pair 
  WHERE pair IN (('grape', 'purple'),  ('orange', 'orange'))
) >= 2

In case if you need output ip if at least one pair is in array - you need to change >= 2 to >=1 in WHERE clause

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Worked like a charm, thanks. I didn't know you could write subqueries in WHERE conditions as well – dorachan2010 Mar 24 '17 at 20:16
  • @dorachan2010 - yes, but please note - `Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN` you can check http://stackoverflow.com/a/43006968/5221944 to see how correlated query was transformed into join to avoid error message – Mikhail Berlyant Mar 24 '17 at 20:18
  • Yeah, that's something I am still trying to understand for SQL and BigQuery SQL...the scope of the references in each of the statement.. – dorachan2010 Mar 24 '17 at 20:21
  • @dorachan2010 - sure. in the post I referenced above - you can see resolved `final` solution - but you can click on `edited NN hours ago` link to see my initial answer that had exactly that problem with correlated subquery. this can be a good example for you. Vote it up too - if it help you to learn/understand this subject better – Mikhail Berlyant Mar 24 '17 at 20:26
8

Mikhail's solution is good if it is guaranteed that there are no duplicate pairs in the cookie array. But if there could be duplicates, here is the alternative solution:

#standardSQL
WITH yourTable AS (
  SELECT 
    '192.168.1.1' AS ip,
    [('apple', 'red'), ('orange', 'orange'), ('grape', 'purple')] AS cookie UNION ALL
  SELECT
    '192.168.1.2',
    [('abc', 'xyz'), ('orange', 'orange'), ('orange', 'orange')]
)
SELECT ip
FROM yourTable t
WHERE (
  ('grape', 'purple')  IN UNNEST(t.cookie) AND
  ('orange', 'orange') IN UNNEST(t.cookie) )

Results in only

ip
-----------
192.168.1.1
Mosha Pasumansky
  • 13,206
  • 5
  • 32
  • 55