3

Currently I know how to use the BigQuery Python API to make queries from <company_warehouse>.<table> in the form

SELECT id, field1, field2, ...
FROM '<company_warehouse>.<table>'
WHERE field1==...

What I'd like to do is perform the above query but add in a restriction to search only on a subset of the total data restricted to a large set (list) of IDs I have in Python (as a list or pandas Series). i.e., the same as adding a clause WHERE id IN (id1,...,idn), but this method is inefficient for very large n.

This question has some SQL answers suggesting to do it with inner joins, but the answers aren't written with clearly usable code (even in other SQL DBMS's). How do I get this done in BigQuery?

2 Answers2

4

You can write:

    WHERE field1 = ... AND 
        id IN UNNEST(['id1', 'id2', 'id3', 'id4'])
-1

You can use an array:

SELECT id, field1, field2, ...
FROM '<company_warehouse>.<table>'
WHERE field1 = ... AND
      id in (select el from unnest(array[1, 2, 3]) el)

The array[1, 2, 3] can be passed in as a parameter.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786